Conference item

Pre-training concept frequency is predictive of CLIP zero-shot performance

Abstract:
Web-crawled pre-training datasets are speculated to be key drivers of the zero-shot generalization abilities of Vision-Language Models (VLMs) like CLIP across a range of downstream classification and retrieval tasks spanning diverse visual concepts. However, it is unclear how meaningful the term “zero-shot” generalization is for CLIP, as its pre-training datasets (e.g., YFCC-15M, LAION-2B) likely contain many samples of the “zero-shot” concept. To study this, for the first time, we analyze the composition of concepts in the pre-training datasets of CLIP. We robustly demonstrate that, far from being “zero-shot”, CLIP’s zero-shot classification performance is strongly predictable from the frequency with which a concept is seen during pre-training. Precisely, downstream zero-shot performance improves linearly as the pre-training concept frequency grows exponentially, i.e., the two follow a log-linear scaling trend. Our data-centric investigation further highlights two key findings: (1) the extreme “data-hunger” of CLIP, i.e., a growing inability to make “zero-shot” predictions on long-tailed concepts, and (2) a surprising degree of misalignment across image-text pairs in the pre-training datasets.
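Illustrative sketch (not from the paper): the log-linear trend described in the abstract can be read as accuracy ≈ intercept + slope × log10(frequency). The short Python example below fits that form by ordinary least squares. The function name fit_log_linear and all numbers are hypothetical and synthetic, chosen only to show the shape of the relationship; they are not the authors' code, data, or results.

```python
import math


def fit_log_linear(frequencies, accuracies):
    """Least-squares fit of accuracy = intercept + slope * log10(frequency).

    Hypothetical helper for illustration only; not the authors' method.
    """
    xs = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    # Closed-form simple linear regression on the log-transformed frequencies.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, made-up values: each tenfold increase in concept frequency
# adds the same fixed amount of accuracy (the log-linear pattern).
freqs = [10, 100, 1_000, 10_000, 100_000]
accs = [0.10, 0.25, 0.40, 0.55, 0.70]

slope, intercept = fit_log_linear(freqs, accs)
print(f"accuracy gain per 10x more pre-training samples: {slope:.3f}")
```

Under this reading, the slope is the accuracy gained per tenfold increase in concept frequency, which is why rare (long-tailed) concepts would need exponentially more data for the same improvement.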
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://openreview.net/forum?id=55iCzZ1TtD

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0009-0006-0259-5732
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
OpenReview
Host title:
Proceedings of the ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models
Article number:
65
Publication date:
2024-03-04
Acceptance date:
2024-01-15
Event title:
ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)
Event location:
Vienna, Austria
Event website:
https://sites.google.com/view/dpfm-iclr24/home
Event start date:
2024-05-11
Event end date:
2024-05-11


Language:
English
Pubs id:
2007449
Local pid:
pubs:2007449
Deposit date:
2024-06-10
