Conference item
Pre-training concept frequency is predictive of CLIP zero-shot performance
- Abstract:
- Web-crawled pre-training datasets are speculated to be key drivers of the zero-shot generalization abilities of Vision-Language Models (VLMs) like CLIP across a range of downstream classification and retrieval tasks spanning diverse visual concepts. However, it is unclear how meaningful the term “zero-shot” generalization is for CLIP, as its pre-training datasets (e.g., YFCC-15M and LAION-2B) likely contain many samples of the “zero-shot” concepts. To study this, we analyze, for the first time, the composition of concepts in the pre-training datasets of CLIP. We robustly demonstrate that, far from being “zero-shot”, CLIP’s zero-shot classification performance is strongly predicted by the frequency of a concept seen during pre-training. Specifically, downstream zero-shot performance improves linearly as the pre-training concept frequency grows exponentially, i.e., the two follow a log-linear scaling trend. Our data-centric investigation further highlights two key findings: (1) the extreme “data hunger” of CLIP, i.e., its growing inability to make “zero-shot” predictions on long-tailed concepts, and (2) a surprising degree of misalignment in the image-text pairs of the pre-training datasets.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
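The log-linear trend described in the abstract can be made concrete with a small illustrative sketch. It assumes the trend takes the form accuracy ≈ a + b · log10(frequency) and fits a and b by ordinary least squares; the base-10 logarithm, the function names `fit_log_linear` and `frequency_multiplier`, and the sample numbers below are hypothetical choices for illustration only, not the authors' code or data.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(freqs: Sequence[float], accs: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: least-squares fit of acc ~= a + b * log10(freq).

    Returns (a, b), where b is the accuracy gain per tenfold increase in
    concept frequency. Not taken from the paper; illustration only.
    """
    if len(freqs) != len(accs) or len(freqs) < 2:
        raise ValueError("need at least two (frequency, accuracy) pairs of equal length")
    if any(f <= 0 for f in freqs):
        raise ValueError("frequencies must be positive to take a logarithm")
    xs = [math.log10(f) for f in freqs]  # linearize the exponential axis
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequencies must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accs))
    b = sxy / sxx  # slope: accuracy points per decade of frequency
    a = mean_y - b * mean_x  # intercept: predicted accuracy at frequency 1
    return a, b


def frequency_multiplier(gain: float, b: float) -> float:
    """Hypothetical helper: factor by which frequency must grow for `gain` more accuracy."""
    return 10 ** (gain / b)


# Made-up, exactly log-linear numbers (not results from the paper).
a, b = fit_log_linear([10, 100, 1_000, 10_000], [20.0, 30.0, 40.0, 50.0])
print(round(a, 6), round(b, 6))  # intercept and slope of the fitted trend
print(frequency_multiplier(10.0, b))  # tenfold more data for +10 points
```

Under this assumed form, each fixed gain in accuracy requires multiplying the concept frequency by a constant factor, which is one way to read the “data hunger” the abstract describes for long-tailed concepts.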
- Files:
- Version of record (PDF, 1.6MB)
- Publication website:
- https://openreview.net/forum?id=55iCzZ1TtD
- Publisher:
- OpenReview
- Host title:
- Proceedings of the ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models
- Article number:
- 65
- Publication date:
- 2024-03-04
- Acceptance date:
- 2024-01-15
- Event title:
- ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)
- Event location:
- Vienna, Austria
- Event website:
- https://sites.google.com/view/dpfm-iclr24/home
- Event start date:
- 2024-05-11
- Event end date:
- 2024-05-11
- Language:
- English
- Pubs id:
- 2007449
- Local pid:
- pubs:2007449
- Deposit date:
- 2024-06-10
Terms of use
- Copyright holder:
- Udandarao et al.
- Copyright date:
- 2024
- Rights statement:
- © 2024 The Author(s). This work is licensed under a Creative Commons Attribution 4.0 License.
- Notes:
- This paper was presented at the ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models, 11 May 2024, Vienna, Austria. It is available online from OpenReview at: https://openreview.net/forum?id=55iCzZ1TtD
- Licence:
- CC Attribution (CC BY)