
Conference item

On pretraining data diversity for self-supervised learning

Abstract:
We explore the impact of training with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget. Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal. Notably, even when exceptionally large pretraining data diversity is achieved, for example through web crawling or diffusion-generated data, the distribution shift remains a challenge. Our experiments are comprehensive, covering seven SSL methods on large-scale datasets such as ImageNet and YFCC100M and amounting to over 200 GPU days. The code and trained models will be available at https://github.com/hammoudhasan/DiversitySSL.
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1007/978-3-031-72992-8_4

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0009-0006-0259-5732
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
Springer
Host title:
Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LVI
Pages:
54–71
Series:
Lecture Notes in Computer Science
Series number:
15114
Publication date:
2024-10-30
Acceptance date:
2024-02-26
Event title:
18th European Conference on Computer Vision (ECCV 2024)
Event location:
Milan, Italy
Event website:
https://eccv2024.ecva.net/
Event start date:
2024-09-29
Event end date:
2024-10-04
DOI:
10.1007/978-3-031-72992-8_4
EISSN:
1611-3349
ISSN:
0302-9743
EISBN:
978-3-031-72992-8
ISBN:
978-3-031-72991-1


Language:
English
Pubs id:
2013485
Local pid:
pubs:2013485
Deposit date:
2024-07-10
