Conference item
End-to-end learning of visual representations from uncurated instructional videos
- Abstract:
- Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video models still rely on manually annotated data. With the recent introduction of the HowTo100M dataset, narrated videos now offer the possibility of learning video representations without manual supervision. In this work we propose a new learning approach, MIL-NCE, capable of addressing misalignments inherent in narrated videos. With this approach we are able to learn strong video representations from scratch, without the need for any manual annotation. We evaluate our representations on four downstream tasks over eight datasets: action recognition (HMDB-51, UCF-101, Kinetics-700), text-to-video retrieval (YouCook2, MSR-VTT), action localization (YouTube-8M Segments, CrossTask) and action segmentation (COIN). Our method outperforms all published self-supervised approaches for these tasks, as well as several fully supervised baselines.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript, pdf, 5.1MB
- Publisher copy:
- 10.1109/CVPR42600.2020.00990
Authors
- Publisher:
- IEEE
- Host title:
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Pages:
- 9876-9886
- Publication date:
- 2020-08-05
- Acceptance date:
- 2020-02-23
- Event title:
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Event location:
- Online
- Event website:
- https://cvpr2020.thecvf.com/
- Event start date:
- 2020-06-14
- Event end date:
- 2020-06-19
- DOI:
- 10.1109/CVPR42600.2020.00990
- EISSN:
- 2575-7075
- ISSN:
- 1063-6919
- EISBN:
- 9781728171685
- ISBN:
- 9781728171692
- Language:
- English
- Keywords:
- Pubs id:
- 1770544
- Local pid:
- pubs:1770544
- Deposit date:
- 2024-06-14
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2020
- Rights statement:
- © 2020 IEEE.
- Notes:
- This is the accepted manuscript version of the article. The final version is available online from IEEE at https://dx.doi.org/10.1109/cvpr42600.2020.00990