Conference item

Look, listen and learn

Abstract:

We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself – the correspondence between the visual and the audio streams, and we introduce a novel “Audio-Visual Correspondence” learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves...

Publication status:
Published
Peer review status:
Peer reviewed
Version:
Accepted Manuscript

Files:
Publisher copy:
10.1109/iccv.2017.73

Authors


Arandjelovic, R
Institution:
University of Oxford
Division:
MPLS Division
Department:
Engineering Science
Oxford college:
Brasenose College
ORCID:
0000-0002-8945-8573
Publisher:
IEEE
Publication date:
2017-12-25
Acceptance date:
2017-07-17
ISSN:
1550-5499
Pubs id:
pubs:829574
URN:
uri:1323c912-31b9-46b9-b120-32b8251dcb07
UUID:
uuid:1323c912-31b9-46b9-b120-32b8251dcb07
Local pid:
pubs:829574
