Visual grounding in video for unsupervised word translation

Sigurdsson, GA; Alayrac, JB; Nematzadeh, A; Smaira, L; Malinowski, M; Carreira, J; Blunsom, P; Zisserman, A

Journal article

Abstract:: There are thousands of actively spoken languages on Earth, but a single visual world. Grounding in this visual world has the potential to bridge the gap between all these languages. Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establish a common visual representation between two languages by learning embeddings from unpaired instructional videos narrated in the native language. Given this shared embedding, we demonstrate that (i) we can map words between the languages, particularly the 'visual' words; (ii) the shared embedding provides a good initialization for existing unsupervised text-based word translation techniques, forming the basis for our proposed hybrid visual-text mapping algorithm, MUVE; and (iii) our approach achieves superior performance by addressing the shortcomings of text-based methods: it is more robust, handles datasets with less commonality, and is applicable to low-resource languages. We apply these methods to translate words from English to French, Korean, and Japanese, all without any parallel corpora and simply by watching many videos of people speaking while doing things.

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Sigurdsson, G. A., Alayrac, J. B., Nematzadeh, A., Smaira, L., Malinowski, M., Carreira, J., Blunsom, P., & Zisserman, A. (2020). Visual grounding in video for unsupervised word translation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, 10847–10856.

MLA Style

Sigurdsson, G. A., et al. “Visual Grounding in Video for Unsupervised Word Translation.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, no. 2020, IEEE, 2020, pp. 10847–56.

Chicago Style

Sigurdsson, GA, JB Alayrac, A Nematzadeh, L Smaira, M Malinowski, J Carreira, P Blunsom, and A Zisserman. 2020. “Visual Grounding in Video for Unsupervised Word Translation.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, no. 2020: 10847–56.

Access Document

Files:: SigurdssonetalAAM2020.pdf

(Accepted manuscript, pdf, 8.2MB)

Publisher copy:: 10.1109/CVPR42600.2020.01086

Authors

Sigurdsson, GA

Role:: Author

Alayrac, JB

Role:: Author

Nematzadeh, A

Role:: Author

Smaira, L

Role:: Author

Malinowski, M

Role:: Author

Carreira, J

Role:: Author

Blunsom, P

Role:: Author

Zisserman, A

Role:: Author

Publisher:: IEEE
Journal:: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Issue:: 2020
Pages:: 10847-10856
Publication date:: 2020-08-05
Acceptance date:: 2020-02-27
Event title:: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Event location:: Online
Event website:: http://cvpr2020.thecvf.com/
Event start date:: 2020-06-14
Event end date:: 2020-06-19
DOI:: 10.1109/CVPR42600.2020.01086
EISSN:: 2575-7075
ISSN:: 1063-6919
EISBN:: 978-1-7281-7168-5
ISBN:: 978-1-7281-7169-2

Language:: English
Keywords:: FFR
Pubs id:: 1096574
Local pid:: pubs:1096574
Deposit date:: 2020-11-13

Terms of use

Copyright holder:: IEEE
Notes:: This paper was presented at the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14th - 19th June 2020. This is the accepted manuscript version of the article. The final version is available from IEEE at: https://doi.org/10.1109/CVPR42600.2020.01086

Licence:: Terms and Conditions of Use for Oxford University Research Archive
