Conference item

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Abstract:: We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and the keywords supplied with the images is then learned, using a method based on EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that are individually difficult to predict into clusters that can be predicted well. For example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
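
The lexicon-learning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes an EM procedure in the style of IBM Model 1 from statistical machine translation, which is one common way to learn a lexicon from an aligned bitext; the abstract says only that the method is "based on EM", so this is not necessarily the authors' exact formulation. The function name `learn_lexicon`, the region-type labels, and the toy data are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def learn_lexicon(pairs, iterations=20):
    """Estimate p(word | region_type) with a Model-1-style EM loop.

    pairs: list of (region_types, words) tuples, one per annotated image.
    Returns a nested dict t with t[region_type][word] = probability.
    Illustrative sketch only, not the paper's implementation.
    """
    regions = {r for rs, _ in pairs for r in rs}
    words = {w for _, ws in pairs for w in ws}

    # Start from a uniform lexicon: every word is equally likely
    # for every region type.
    t = {r: {w: 1.0 / len(words) for w in words} for r in regions}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)

        # E-step: share each word among the regions of its own image,
        # in proportion to the current lexicon entries.
        for rs, ws in pairs:
            for w in ws:
                norm = sum(t[r][w] for r in rs)
                if norm == 0.0:
                    continue  # image without usable regions
                for r in rs:
                    frac = t[r][w] / norm
                    count[r][w] += frac
                    total[r] += frac

        # M-step: renormalise the expected counts per region type.
        for r in regions:
            if total[r] > 0.0:
                for w in words:
                    t[r][w] = count[r][w] / total[r]
    return t


# Hypothetical toy corpus: each image is a bag of region types plus
# the keywords supplied with that image.
pairs = [
    (["R_sky", "R_sea"], ["sky", "sea"]),
    (["R_sky", "R_grass"], ["sky", "grass"]),
    (["R_sea", "R_sand"], ["sea", "sand"]),
]

lexicon = learn_lexicon(pairs)
best = {r: max(row, key=row.get) for r, row in lexicon.items()}
print(best)
```

No image in the toy corpus says which region goes with which keyword; the correspondence emerges only from co-occurrence across images, which is the sense in which the task resembles learning a lexicon from aligned sentence pairs.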


Cite this record

APA Style

Duygulu, P., Barnard, K., Freitas, J., & Forsyth, D. (2002). Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. 2353.

MLA Style

Duygulu, P, et al. “Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary.” vol. 2353, 2002.

Chicago Style

Duygulu, P, K Barnard, J Freitas, and D Forsyth. 2002. “Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary.” 2353.

Access Document

Publisher copy:: 10.1007/3-540-47979-1_7

Authors

Duygulu, P

Role:: Author

Barnard, K

Role:: Author

Freitas, J

Role:: Author

Forsyth, D

Role:: Author

Publisher:: Springer Berlin Heidelberg
Host title:: European Conference on Computer Vision (ECCV)
Volume:: 2353
Publication date:: 2002-01-01
DOI:: 10.1007/3-540-47979-1_7
ISBN:: 9783540437482

UUID:: uuid:92e538e8-10b7-461a-9f8e-db525ab8fdf4
Local pid:: cs:7532
Deposit date:: 2015-03-31
ARK identifier:: ark:/29072/ora_92e538e810b7461a9f8edb525ab8fdf4

Terms of use

Copyright date:: 2002

Licence:: Terms and Conditions of Use for Oxford University Research Archive
