Conference item

A light touch approach to teaching transformers multi-view geometry

Abstract:
Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a “light touch” approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps during training, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test-time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle, due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval, without needing pose information at test-time.
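The training signal described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a known fundamental matrix `F` at training time (with the convention that a key-image point x' and query-image point x satisfy x'^T F x = 0), a pixel `threshold` for how close a key location must be to the epipolar line, and a binary cross-entropy penalty between attention values and the resulting mask. The function names `epipolar_mask` and `epipolar_attention_loss` are hypothetical.

```python
import numpy as np


def epipolar_mask(F, query_xy, key_xy, threshold=1.0):
    """Return M of shape (Nq, Nk); M[i, j] = 1 if key point j lies within
    `threshold` pixels of the epipolar line induced by query point i."""
    # Homogeneous coordinates for both point sets.
    q = np.concatenate([query_xy, np.ones((len(query_xy), 1))], axis=1)
    k = np.concatenate([key_xy, np.ones((len(key_xy), 1))], axis=1)
    # Row i holds the line l_i = F q_i = (a, b, c) in the key image.
    lines = q @ F.T
    # Point-to-line distance |a x + b y + c| / sqrt(a^2 + b^2).
    num = np.abs(lines @ k.T)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8
    return (num / den <= threshold).astype(np.float64)


def epipolar_attention_loss(attn, mask, eps=1e-7):
    """Binary cross-entropy between attention values in [0, 1] and the mask:
    low attention on the line and high attention off the line are both
    penalized. The choice of BCE here is an assumption of this sketch."""
    a = np.clip(attn, eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(a) + (1.0 - mask) * np.log(1.0 - a)))


if __name__ == "__main__":
    # Rectified stereo pair: epipolar lines are horizontal (y' = y).
    F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
    queries = np.array([[0.0, 0.0], [0.0, 1.0]])
    keys = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    mask = epipolar_mask(F, queries, keys, threshold=0.5)
    print(mask)
    # Attention concentrated on the lines scores lower than the reverse.
    print(epipolar_attention_loss(0.9 * mask + 0.05, mask))
    print(epipolar_attention_loss(0.9 * (1 - mask) + 0.05, mask))
```

Because a loss of this kind would only be added during training, nothing in the sketch needs `F` or camera poses at inference, which is consistent with the abstract's claim that no pose information is required at test time.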
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1109/CVPR52729.2023.00480

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Brasenose College
Role:
Author
ORCID:
0000-0002-8945-8573


Publisher:
IEEE
Host title:
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR 2023)
Pages:
4958-4969
Publication date:
2023-08-22
Acceptance date:
2023-02-27
Event title:
Conference on Computer Vision and Pattern Recognition (CVPR 2023)
Event location:
Vancouver, Canada
Event website:
https://cvpr2023.thecvf.com/
Event start date:
2023-06-18
Event end date:
2023-06-22
EISSN:
2575-7075
ISSN:
1063-6919
EISBN:
979-8-3503-0129-8
ISBN:
979-8-3503-0130-4


Language:
English
Pubs id:
1335384
Local pid:
pubs:1335384
Deposit date:
2023-04-03