Conference item: Poster

VGGT: Visual Geometry Grounded Transformer

Abstract:
We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, such as camera poses, point maps, depth maps, and 3D point tracks, from a few or hundreds of its views. Unlike recent alternatives, VGGT does not need visual geometry optimization techniques to refine its results in post-processing; it obtains all quantities of interest directly. This approach is simple and more efficient, reconstructing hundreds of images in seconds. We train VGGT on a large number of publicly available datasets with 3D annotations and demonstrate its ability to achieve state-of-the-art results in multiple 3D tasks, including camera pose estimation, multi-view depth estimation, dense point cloud reconstruction, and 3D point tracking. This is a step forward in 3D computer vision, where models have typically been constrained to and specialized for single tasks. We extensively evaluate our method on unseen datasets to demonstrate its superior performance. We will release the code and trained model.
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1109/CVPR52734.2025.00499

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Research group:
Visual Geometry Group
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Research group:
Visual Geometry Group
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Research group:
Visual Geometry Group
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Research group:
Visual Geometry Group
Oxford college:
New College
Role:
Author
ORCID:
0000-0003-1374-2858
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Research group:
Visual Geometry Group
Role:
Author


Publisher:
IEEE
Host title:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Pages:
5294-5306
Place of publication:
Los Alamitos, California, USA
Publication date:
2025-08-13
Acceptance date:
2025-02-26
Event title:
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
Event location:
Nashville, TN, USA
Event website:
https://cvpr.thecvf.com/Conferences/2025
Event start date:
2025-06-11
Event end date:
2025-06-15
DOI:
10.1109/CVPR52734.2025.00499
EISSN:
2575-7075
ISSN:
1063-6919
ISBN:
9798331543648
