Conference item: Poster
VGGT: Visual Geometry Grounded Transformer
- Abstract:
- We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, such as camera poses, point maps, depth maps, and 3D point tracks, from a few to hundreds of its views. Unlike recent alternatives, VGGT does not rely on visual geometry optimization techniques to refine its results in post-processing; it obtains all quantities of interest directly. This approach is simpler and more efficient, reconstructing hundreds of images in seconds. We train VGGT on a large number of publicly available datasets with 3D annotations and demonstrate state-of-the-art results in multiple 3D tasks, including camera pose estimation, multi-view depth estimation, dense point cloud reconstruction, and 3D point tracking. This is a step forward in 3D computer vision, where models have typically been constrained to and specialized for single tasks. We extensively evaluate our method on unseen datasets to demonstrate its superior performance. We will release the code and trained model.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher copy:
- 10.1109/CVPR52734.2025.00499
Authors
- Publisher:
- IEEE
- Host title:
- Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
- Pages:
- 5294-5306
- Place of publication:
- Los Alamitos, California, USA
- Publication date:
- 2025-08-13
- Acceptance date:
- 2025-02-26
- Event title:
- IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
- Event location:
- Nashville, TN, USA
- Event website:
- https://cvpr.thecvf.com/Conferences/2025
- Event start date:
- 2025-06-11
- Event end date:
- 2025-06-15
- EISSN:
- 2575-7075
- ISSN:
- 1063-6919
- ISBN:
- 9798331543648
- Language:
- English
- Subtype:
- Poster
- Pubs id:
- 2099296
- Local pid:
- pubs:2099296
- Deposit date:
- 2025-06-09
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2025
- Rights statement:
- © 2025, IEEE
- Notes:
- This paper was presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025), 11th-15th June 2025, Nashville, TN, USA. The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)