Conference item
Geo4D: leveraging video generators for geometric 4D scene reconstruction
- Abstract:
- We introduce Geo4D, a method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic priors captured by large-scale pre-trained video models, Geo4D can be trained using only synthetic data while generalizing well to real data in a zero-shot manner. Geo4D predicts several complementary geometric modalities, namely point, disparity, and ray maps. We propose a new multi-modal alignment algorithm to align and fuse these modalities, as well as a sliding window approach at inference time, thus enabling robust and accurate 4D reconstruction of long videos. Extensive experiments across multiple benchmarks show that Geo4D significantly surpasses state-of-the-art video depth estimation methods.
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
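The abstract mentions a sliding window approach at inference time and the alignment of overlapping predictions, but gives no details. The following is only a minimal sketch of that general idea, not the Geo4D algorithm: the window length, the stride, the use of one scalar disparity per frame, and the least-squares scale-and-shift objective are all assumptions made for illustration, and the function names `sliding_windows` and `fit_scale_shift` are hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges that cover the video with overlap.

    Assumed scheme: fixed-length windows advanced by `stride`, plus one
    extra window flush with the end if the regular grid leaves a tail.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("need 0 < stride <= window")
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def fit_scale_shift(src: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (a, b) minimising sum((a * src + b - ref) ** 2).

    `src` and `ref` stand in for predictions of the same overlapping
    frames made by two neighbouring windows (assumed scalar per frame).
    """
    n = len(src)
    mean_s = sum(src) / n
    mean_r = sum(ref) / n
    var = sum((s - mean_s) ** 2 for s in src)
    if var == 0.0:
        # Degenerate case: constant source, so only a shift can be estimated.
        return 1.0, mean_r - mean_s
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(src, ref))
    a = cov / var
    return a, mean_r - a * mean_s
```

In such a scheme, each new window would be mapped into the frame of the previous one by applying the fitted `(a, b)` to its predictions before the overlapping frames are merged.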
Access Document
- Files:
- Accepted manuscript (pdf, 18.4MB)
Authors
- Publisher:
- IEEE
- Acceptance date:
- 2025-07-23
- Event title:
- International Conference on Computer Vision (ICCV 2025)
- Event location:
- Honolulu, Hawai'i, USA
- Event website:
- https://iccv.thecvf.com/
- Event start date:
- 2025-10-19
- Event end date:
- 2025-10-23
- Language:
- English
- Pubs id:
- 2300211
- Local pid:
- pubs:2300211
- Deposit date:
- 2025-10-17
- ARK identifier:
Terms of use
- Copyright date:
- 2025
- Notes:
- This paper will be presented at the International Conference on Computer Vision (ICCV 2025), 19th-23rd October 2025, Honolulu, Hawai'i, USA.
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)