Conference item
Flash3D: feed-forward generalisable 3D scene reconstruction from a single image
- Abstract:
- In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a ‘foundation’ model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input. Code, models, demo, and more results are available at https://www.robots.ox.ac.uk/~vgg/research/flash3d/.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher:
- IEEE
- Host title:
- 2025 International Conference on 3D Vision (3DV)
- Pages:
- 670-681
- Acceptance date:
- 2024-11-06
- Event title:
- 12th International Conference on 3D Vision (3DV 2025)
- Event location:
- Singapore
- Event website:
- https://3dvconf.github.io/2025/
- Event start date:
- 2025-03-25
- Event end date:
- 2025-03-28
- EISSN:
- 2475-7888
- ISSN:
- 2378-3826
- EISBN:
- 9798331538514
- ISBN:
- 9798331538521
- Language:
- English
- Pubs id:
- 2109774
- Local pid:
- pubs:2109774
- Deposit date:
- 2025-04-09
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2025
- Rights statement:
- © 2025 IEEE
- Notes:
- This paper was presented at the 12th International Conference on 3D Vision (3DV 2025), 25th-28th March 2025, Singapore. The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)