Thesis
Learning 3D information from large image collections
- Abstract:
Photos and videos, the most popular means of capturing our surroundings, are 2D pixel representations that contain implicit yet rich 3D information.
Because 2D images are much easier to capture than 3D data, the past decade of technological advances has catalyzed the creation of image datasets that are much larger and more diverse than their 3D counterparts. This has led to significant improvements in 2D image recognition and generation tasks, but much more limited improvements in 3D-aware computer vision problems.
In this thesis, we attempt to isolate and extract 3D information from large image datasets with very little 3D data as assistance. Specifically, we explore large image-pretrained models for both recognition and generation tasks, and focus on how we can extract three types of 3D information: 1) geometry, 2) continuous movement-based attributes (e.g., camera motion, time-of-day lighting, non-rigid object motion), and 3) materials.
In Chapter 3, we present 3DMiner, an end-to-end pipeline for obtaining geometry from large, unannotated image collections. In Chapter 4, we present Continuous 3D Words, a way to extract continuous, 3D-aware attributes such as time-of-day illumination or camera parameters and to control them during image generation and editing. In Chapter 5, we show that generative models trained on large image datasets can implicitly extract materials from an exemplar and transfer them to another image, without the need for any further finetuning.
Overall, this thesis shows that, with little to no 3D data and model training, these 3D-aware attributes can be disentangled from the complex information present in images. The resulting features benefit a wide range of generation and reconstruction tasks.
Authors
Contributors
+ Trigoni, N
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Computer Science
- Oxford college:
- Kellogg College
- Role:
- Supervisor
+ Markham, A
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Computer Science
- Oxford college:
- Kellogg College
- Role:
- Supervisor
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2026-02-10
- ARK identifier:
Terms of use
- Copyright holder:
- Ta-Ying Cheng
- Copyright date:
- 2024
- Licence:
- CC Attribution (CC BY)