Thesis

Learning 3D information from large image collections

Abstract:
Photos and videos, the most popular ways for us to capture the environment around us, are 2D pixel representations that contain implicit yet rich 3D information.

As 2D images are much easier to capture than 3D data, the past decade of technological advance has catalyzed the creation of image datasets that are much larger and more diverse than their 3D counterparts. This has led to significant improvements in 2D image recognition and generation tasks, but much more limited improvements in 3D-aware computer vision problems.

In this thesis, we attempt to isolate and extract 3D information from large image datasets with very little 3D data for assistance. Specifically, we explore large image-pretrained models, for both recognition and generation tasks, and focus on how we can extract three types of 3D information: 1) geometry, 2) continuous movement-based attributes (e.g., camera motion, time-of-day lighting, non-rigid object motion), and 3) materials.

In Chapter 3, we present 3DMiner, an end-to-end pipeline to obtain geometry from a large set of unannotated image collections. In Chapter 4, we present Continuous 3D Words, a way to extract continuous, 3D-aware motions like time-of-day illumination or camera parameters and further control them during image generation and editing. In Chapter 5, we show that generative models trained on large image datasets can implicitly extract and transfer materials from one exemplar to another image, without the need for any further finetuning.

Overall, this thesis shows that, with little to no 3D data and model training, these 3D-aware attributes can be disentangled from the complex information present in images. The resulting features are beneficial to a wide range of generation and reconstruction tasks.

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Oxford college:
St Catherine's College
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Oxford college:
Kellogg College
Role:
Supervisor
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Oxford college:
Kellogg College
Role:
Supervisor


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Keywords:
Subjects:
Deposit date:
2026-02-10
ARK identifier:
