Learning 3D information from large image collections

Cheng, T

Thesis

Abstract:: Photos and videos, the most popular ways for us to capture the environment around us, are 2D-pixel representations that contain implicit yet rich 3D information.

As 2D images are much easier to capture than 3D data, the past decade of technological advances has catalyzed the creation of image datasets that are much larger and more diverse than their 3D counterparts. This has led to significant improvements in 2D image recognition and generation tasks but much more limited improvements in 3D-aware computer vision problems.

In this thesis, we attempt to isolate and extract 3D information from large image datasets with very little 3D data for assistance. Specifically, we explore large image-pretrained models, both for recognition and generation tasks, and focus on how we can extract three types of 3D information: 1) geometry, 2) continuous movement-based attributes (e.g., camera motion, time-of-day lighting, non-rigid object motion), and 3) materials.

In Chapter 3, we present 3DMiner, an end-to-end pipeline to obtain geometry from a large set of unannotated image collections. In Chapter 4, we present Continuous 3D Words, a way to extract continuous, 3D-aware motions like time-of-day illumination or camera parameters and further control them during image generation and editing. In Chapter 5, we show that generative models trained on large image datasets can implicitly extract and transfer materials from one exemplar to another image, without the need for any further finetuning.

Overall, this thesis shows that, with little to no 3D data or model training, these 3D-aware attributes can be disentangled from the complex information present in images. The resulting features are beneficial to a wide range of generation and reconstruction tasks.

Cite

Cite this record

APA Style

Cheng, T. (2024). Learning 3D information from large image collections [PhD thesis]. University of Oxford.

MLA Style

Cheng, T. Learning 3D Information from Large Image Collections. 2024. University of Oxford, PhD thesis.

Chicago Style

Cheng, T. 2024. “Learning 3D Information from Large Image Collections.” PhD thesis, University of Oxford.

Access Document

Files:: Cheng_2024_Learning_3D_information.pdf

(Preview, Dissemination version, pdf, 9.5MB, Terms of use)

Authors

+ Cheng, T

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: St Catherine's College
Role:: Author

Contributors

+ Trigoni, N

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: Kellogg College
Role:: Supervisor

+ Markham, A

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: Kellogg College
Role:: Supervisor

DOI:: 10.5287/ora-5r6qddq6q
Type of award:: DPhil
Level of award:: Doctoral
Awarding institution:: University of Oxford

Language:: English
Keywords:: computer vision, deep learning
Subjects:: computer vision
Deposit date:: 2026-02-10
ARK identifier:: ark:/29072/ora_8355088997e24ae7989e6e6a0c51b893

Terms of use

Copyright holder:: Ta-Ying Cheng

Licence:: CC Attribution (CC BY)
