Thesis

Visual understanding of the physical world

Abstract:
We live in a 3D physical world, and a first step towards artificial general intelligence is to enable machines to understand it. This is the goal of this thesis, which is structured around three themes: (1) understanding and handling occlusion, (2) understanding the 3D and physical properties of the scene, and (3) bridging visual understanding with language. Across all three themes, we build our methods on top of large-scale pre-trained models or their representations.

For occlusion understanding and handling, we design a tri-layer plugin for conventional pre-trained object detectors that improves object detection and instance segmentation under occlusion. As an additional contribution on occlusion, we advance amodal completion, recovering the complete shape of occluded objects by utilising the prior of a pre-trained Stable Diffusion model.

For 3D physical understanding, we start with static 3D physical properties in images. To this end, we set up a protocol to probe large-scale pre-trained visual foundation models for their understanding of such properties. We also study dynamic 3D physical properties in videos, and explore predicting these properties from different types of large-scale pre-trained video foundation models.

For visual-language understanding, we focus on improving visual-language foundation models. For CLIP-like large-scale pre-trained models, we improve text-to-image retrieval performance by introducing a learnable prompt for the visual encoder that is conditioned on the text. For ChatGPT-like large-scale pre-trained models, we improve visual grounding performance and efficiency by equipping a small model with multi-modal reasoning capability.

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Supervisor
ORCID:
0000-0002-8945-8573
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Supervisor


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
