Thesis

Developing object perception in the low data regime

Abstract:

Objects are central to human perception and understanding of the world. An abundance of images available on the internet covers the vast number of objects in the world; however, labelling these images exhaustively to cover all objects is infeasible, which limits the utility of systems that require strong supervision through large labelled datasets. To address this issue, this thesis develops methods that enable novel objects to be learnt with limited use of manually labelled data.

First, we consider few-shot object detection: the problem of learning to expand the set of detectable objects given only a few manually labelled examples. We show that the few examples available for novel categories can be used to accurately pseudo-label existing data, yielding a large number of novel pseudo-annotations for further detector training.

Second, we address the more challenging problem of open-vocabulary object detection, which requires learning to detect novel object categories with no annotated data. We demonstrate the utility of detailed natural language descriptions to provide additional visual information for novel object detection. Moreover, we show that visual exemplars can be aggregated and combined with object descriptions to yield multi-modal classifiers for superior novel object detection.

Finally, we consider the problem of object hallucinations in large vision-language models. We propose an automatic method for evaluating object hallucinations in the detailed natural language descriptions of images that these models generate. The method uses language models and labelled detection data to analyse the generated descriptions automatically and robustly.

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Sub department:
Engineering Science
Research group:
Visual Geometry Group (VGG)
Oxford college:
Wadham College
Role:
Author

Contributors

Institution:
University of Texas at Austin
Role:
Contributor
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Sub department:
Engineering Science
Research group:
Visual Geometry Group (VGG)
Role:
Supervisor
ORCID:
0000-0002-8945-8573
Role:
Supervisor
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Sub department:
Engineering Science
Research group:
Torr Vision Group
Oxford college:
St Catherine's College
Role:
Examiner


Funder identifier:
https://ror.org/0439y7842
Grant:
EP/L015897/1
Programme:
EPSRC CDT in Autonomous Intelligent Machines and Systems


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Subjects:
Deposit date:
2024-04-18
