Thesis

Advancing data-efficient deep learning: non-parametric transformers, active testing, and in-context learning

Abstract:

The creation of ever larger datasets has played an important role in the practical success of deep learning. Unfortunately, in many real-world scenarios, high-quality data may be scarce, and the naive application of deep learning can then fall short of expectations. A large variety of prior work aims to remedy this and make deep learning more data-efficient. Such approaches typically rely on one or more of the following high-level strategies: they adjust the model architecture or training procedure to make better use of the existing data, actively control the data creation process to obtain more useful data in the first place, or leverage data from other, indirectly relevant, tasks. In the best case, these methods can drastically improve the performance of deep learning in small-data regimes. However, the problem of data-efficiency in deep learning is far from solved, and many challenges remain.

This thesis proposes and studies four different approaches to data-efficient deep learning, advancing the state of the art by questioning assumptions commonly made in approaches to data-efficiency. Firstly, we propose Non-Parametric Transformers (NPTs), a data-efficient deep learning architecture that takes the entire dataset as input. This deviates from common deep learning practice and allows NPTs to learn to predict by directly reasoning about interactions between datapoints. NPTs achieve impressive performance, especially on small tabular datasets, where deep learning methods have previously struggled. Secondly, we turn to data-efficiency for model evaluation. While active learning methods reduce the number of labels needed for model training, the cost of labeling for model evaluation is typically ignored without good justification. We address this by introducing two approaches that allow for label-efficient model evaluation by actively labeling only an informative subset of the datapoints to construct specialized estimates of model performance. Thirdly, we investigate the ability of in-context learning (ICL) in large language models to learn label relationships. There has been significant discussion in the literature about the extent to which ICL actually leverages label information. Our careful study provides novel insights into ICL, revealing both capabilities and limitations.

Authors


Institution: University of Oxford
Division: MPLS
Department: Computer Science
Role: Author

Contributors

Institution: University of Oxford
Division: MPLS
Department: Computer Science
Role: Supervisor
ORCID: 0000-0002-2733-2078

Institution: University of Oxford
Division: MPLS
Department: Statistics
Role: Supervisor

Institution: University of Oxford
Division: MPLS
Department: Statistics
Role: Examiner


Type of award: DPhil
Level of award: Doctoral
Awarding institution: University of Oxford


Language: English
Deposit date: 2025-08-02
