Thesis

Advancing data-efficient deep learning: non-parametric transformers, active testing, and in-context learning

Abstract:

The creation of ever larger datasets has played an important role in the practical success of deep learning. Unfortunately, in many real-world scenarios, high-quality data may be scarce, and the naive application of deep learning can then fall short of expectations. A large variety of prior work aims to remedy this and make deep learning more data-efficient. Such approaches typically rely on one or more of the following high-level strategies: they adjust the model architecture or training procedure to make better use of the existing data, actively control the data creation process to obtain more useful data in the first place, or leverage data from other, indirectly relevant, tasks. In the best case, these methods can drastically improve the performance of deep learning in small-data regimes. However, the problem of data-efficiency in deep learning is far from solved, and many challenges remain.

This thesis proposes and studies four different approaches to data-efficient deep learning, advancing the state of the art by questioning assumptions commonly made in approaches to data-efficiency. Firstly, we propose Non-Parametric Transformers (NPTs), a data-efficient deep learning architecture that takes the entire dataset as input. This deviates from common deep learning practice and allows NPTs to learn to predict by directly reasoning about interactions between datapoints. NPTs achieve impressive performance, especially on small tabular datasets, where deep learning methods have previously struggled. Secondly, we turn to data-efficiency for model evaluation. While active learning methods reduce the number of labels needed for model training, the cost of labeling for model evaluation is typically ignored without good justification. We address this by introducing two approaches that allow for label-efficient model evaluation by actively labeling only an informative subset of the datapoints to construct specialized estimates of model performance. Thirdly, we investigate the ability of in-context learning (ICL) in large language models to learn label relationships. There has been significant discussion in the literature about the extent to which ICL actually leverages label information. Our careful study provides novel insights into ICL, revealing both capabilities and limitations.

Authors


Institution: University of Oxford
Division: MPLS
Department: Computer Science
Role: Author

Contributors

Institution: University of Oxford
Division: MPLS
Department: Computer Science
Role: Supervisor
ORCID: 0000-0002-2733-2078

Institution: University of Oxford
Division: MPLS
Department: Statistics
Role: Supervisor

Institution: University of Oxford
Division: MPLS
Department: Statistics
Role: Examiner


Type of award: DPhil
Level of award: Doctoral
Awarding institution: University of Oxford


Language: English
Deposit date: 2025-08-02
