Thesis

Universal in-context approximation

Abstract:
The explosive rise of large language model capabilities has shifted research and practice from task-specific training to in-context learning via prompting. This raises a fundamental question: can a model with fixed weights solve novel tasks as effectively as a fine-tuned one? This thesis investigates the theoretical capabilities and limitations of in-context learning across major sequence model architectures. It introduces the formal notion of universal in-context approximation, in which a single model can approximate any function from a given class solely by selecting an appropriate input prompt.
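One possible way to write this notion down is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the thesis: g_\theta is the model with fixed weights \theta, \mathcal{P} the set of admissible prompts, \mathcal{F} the target function class on a domain \mathcal{X}, and d a distance on the output space.

```latex
% Illustrative formalization only; notation is assumed, not the thesis's own.
% g_\theta is a universal in-context approximator for \mathcal{F} if
\[
\forall f \in \mathcal{F},\ \forall \varepsilon > 0,\ \exists p \in \mathcal{P} :\quad
\sup_{x \in \mathcal{X}} \, d\bigl(g_\theta(p, x),\, f(x)\bigr) \le \varepsilon .
\]
```

The weights \theta are chosen once and do not depend on f or \varepsilon; only the prompt p does.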

Our investigation begins by exploring the limitations of prompting in transformers. We first prove a significant restriction: prompting and prefix-tuning are fundamentally incapable of changing a model’s learned attention patterns over the user-provided content. This finding initially suggests that prompting is strictly less powerful than fine-tuning. However, we then demonstrate the contrary: transformers are universal in-context approximators. This thesis resolves the apparent contradiction by showing that universality is achieved not by altering attention over the content, but by leveraging the ability of the attention mechanism to approximate smooth functions to arbitrary precision. Most notably, the model size is independent of the target precision.
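The flavor of the restriction on attention patterns can be illustrated with a toy calculation. The sketch below is not the thesis's proof. It assumes a single attention head in which the query-key scores for the content tokens are fixed, and all numeric values are hypothetical. Under that assumption, adding prefix positions rescales the attention mass placed on the content but leaves the relative distribution over content positions unchanged.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def content_attention(prefix_logits, content_logits):
    """Attend over prefix + content, then return the attention over the
    content positions renormalized to sum to 1, together with the total
    mass the content positions received before renormalization."""
    weights = softmax(prefix_logits + content_logits)
    content_weights = weights[len(prefix_logits):]
    mass = sum(content_weights)
    return [w / mass for w in content_weights], mass


# Hypothetical query-key scores for one query position (illustrative values).
content_logits = [0.3, -1.2, 2.0, 0.5]
prefix_logits = [4.0, -0.7]

without_prefix = softmax(content_logits)
with_prefix, content_mass = content_attention(prefix_logits, content_logits)

# The relative pattern over the content is identical in both cases;
# the prefix only absorbs part of the total attention mass.
max_gap = max(abs(a - b) for a, b in zip(without_prefix, with_prefix))
```

Here `max_gap` is zero up to floating-point error, while `content_mass` is strictly less than 1. In a multi-layer model the prefix can still influence later layers through the representations it changes, so this sketch covers only the single-head case stated above.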

Furthermore, we extend this inquiry beyond attention-based models to fully recurrent architectures, including RNNs, LSTMs and modern State Space Models (SSMs). As these models lack an attention mechanism, we develop a different method for proving their in-context capabilities: a compiler that translates high-level procedural programs into the parameters of recurrent models. Using this framework, we prove that these fully recurrent architectures are also universal in-context approximators, possibly even more efficiently than transformers.

Collectively, these results establish that, from a representational standpoint, in-context learning can be as expressive as full model retraining. This work provides a rigorous foundation for understanding the emergent capabilities of large-scale models and shows that, at least in theory, a carefully crafted prompt can go a long way.

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author

Contributors

Role:
Supervisor
Role:
Supervisor


Funder identifier:
https://ror.org/0439y7842
Grant:
EP/S024050/1
Programme:
EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford

