Thesis
Universal in-context approximation
- Abstract:
-
The explosive rise of large language model capabilities has shifted research and practice from task-specific training to in-context learning via prompting. This raises a fundamental question: can a model with fixed weights solve novel tasks as effectively as a fine-tuned one? This thesis investigates the theoretical capabilities and limitations of in-context learning across major sequence model architectures, introducing the formal notion of universal in-context approximation, where a single model can approximate any function from a given class by only selecting an appropriate input prompt.
Our investigation begins by exploring the limitations of prompting in transformers. We first prove a significant restriction: prompting and prefix-tuning are fundamentally incapable of changing a model’s learned attention patterns over the user-provided content. This finding initially suggests that prompting is strictly less powerful than fine-tuning. However, we then demonstrate the contrary: transformers are universal in-context approximators. This thesis resolves the apparent contradiction by showing that universality is achieved not by altering attention over the content, but by leveraging the ability of the attention mechanism to approximate smooth functions to arbitrary precision. Most notably, the required model size is constant in the target precision.
Furthermore, we extend this inquiry beyond attention-based models to fully recurrent architectures, including RNNs, LSTMs and modern State Space Models (SSMs). As these models lack an attention mechanism, we develop a different method for proving their in-context capabilities: a compiler that translates high-level procedural programs into the parameters of recurrent models. Using this framework, we prove that these fully recurrent architectures are also universal in-context approximators, possibly even more efficiently than transformers.
Collectively, these results establish that, from a representational standpoint, in-context learning can be as expressive as full model retraining. This work provides a rigorous foundation for understanding the emergent capabilities of large-scale models and shows that, at least in theory, a carefully crafted prompt can go a long way.
- Files:
- Dissemination version, pdf, 17.0MB
Authors
- Funder:
- Engineering and Physical Sciences Research Council
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/S024050/1
- Programme:
- EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
-
English
- Keywords:
- Subjects:
- Deposit date:
-
2025-12-20
- ARK identifier:
Terms of use
- Copyright holder:
- Aleksandar Petrov
- Copyright date:
- 2025