
Thesis

Towards human-like natural language understanding with language models

Abstract:
In recent years, language models (LMs) have established themselves as the most capable models for most natural language tasks. However, beyond the invention of the transformer architecture, most progress has come from scaling up model and data sizes (Radford et al., 2018, 2019; Brown et al., 2020; OpenAI, 2023). This scaling has enabled these models to match or exceed human performance on standard natural language benchmarks. Yet version after version, these models have remained qualitatively different from, and inferior to, humans in their reasoning, explainability, and learning abilities. Research on natural language explanations (NLEs) (Hendricks et al., 2016) has lagged behind research on neural-network-based LMs (Bengio et al., 2003), partly because it began much later. Furthermore, LMs are still trained via backpropagation, which is both less efficient than, and fundamentally different from, the way the human brain learns. In this thesis, I present my progress in making LMs more human-like, both in their natural language understanding and in their biological plausibility. First, I explore a very challenging set of problems that test natural language understanding: hard cases of pronoun resolution such as the Winograd Schema Challenge. In particular, I improve the training of LMs for pronoun resolution through synthetic training datasets and specialised loss functions, and by comparing alternative task formulations. Second, I use LMs to generate NLEs for commonsense reasoning tasks such as hard cases of pronoun resolution and commonsense validation. I demonstrate that LMs can transfer NLEs efficiently between domains while maintaining high downstream accuracy. Finally, I explore more biologically plausible, predictive-coding-based training methods for LMs, which may be the future of deep learning beyond backpropagation (Millidge et al., 2022). I present the first application of these methods to training LMs, identify effective ways to implement them, study their scalability, determine which method performs best, and demonstrate results competitive with backpropagation for small LMs.
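
To illustrate the kind of specialised loss function for pronoun resolution mentioned above, the sketch below scores Winograd-schema candidates with a masked LM and applies a margin loss between the correct and incorrect candidate. This is a minimal sketch, not the thesis's exact formulation: the choice of "bert-base-uncased", the candidate-scoring scheme (averaging masked-token log-probabilities), and the margin of 0.5 are illustrative assumptions.

```python
# Minimal sketch: score Winograd-schema candidates with a masked LM and
# train with a margin loss. Model choice, scoring scheme, and margin value
# are illustrative assumptions, not necessarily the thesis's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def candidate_log_prob(sentence_with_blank: str, candidate: str) -> torch.Tensor:
    """Average log-probability the MLM assigns to the candidate's tokens
    when they replace the blank as mask tokens."""
    cand_ids = tokenizer(candidate, add_special_tokens=False)["input_ids"]
    masks = " ".join([tokenizer.mask_token] * len(cand_ids))
    inputs = tokenizer(sentence_with_blank.replace("_", masks), return_tensors="pt")
    log_probs = model(**inputs).logits.log_softmax(dim=-1)
    positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return torch.stack([log_probs[0, p, t] for p, t in zip(positions, cand_ids)]).mean()

# Margin loss: the correct candidate ("the trophy") should outscore the
# incorrect one ("the suitcase") by at least the margin.
sentence = "The trophy doesn't fit in the suitcase because _ is too big."
loss = torch.clamp(
    0.5 + candidate_log_prob(sentence, "the suitcase")
        - candidate_log_prob(sentence, "the trophy"),
    min=0.0,
)
loss.backward()  # gradients flow into the LM for fine-tuning
```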
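
The predictive-coding training mentioned in the final part can be illustrated on a toy network. The sketch below follows the general scheme surveyed by Millidge et al. (2022): an inner inference loop relaxes the hidden activities to reduce local prediction errors, after which the weights are updated from purely local quantities, with no backpropagation. The layer sizes, tanh activations, step sizes, and iteration counts are illustrative assumptions; the thesis applies such methods to LMs, not to this toy MLP.

```python
# Minimal sketch of predictive-coding (PC) training on a tiny 8-16-4 MLP.
# All hyperparameters here are illustrative assumptions.
import torch

torch.manual_seed(0)
sizes = [8, 16, 4]  # input, hidden, output widths
W = [torch.randn(sizes[i + 1], sizes[i]) * 0.1 for i in range(2)]

def pc_step(x, y, T=20, lr_v=0.1, lr_w=0.01):
    # Value nodes: clamp the first layer to the input, the last to the target.
    v = [x, torch.zeros(sizes[1]), y]
    for _ in range(T):  # inference: relax the hidden activities
        eps = [v[i + 1] - torch.tanh(W[i] @ v[i]) for i in range(2)]
        # Gradient of the PC energy sum_i ||eps_i||^2 / 2 w.r.t. hidden node v[1]:
        fprime = 1 - torch.tanh(W[1] @ v[1]) ** 2
        v[1] = v[1] - lr_v * (eps[0] - W[1].T @ (eps[1] * fprime))
    # Weight updates use only local prediction errors at each layer.
    eps = [v[i + 1] - torch.tanh(W[i] @ v[i]) for i in range(2)]
    for i in range(2):
        fprime = 1 - torch.tanh(W[i] @ v[i]) ** 2
        W[i] += lr_w * torch.outer(eps[i] * fprime, v[i])

x, y = torch.randn(8), torch.randn(4)
for _ in range(200):  # fit a single (input, target) pair
    pc_step(x, y)
print(torch.tanh(W[1] @ torch.tanh(W[0] @ x)) - y)  # residual prediction error
```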

Authors

Yordanov, Y
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Supervisor

ORCID:
0000-0002-7644-1668
Role:
Supervisor

Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Examiner

Role:
Examiner


Funding
Funder identifier:
https://ror.org/052gg0110
Funding agency for:
Yordanov, Y
Programme:
Department of Computer Science Scholarship for tuition fees


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Deposit date:
2024-10-23
