Thesis
Towards human-like natural language understanding with language models
- Abstract:
- In recent years, language models (LMs) have established themselves as the most capable models for most natural language tasks. However, beyond the invention of the transformer architecture, most progress has come from scaling model and data size (Radford et al., 2018, 2019; Brown et al., 2020; OpenAI, 2023). This scaling has made it possible for these models to perform on par with or better than humans on standard natural language benchmarks. Nevertheless, version after version, these models have remained different from and inferior to humans in their reasoning capabilities, explainability, and learning abilities. Research on natural language explanations (NLEs) (Hendricks et al., 2016) has lagged behind research on neural-network-based LMs (Bengio et al., 2003), in part due to its much later start. Furthermore, LMs are still trained via backpropagation, which is less efficient than, and fundamentally different from, how the human brain works. In this thesis, I present my progress in making LMs more human-like, both in terms of natural language understanding and in terms of biological plausibility. First, I explore a very challenging set of problems that test natural language understanding, namely, hard cases of pronoun resolution such as the Winograd Schema Challenge. In particular, I introduce improvements to training LMs for pronoun resolution via synthetic training datasets, specialised loss functions, and a comparison of task reformulations. Second, I use LMs to generate NLEs on commonsense reasoning tasks such as hard cases of pronoun resolution and commonsense validation. I demonstrate that LMs can transfer NLEs efficiently between domains while obtaining high downstream accuracy. Finally, I explore more biologically plausible predictive-coding-based training methods for LMs, which may be the future of deep learning beyond backpropagation (Millidge et al., 2022). I present the first application of these methods to training LMs: I show how best to implement them, study their scalability, determine the most effective method, and obtain results competitive with backpropagation for small LMs.
Authors
+ Yordanov, Y
Contributors
+ Lukasiewicz, T
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Computer Science
- Role:
- Supervisor
- ORCID:
- 0000-0002-7644-1668
+ Camburu, O
- Role:
- Supervisor
+ Calinescu, A
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Computer Science
- Role:
- Examiner
+ Minervini, P
- Role:
- Examiner
+ University of Oxford
- Funder identifier:
- https://ror.org/052gg0110
- Funding agency for:
- Yordanov, Y
- Programme:
- Department of Computer Science Scholarship for tuition fees
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Subjects:
- Deposit date:
- 2024-10-23
Terms of use
- Copyright holder:
- Yordanov, Y
- Copyright date:
- 2024