How machine learning models encode knowledge – and what we can learn from them

Schut, L

Thesis

Abstract:: As machine learning systems become increasingly capable, often surpassing human performance, they present both a challenge and an opportunity for understanding. These complex systems may operate in ways that differ from human reasoning, making them difficult to interpret. Yet they also hold the potential to reveal new knowledge and support critical decisions in domains such as healthcare, science, and education.

This thesis examines interpretability as a means of understanding and learning from machine learning models. We focus on two central goals: (1) to understand what knowledge is encoded in machine learning models and how it is encoded, and (2) to extract that knowledge in a form that is meaningful and accessible to humans. To this end, we propose a structured pipeline with four stages: identifying explanation desiderata, locating encoded knowledge or concepts, verifying that the concepts influence model behaviour, and translating the concepts into a human-interpretable form. Within this framework, we adapt existing methods where appropriate and develop new ones where necessary, selecting the approach best suited to the task. Our focus is not only on methodological development but also on understanding how these methods behave and the assumptions they make.

Different parts of the thesis contribute to each stage of this pipeline. We begin by investigating how models encode knowledge: first by analysing the linear representation hypothesis, and then by examining the universality of concept representations in multilingual language models. We then shift to the user-facing side of interpretability, studying how to make explanations more user-friendly by leveraging uncertainty to generate realistic and unambiguous explanations. Finally, we apply the full pipeline to develop a framework for extracting novel concepts from AlphaZero and teaching them to chess experts. This final study illustrates how interpretability can help bridge the gap between artificial and human understanding. Together, these contributions advance our understanding of machine learning systems and our ability to learn from them, laying the groundwork for future research at the intersection of artificial intelligence and human insight.

Cite this record

APA Style

Schut, L. (2025). How machine learning models encode knowledge – and what we can learn from them [PhD thesis]. University of Oxford.

MLA Style

Schut, L. How Machine Learning Models Encode Knowledge – and What We Can Learn from Them. 2025. University of Oxford, PhD thesis.

Chicago Style

Schut, L. 2025. “How Machine Learning Models Encode Knowledge – and What We Can Learn from Them.” PhD thesis, University of Oxford.
Access Document

Files:: Schut_2025_How_machine_learning.pdf

(Preview, Dissemination version, pdf, 38.3MB, Terms of use)

Authors

+ Schut, L

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Contributors

+ Gal, Y

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Supervisor
ORCID:: 0000-0002-2733-2078

Funding

+ Engineering and Physical Sciences Research Council

Funder identifier:: https://ror.org/0439y7842
Grant:: EP/S024050/1
Programme:: EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems

Bibliographic Details

DOI:: 10.5287/ora-qymzdayxr
Type of award:: DPhil
Level of award:: Doctoral
Awarding institution:: University of Oxford

Item Description

Language:: English
Keywords:: machine learning, interpretability
Subjects:: Machine learning
Deposit date:: 2025-12-26
ARK identifier:: ark:/29072/ora_c63cfe9771b84ed4a981322fb0b7a822

Terms of use

Copyright holder:: Lisa Miou Antoinette Schut

Licence:: CC Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)
