Thesis

Language as interface to medical image representation learning

Abstract:
Natural language inherently encodes human knowledge and reasoning. This thesis explores how language can be used as an interface between humans and artificial intelligence (AI) approaches in medical imaging. Language can function as a supervisory signal, or constraint, to learn meaningful representations that capture the human interpretation of medical images. At the same time, language can serve as a medium through which an AI system can communicate its decision-making to users.

First, I introduce the e-ViL benchmark, which systematically assesses the capability of vision-language models to generate natural language explanations (NLEs). By releasing the largest NLE dataset to date, consolidating existing datasets, and proposing a novel architecture, I demonstrate that text-based explanations can constrain model reasoning and provide an explainability interface that aligns with human cognitive processes. This work also exposes the limitations of automated metrics for evaluating NLE quality and proposes a human evaluation framework to address them.

Second, building on these insights, I extend the concept of NLEs to medical imaging, specifically chest X-ray analysis. I create novel datasets by extracting explanations directly from radiology reports. Models trained on these datasets not only produce explanations that mirror radiologists' reasoning but also, when scaled up and combined with chain-of-thought prompting, achieve improved diagnostic accuracy.

Third, to better understand the role of language as a medium for communicating AI decisions to users, I conduct a large-scale user study in which 85 clinicians interact with AI under different explainable AI (XAI) conditions. The study reveals a critical dichotomy: clinicians strongly prefer language-based explanations but tend to over-rely on them, which leads to more diagnostic errors. These findings underscore the complexity of, and the caution needed in, integrating such language-based AI systems into clinical settings.

Lastly, I generalize the use of language as supervision by leveraging radiology reports to train brain MRI models from scratch. This approach addresses the scarcity of labelled medical data in this domain and improves performance on a suite of downstream tasks, showcasing the broad applicability and potential of language-guided learning in medical imaging.

This thesis advances our understanding of how to integrate language with medical AI models. It demonstrates substantial benefits in performance and interpretability while also highlighting critical issues that must be addressed for the safe and effective clinical deployment of these approaches.

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MSD
Department:
Nuffield Department of Population Health
Sub department:
Big Data Institute - NDPH
Role:
Supervisor
ORCID:
0000-0002-8432-2511
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Supervisor
ORCID:
0000-0002-7644-1668
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Supervisor


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Deposit date:
2026-04-10