Language as interface to medical image representation learning

Kayser, M

Collections:

Thesis

Abstract: Natural language inherently encodes human knowledge and reasoning. This thesis explores how language can be used as an interface between humans and artificial intelligence (AI) approaches in medical imaging. Language can function as a supervisory signal, or constraint, to learn meaningful representations that capture the human interpretation of medical images. At the same time, language can serve as a medium through which an AI system can communicate its decision-making to users.

First, I introduce the e-ViL benchmark, which systematically assesses vision-language models' capability to generate natural language explanations (NLEs). By introducing the largest dataset to date, consolidating existing datasets, and proposing a novel architecture, I demonstrate the ability of text-based explanations to constrain model reasoning and provide an explainability interface that aligns with human cognitive processes. This work also exposes the limitations of automated metrics for evaluating NLE quality and proposes a human evaluation framework to address this.

Building upon these insights, I extend the concept of NLEs to medical imaging, specifically chest X-ray analysis. I create novel datasets by extracting explanations directly from radiology reports. Models trained on these datasets not only produce explanations that mirror radiologists' reasoning but, by emphasizing scale and chain-of-thought prompting, also exhibit improved diagnostic accuracy.

Third, to better understand the role of language as a medium to communicate AI decisions to users, I conduct a large-scale user study with 85 clinicians interacting with AI under different AI explainability (XAI) conditions. The study reveals a critical dichotomy: Clinicians strongly prefer language-based explanations but tend to overrely on them, leading to a higher number of diagnostic errors. These findings underscore the complexities and caution needed in integrating such language-based AI systems into clinical settings.

Lastly, I generalize the use of language as supervision by leveraging radiology reports to train brain MRI models from scratch. This approach addresses the scarcity of labelled medical data in this domain and leads to improved performance on a suite of downstream tasks, showcasing the broad applicability and significant potential of language-guided learning in medical imaging.

This thesis advances our understanding of integrating language with medical AI models. It demonstrates substantial benefits in performance and interpretability while also highlighting critical issues to address for safe and effective clinical deployment of these approaches.


Cite this record

APA Style

Kayser, M. (2025). Language as interface to medical image representation learning [PhD thesis]. University of Oxford.

MLA Style

Kayser, M. Language as Interface to Medical Image Representation Learning. 2025. University of Oxford, PhD thesis.

Chicago Style

Kayser, M. 2025. “Language as Interface to Medical Image Representation Learning.” PhD thesis, University of Oxford.