Thesis
Automatic classification of lung cancers from histopathology images
- Abstract:
-
Lung cancer accounts for more deaths than any other type of cancer. Currently, most lung cancers are diagnosed in symptomatic patients using CT scans and CT-guided biopsies or bronchoscopies. The latter two involve surgical excision of a small piece of tissue. To make a diagnosis, a pathologist examines the tissue under a microscope at different magnifications, noting cytological features and architectural patterns. These observations are aggregated into lung cancer subtypes, which may exhibit multiple characteristic patterns.
My research contributed to the broader DART Lung Health programme, which was based on the Targeted Lung Health Check programme conducted by NHS England. DART's main goals were to generate large datasets to enhance lung cancer diagnosis through quicker, less invasive, and more accurate methods while identifying research opportunities for treatments that could improve survival rates. I worked on automatically classifying lung cancers from histopathology images and creating an annotated histology dataset that would enable connecting histology and CT modalities. My main contributions are:
1. I developed a three-stage protocol for annotating lung cancer histology images from DART. I showed that it is possible to optimise the annotation process by selecting slides or regions with under-represented subtypes or patterns. My work resulted in a multi-centre dataset annotated to a degree of detail unavailable in the public domain.
2. I curated a public lung cancer dataset and proposed using pretext tasks to choose promising patch-level histopathology foundation models for any custom dataset at a fraction of the computational cost of a rigorous benchmarking study. The choice of a good pretext task remains an open avenue of research.
3. I showed that incorporating prior pathology knowledge into model architecture and training pipelines enables models to learn both the dependencies between cancer subtypes and the relative importance of different regions within whole-slide images, which improves lung cancer classification performance.
- Files:
- Dissemination version, pdf, 12.2MB
Authors
Contributors
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Engineering Science
- Sub department:
- Institute of Biomedical Engineering
- Oxford college:
- Harris Manchester College
- Role:
- Supervisor
- ORCID:
- 0000-0002-8528-8298
- Institution:
- University of Oxford
- Division:
- MSD
- Department:
- Oncology
- Role:
- Supervisor
- Institution:
- University of Oxford
- Division:
- MSD
- Department:
- Women's & Reproductive Health
- Role:
- Examiner
- ORCID:
- 0000-0002-2887-2068
- Institution:
- University of Leeds
- Role:
- Examiner
- Funder identifier:
- https://ror.org/0439y7842
- Funding agency for:
- Batchkala, G
- Grant:
- EP/S02428X/1
- Programme:
- EPSRC Centre for Doctoral Training in Health Data Science
- Funder identifier:
- https://ror.org/05ar5fy68
- Funding agency for:
- Batchkala, G
- Grant:
- 40255
- Programme:
- Professor Fergus Gleeson has funded me through his A2 research funds throughout my DPhil as part of the DART Lung Health Programme (Innovate UK grant 40255).
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Deposit date:
- 2025-12-24
- ARK identifier:
Terms of use
- Copyright holder:
- George Batchkala
- Copyright date:
- 2025
- Notes:
- "Evaluating histopathology foundation models for few-shot tissue clustering: an application to LC25000 augmented dataset cleaning", "Active data enrichment by learning what to annotate in digital pathology", and "Accurate subtyping of lung cancers by modelling class dependencies" are derived from this thesis.