Thesis

Novel machine learning for applications in cancer genomics

Abstract:
Genomics has advanced rapidly in the past decade, with whole-genome sequencing and single-cell RNA sequencing now routine in cancer research. Machine and deep learning have also taken off, but applying them to biology remains challenging due to confounding factors, irregularly distributed data, and a desire for causal insight rather than prediction. In genomics, the lack of ground-truth labels further limits supervised learning. This work develops methods bridging both domains.

The first part of my work revisits copy number alteration calling in cancer, introducing araCNA, a deep learning model trained on simulated data rather than to emulate the outputs of other models. Using recent long-range sequence models such as Mamba, araCNA predicts copy number profiles for whole-genome sequenced cancer samples. araCNA illustrates a different paradigm for applying deep learning in genomics: as a tool for amortised inference rather than as an emulator of existing methods. The second part of my work focuses on unsupervised discovery in single-cell RNA sequencing (scRNA-seq). I investigate the assumptions of the standard scRNA-seq pipeline and show that most approaches overlook the sparse, near-binary nature of scRNA-seq data. To address this, I develop bfact, a Boolean matrix factorisation (BMF) method combining combinatorial optimisation with heuristic post-processing. bfact outperforms existing BMF methods and, when applied to scRNA-seq, finds biologically relevant gene programs beyond those recovered by current approaches.
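As a rough illustration of what Boolean matrix factorisation means for near-binary expression data, the sketch below reconstructs a toy binarised cells-by-genes matrix as the Boolean product of two binary factors. It is not the bfact method: the matrices, the function names `boolean_product` and `reconstruction_error`, and the two "gene programs" are hypothetical and chosen only for illustration.

```python
import numpy as np

def boolean_product(U, V):
    """Boolean matrix product: entry (i, j) is 1 if U[i, k] = V[k, j] = 1 for some k."""
    return (U.astype(int) @ V.astype(int) > 0).astype(int)

def reconstruction_error(X, U, V):
    """Count the entries where the Boolean product of U and V disagrees with X."""
    return int(np.sum(X != boolean_product(U, V)))

# Hypothetical binarised expression matrix: 4 cells x 5 genes (1 = gene detected).
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])

# Hypothetical factors: U assigns cells to programs, V assigns genes to programs.
U = np.array([
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
])
V = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
])

print(reconstruction_error(X, U, V))
```

The two programs overlap on the second gene. Under ordinary matrix multiplication that entry would sum to 2 for the cell using both programs, whereas the Boolean product keeps it at 1, which is why overlapping programs can still reproduce a binary matrix exactly.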

Authors

Institution:
University of Oxford
Division:
MSD
Department:
Women's & Reproductive Health
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MSD
Department:
Women's & Reproductive Health
Role:
Supervisor
ORCID:
0000-0001-7615-8523

Institution:
University of Oxford
Division:
MSD
Department:
Radcliffe Department of Medicine
Role:
Examiner

Institution:
University of Manchester
Role:
Examiner


Funder identifier:
https://ror.org/0439y7842
Funding agency for:
Visscher, E
Grant:
EP/S02428X/1
Programme:
Oxford EPSRC Centre for Doctoral Training in Health Data Science


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
