Thesis
Latent feature models for multiomic data analysis
- Abstract:
Every year, cancer kills millions of people around the world. The efficacy of treatments, including surgery, radiotherapy, and chemotherapy, varies considerably across tumour types, and growing evidence shows that the molecular subtype of the disease can be linked to clinical outcomes and can therefore inform clinical decision-making. While DNA mutations are pivotal to cancer development, other molecular layers such as methylation, RNA, and proteins also play critical roles, so a holistic understanding of the disease requires comprehensive, multimodal analysis. This has spurred significant research into multiomic data integration and analysis, which can reveal meaningful subtypes and guide treatment decisions. However, integrative analysis of multiomic datasets is challenging, and current methods do not adequately address the complexity, high dimensionality, prevalence of missing data, and heterogeneity that characterise different omics outputs.
To address these limitations, this thesis develops a novel latent feature model tailored to the integrative analysis of multiomic datasets with missing modalities. We introduce iCS-GAN (integrative Cancer Subtyping with Generative Adversarial Networks), a method that leverages adversarially learned inference to extract clustering-relevant binary latent features from multiomic data. The proposed approach combines shared and modality-specific layers, layer-wise pre-training, robust imputation techniques, and adversarial loss functions to integrate heterogeneous data consistently, even when datasets are incomplete. Non-negativity constraints keep the latent variables fully interpretable, so that results are amenable to clinical translation. Furthermore, clustering and survival penalties guide the latent encodings, and the subsequent analysis, towards clinically relevant disease subtypes.
We demonstrate the utility of iCS-GAN through a comprehensive analysis of the PanProstate Cancer Group multiomic prostate cancer dataset. Our study identifies three distinct multiomic prostate cancer subtypes, including a novel aggressive subtype characterised by low expression of the ERG and TFF3 genes. To facilitate clinical translation, we develop a highly accurate predictive test that classifies patients into these subtypes using the expression levels of only 24 RNA genes. If externally validated, this test could support low-cost, clinically viable patient stratification, paving the way for improved cancer outcomes and personalised care.
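The abstract does not specify how iCS-GAN's shared and modality-specific layers handle samples with missing modalities. The sketch below is a hypothetical illustration of one common scheme, not the thesis implementation: each measured modality passes through its own layer, the hidden outputs are averaged over the modalities a sample actually has, and a shared layer produces per-feature Bernoulli probabilities that are thresholded into binary latent features. The function and parameter names, the mask-and-average rule, the modality names, and the 0.5 threshold are all assumptions. Adversarial training, imputation, pre-training, non-negativity constraints, and the clustering and survival penalties are not shown.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode_binary_latents(batch, masks, modality_weights, shared_weights, shared_bias):
    """Hypothetical encoder: multiomic inputs with missing modalities -> binary latent features.

    batch[m]            : (n, d_m) array for modality m; rows of unmeasured samples may hold anything
    masks[m]            : (n,) boolean array, True where modality m was measured
    modality_weights[m] : (d_m, h) modality-specific layer (assumed linear + ReLU)
    shared_weights      : (h, k) shared layer; shared_bias: (k,)
    """
    n = len(next(iter(masks.values())))
    h_sum = np.zeros((n, shared_weights.shape[0]))
    n_observed = np.zeros((n, 1))
    for m, x in batch.items():
        observed = masks[m][:, None]                   # (n, 1) boolean indicator
        x = np.where(observed, x, 0.0)                 # discard values of unmeasured rows (e.g. NaN)
        h_m = relu(x @ modality_weights[m])            # modality-specific hidden representation
        h_sum += np.where(observed, h_m, 0.0)          # only measured modalities contribute
        n_observed += observed
    # Average over the modalities each sample actually has (guarding against zero).
    h = h_sum / np.maximum(n_observed, 1.0)
    probs = sigmoid(h @ shared_weights + shared_bias)  # Bernoulli probability per latent feature
    z = (probs >= 0.5).astype(np.int8)                 # binary latent features
    return probs, z


# Toy usage with random weights and two hypothetical modalities.
rng = np.random.default_rng(0)
n, h, k = 4, 8, 3
dims = {"rna": 10, "methylation": 6}
batch = {m: rng.normal(size=(n, d)) for m, d in dims.items()}
masks = {
    "rna": np.array([True, True, False, True]),
    "methylation": np.array([True, False, True, True]),
}
W = {m: rng.normal(size=(d, h)) for m, d in dims.items()}
S = rng.normal(size=(h, k))
b = np.zeros(k)
probs, z = encode_binary_latents(batch, masks, W, S, b)
```

Because unmeasured rows are zeroed before any arithmetic and excluded from the average, a sample's encoding depends only on the modalities it actually has, which is the property a missing-modality encoder needs before any imputation step is applied.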
Authors
Contributors
- Institution:
- University of Oxford
- Division:
- MSD
- Department:
- Surgical Sciences
- Role:
- Supervisor
- Institution:
- University of Oxford
- Division:
- MSD
- Department:
- NDORMS
- Sub department:
- Kennedy Institute for Rheumatology
- Role:
- Supervisor
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/S02428X/1
- Programme:
- EPSRC Centre for Doctoral Training in Health Data Science
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2025-02-27
Terms of use
- Copyright holder:
- Aleksandra Ziubroniewicz
- Copyright date:
- 2024