Thesis
Leveraging genomic annotations to uncover latent components in gene expression data
- Abstract:
Over the last decade, single cell genomics has transformed research in genetics, resolving transcriptional activity at unprecedented resolution. Over 1000 single cell studies have been published to date, and the scale of single cell data continues to grow: cell atlases of complex organisms are being curated, and experimental sample sizes are increasing exponentially.
Standard analyses for single cell data include a hard clustering of cells and an investigation of these clusters to identify cell-specific marker genes. This approach is powerful and identifies transcriptomic differences between cells. However, such analyses fail to account for the biological heterogeneity present in continuous developmental processes taking place in cells, and do not use relevant prior information that may be available to the researcher. In this thesis we investigate incorporating prior information into factor analysis of gene expression data. Factor analysis infers components of genes that are co-expressed or co-repressed across cells, and the expression of each component in each cell. First, we consider structural features of the data, extending an existing tensor decomposition method to four dimensions, which facilitates analysis of gene expression data indexed by cell, time, tissue and gene. We demonstrate the benefits of the method relative to a lower-dimensional method. Second, we develop a widely applicable new prior structure and inference algorithm that allows external prior information to be incorporated into factor analysis. The new algorithm automatically selects important annotations and leverages this information to infer biologically meaningful components. We compare two likelihood models, evaluating their strengths and weaknesses on simulated data, and assess the benefit of incorporating prior information into the algorithm. We demonstrate the power of the method to capture meaningful components of expression on a well-studied single cell RNA-seq dataset from cells undergoing spermatogenesis, using transcription factor binding affinities as prior information, and show the utility of the new prior structure in leveraging such annotations.
Contributors
- Division:
- MPLS
- Department:
- Statistics
- Sub department:
- Statistics
- Role:
- Supervisor
- Programme:
- Systems Approaches to Biomedical Science (EPSRC & MRC CDT)
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Deposit date:
- 2022-01-23
Terms of use
- Copyright holder:
- Gill, CC
- Copyright date:
- 2021