Leveraging genomic annotations to uncover latent components in gene expression data

Gill, CC

Thesis

Abstract: Over the last decade, single cell genomics has transformed research in genetics, resolving transcriptional activity at unprecedented resolution. Over 1000 single cell studies have been published to date, and the scale of single cell data continues to grow: cell atlases of complex organisms are being curated, and experimental sample sizes are increasing exponentially.

Standard analyses of single cell data include a hard clustering of cells, followed by an investigation of the resulting clusters to identify cell-specific marker genes. This approach is powerful and identifies transcriptomic differences between cells. However, such analyses fail to account for the biological heterogeneity present in continuous developmental processes taking place in cells, and do not use relevant prior information that may be available to the researcher.

In this thesis we investigate incorporating prior information into factor analysis of gene expression data. Factor analysis infers components of genes that are co-expressed or co-repressed across cells, together with the expression of each component in each cell. First, we consider structural features of the data, extending an existing tensor decomposition method to four dimensions to facilitate analysis of gene expression data indexed by cell, time, tissue and gene. We demonstrate the benefits of this method relative to a lower-dimensional method. Second, we develop a widely applicable new prior structure and inference algorithm that allow external prior information to be incorporated into factor analysis. The new algorithm automatically selects important annotations and leverages this information to infer biologically meaningful components. Comparing two likelihood models on simulated data, we evaluate their strengths and weaknesses and the benefit of incorporating prior information into the algorithm. Finally, we demonstrate the power of the method to capture meaningful components of expression on a well-studied single cell RNA-seq dataset from cells undergoing spermatogenesis, using transcription factor binding affinities as prior information, and show the utility of the new prior structure in leveraging such annotations.

Cite this record

APA Style

Gill, C. C. (2021). Leveraging genomic annotations to uncover latent components in gene expression data [PhD thesis]. University of Oxford.

MLA Style

Gill, C. C. Leveraging Genomic Annotations to Uncover Latent Components in Gene Expression Data. University of Oxford, 2021.

Chicago Style

Gill, C. C. 2021. “Leveraging Genomic Annotations to Uncover Latent Components in Gene Expression Data.” PhD thesis, University of Oxford.