Thesis

Leveraging genomic annotations to uncover latent components in gene expression data

Abstract:

Over the last decade, single cell genomics has transformed research in genetics, resolving transcriptional activity at unprecedented resolution. Over 1000 single cell studies have been published to date, and the scale of single cell data continues to grow: cell atlases of complex organisms are being curated, and experimental sample sizes are increasing exponentially.

Standard analyses for single cell data include a hard clustering of cells and an investigation of these clusters to identify cell-specific marker genes. This approach is powerful and identifies transcriptomic differences between cells. However, such analyses fail to account for the biological heterogeneity present in continuous developmental processes taking place in cells, and do not use relevant prior information that may be available to the researcher. In this thesis we investigate incorporating prior information into factor analysis of gene expression data, a method that infers components of genes that are co-expressed or co-repressed across cells, together with the expression of each component in each cell. First, we consider structural features of the data, extending an existing tensor decomposition method to four dimensions, which facilitates analysis of gene expression data indexed by cell, time, tissue and gene. We demonstrate the relative benefits of the method against a lower dimensional method. Second, we develop a widely applicable new prior structure and inference algorithm that allows external prior information to be incorporated into factor analysis. The new algorithm automatically selects important annotations and leverages this information to infer biologically meaningful components. We compare two likelihood models, evaluating their strengths and weaknesses on simulated data along with the benefit of incorporating prior information into the algorithm. Finally, we demonstrate the power of the method to capture meaningful components of expression on a well-studied single cell RNA-seq dataset from cells undergoing spermatogenesis, using transcription factor binding affinities as prior information, and show the utility of the new prior structure in leveraging such annotations.

Authors


Institution: University of Oxford
Division: MPLS
Department: Statistics
Sub department: Statistics
Oxford college: Wolfson College
Role: Author
ORCID: 0000-0001-5418-5483

Contributors

Division: MPLS
Department: Statistics
Sub department: Statistics
Role: Supervisor


Type of award: DPhil
Level of award: Doctoral
Awarding institution: University of Oxford

