Thesis
Statistical models for single-cell data
- Abstract:
- The growing use of single-cell gene expression data (scRNA-seq data) has made it possible to identify cellular subpopulations with distinct characteristics more precisely. This offers insight both into normal cellular function and into diseases such as cancer. However, single-cell data present new challenges that standard clustering and dimensionality reduction methods are not designed to handle. In particular, whereas most models assume Gaussian-distributed data, single-cell datasets contain an overabundance of zeros. We use several scRNA-seq datasets to investigate how zeros are distributed in real biological data. Based on our findings, we perform a series of simulations to examine how standard clustering and dimensionality reduction methods perform on high-dimensional, zero-inflated data. We show that performance suffers as the fraction of zeros increases and that more complex non-linear models do not alleviate this problem. Motivated by these results, we present two new models that explicitly account for zeros: one for dimensionality reduction and one for clustering. For dimensionality reduction, we present an extension of the factor analysis model that accounts for zero-inflation by adding a single parameter describing how the probability of a technical zero decreases as a gene's expression level increases. We show that our model, ZIFA, outperforms factor analysis and probabilistic principal components analysis on simulated data. For clustering, we present an extension of the Gaussian mixture model that accounts for zero-inflation. We show that our model, ZIMM, outperforms k-means and Gaussian mixture models on simulated data. We also show that ZIFA and ZIMM fit real data from several scRNA-seq experiments better than these standard models, and we provide fast software implementations of both. Lastly, we explore applications of our models to several classes of non-biological datasets.
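The dropout mechanism described for ZIFA can be illustrated with a small simulation. This is a hypothetical sketch, not the thesis software. It assumes that latent expression follows a factor-analysis model and that each value is replaced by a technical zero with probability exp(-λx²). Here the single parameter λ (called `decay`) controls how fast the dropout probability falls as expression magnitude grows. That functional form and all names (`simulate_zifa_like`, `zero_fraction`, `decay`) are illustrative assumptions.

```python
import math
import random


def simulate_zifa_like(n_cells, n_genes, n_latent, decay, noise_sd=0.1, seed=0):
    """Simulate zero-inflated data from a factor-analysis-style model.

    Hypothetical sketch: latent z ~ N(0, I), expression x = mu + A z + noise,
    then each x is set to 0 with probability exp(-decay * x**2), so genes with
    larger expression magnitude are less likely to drop out.
    """
    rng = random.Random(seed)
    # Hypothetical loadings A (n_genes x n_latent) and per-gene means mu.
    loadings = [[rng.gauss(0.0, 1.0) for _ in range(n_latent)] for _ in range(n_genes)]
    means = [rng.uniform(1.0, 3.0) for _ in range(n_genes)]
    data = []
    for _ in range(n_cells):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_latent)]
        row = []
        for g in range(n_genes):
            x = means[g] + sum(a * zk for a, zk in zip(loadings[g], z))
            x += rng.gauss(0.0, noise_sd)
            # Assumed dropout model: one parameter, decreasing in |x|.
            p_drop = math.exp(-decay * x * x)
            row.append(0.0 if rng.random() < p_drop else x)
        data.append(row)
    return data


def zero_fraction(data):
    """Fraction of entries in the matrix that are exactly zero."""
    total = sum(len(row) for row in data)
    zeros = sum(1 for row in data for v in row if v == 0.0)
    return zeros / total


if __name__ == "__main__":
    for lam in (0.05, 0.5, 5.0):
        frac = zero_fraction(simulate_zifa_like(200, 20, 2, decay=lam))
        print(f"decay={lam}: zero fraction {frac:.2f}")
```

With a fixed seed, a smaller `decay` can only increase the number of zeros, because every entry sees the same random draws and a higher dropout probability. Sweeping `decay` in this way gives a simple knob for producing datasets with increasing fractions of zeros, similar in spirit to the simulations the abstract describes.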
- Publication date:
- 2015
- Type of award:
- MSc by Research
- Level of award:
- Masters
- Awarding institution:
- Oxford University, UK
- Language:
- English
- Keywords:
- Subjects:
- UUID:
- uuid:2ea412a0-deea-4107-a363-1bbf37cca02e
- Local pid:
- ora:12263
- Deposit date:
- 2015-09-24
Terms of use
- Copyright holder:
- Pierson, E
- Copyright date:
- 2015