Journal article
A nonparametric HMM for genetic imputation and coalescent inference
- Abstract:
- Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than other, more general nonparametric models that have previously been applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project, and also on data simulated from a population bottleneck, we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.
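To make the transition structure described in the abstract concrete, here is a minimal Python sketch built on our own assumptions; it is not the authors' model or sampler. It assumes a simplified fastPHASE-style parameterization: at each interval along the chromosome the chain keeps its current state with probability exp(-r·d), where r is a per-interval rate and d the distance between adjacent markers, and otherwise jumps to a state drawn from shared global weights. The weights come from a stick-breaking construction truncated at K states, so this corresponds to a finite (parametric) truncation rather than the truncation-free hierarchical Dirichlet process inference the paper develops. The names `stick_breaking`, `transition_matrix`, `sample_path`, `rates`, `distances`, `gamma`, and `K` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a truncated, fastPHASE-style nonhomogeneous HMM.
# This is NOT the authors' HDP model or their truncation-free sampler.


def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking (GEM(gamma)) weights over K states."""
    v = rng.beta(1.0, gamma, size=K)
    v[-1] = 1.0  # close the last stick so the K weights sum to one
    # Stick length left before each break: 1, (1-v0), (1-v0)(1-v1), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def transition_matrix(beta, r, d):
    """Transition matrix for one interval: stay with prob exp(-r*d),
    otherwise jump to a state drawn from the global weights beta."""
    stay = np.exp(-r * d)
    K = len(beta)
    return stay * np.eye(K) + (1.0 - stay) * np.tile(beta, (K, 1))


def sample_path(beta, rates, distances, rng):
    """Sample latent states along a chromosome with len(rates) + 1 markers."""
    K = len(beta)
    z = np.empty(len(rates) + 1, dtype=int)
    z[0] = rng.choice(K, p=beta)
    for m, (r, d) in enumerate(zip(rates, distances)):
        P = transition_matrix(beta, r, d)  # depends on m: nonhomogeneous chain
        z[m + 1] = rng.choice(K, p=P[z[m]])
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta = stick_breaking(gamma=2.0, K=20, rng=rng)
    rates = np.full(99, 0.05)                   # hypothetical per-interval rates
    distances = rng.uniform(0.5, 2.0, size=99)  # hypothetical marker spacings
    z = sample_path(beta, rates, distances, rng)
    # Path length and number of distinct states visited
    print(len(z), len(np.unique(z)))
```

Because the self-transition probability changes with the interval through `rates` and `distances`, the chain is nonhomogeneous; with small rates most steps are self transitions, mirroring the large self-transition support the abstract describes.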
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher copy:
- 10.1214/16-EJS1197
- Publisher:
- Institute of Mathematical Statistics
- Journal:
- Electronic Journal of Statistics
- Volume:
- 10
- Issue:
- 2
- Pages:
- 3425-3451
- Publication date:
- 2016-11-01
- Acceptance date:
- 2016-09-18
- DOI:
- 10.1214/16-EJS1197
- ISSN:
- 1935-7524
- Keywords:
- Pubs id:
- pubs:653695
- UUID:
- uuid:e4de21d5-e80b-469e-b285-79d8970856d7
- Local pid:
- pubs:653695
- Source identifiers:
- 653695
- Deposit date:
- 2016-10-23
Terms of use
- Copyright holder:
- Elliott and Teh
- Copyright date:
- 2016
- Notes:
- The Electronic Journal of Statistics applies the Creative Commons Attribution License (CCAL) to all articles we publish in this journal. Under the CCAL, authors retain ownership of the copyright for their article, but authors allow anyone to download, reuse, reprint, modify, distribute, and/or copy articles published in EJS, so long as the original authors and source are credited.
- Licence:
- CC Attribution (CC BY)