Thesis

Importance sampling on the coalescent with recombination

Abstract:

Performing inference on contemporary samples of homologous DNA sequence data is an important task. By assuming a stochastic model for ancestry, one can make full use of the observed data by sampling from the distribution of genealogies conditional upon the sample configuration. A natural choice of model is Kingman's coalescent, which has numerous extensions to account for additional biological phenomena. However, in this model the distribution of interest cannot be written down in closed form, and so one solution is to use importance sampling.

In this context, importance sampling (IS) simulates genealogies from an artificial proposal distribution and corrects for this by weighting each resulting genealogy. In this thesis I investigate in detail approaches for developing efficient proposal distributions on coalescent histories, with a particular focus on a two-locus model in which each locus mutates under the infinite-sites assumption and the two loci are separated by a region across which recombination can occur. This model was originally studied by Griffiths (1981), and is a useful simplification for considering the correlated ancestries of two linked loci. I show that my proposal distribution generally outperforms an existing IS method that could be adapted to this model.
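As a generic illustration of the weighting step (not the proposal distribution developed in this thesis), the IS estimate of the likelihood averages the weights w = P(H, D) / Q(H) over histories H drawn from a proposal Q, because the likelihood equals the expectation of that ratio under Q. The sketch below uses a hypothetical toy with four "histories" and made-up probabilities; `target_prob`, `proposal_prob` and `TOY_TARGET` are illustrative names only. It also computes the effective sample size, a common diagnostic of how uneven the weights are.

```python
import random

# Hypothetical toy: four "histories" labelled 0-3, each with an assumed joint
# probability P(H, D) under the target model. Values are illustrative only.
TOY_TARGET = {0: 0.10, 1: 0.05, 2: 0.10, 3: 0.05}

def target_prob(h):
    return TOY_TARGET[h]

def proposal_sample(rng):
    # Uniform proposal Q over the four toy histories.
    return rng.randrange(4)

def proposal_prob(h):
    return 0.25

def is_likelihood(n, rng):
    """IS estimate of L = sum_H P(H, D) = E_Q[P(H, D) / Q(H)]."""
    weights = []
    for _ in range(n):
        h = proposal_sample(rng)
        weights.append(target_prob(h) / proposal_prob(h))
    return sum(weights) / n, weights

def effective_sample_size(weights):
    """(sum w)^2 / sum w^2: equals n when all n weights are equal."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

estimate, weights = is_likelihood(20_000, random.Random(1))
print(round(estimate, 3), round(effective_sample_size(weights)))
```

The exact likelihood in this toy is 0.10 + 0.05 + 0.10 + 0.05 = 0.30, so the printed estimate should lie close to it. A better-matched proposal would make the weights more even and push the effective sample size towards n.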

Given today's sequencing technologies, it is not difficult to find datasets large enough that even the most efficient proposal distributions might struggle. I therefore borrow resampling mechanisms from the theory of sequential Monte Carlo in order to achieve substantial improvements in IS applications. In particular, I propose a new resampling scheme and confirm that it yields a significant gain in the accuracy of likelihood estimates. It outperforms an existing scheme that can actually degrade the quality of an IS simulation unless it is applied to coalescent models with care. Finally, I apply the methods developed here to an example dataset, and discuss a new measure of how two gene trees are correlated.
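The abstract does not describe the new resampling scheme itself, so the following is only a sketch of the standard sequential Monte Carlo mechanism that such schemes build on: multinomial resampling of partially constructed histories ("particles"), triggered when the effective sample size falls below a fraction of the number of particles. The function name `resample_if_needed` and the threshold of 0.5 are assumptions. Each survivor is assigned the mean weight, so the sum of the weights, and therefore the running likelihood estimate, is unchanged by the resampling step.

```python
import random

def effective_sample_size(weights):
    """(sum w)^2 / sum w^2: a diagnostic of weight degeneracy."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def resample_if_needed(particles, weights, threshold, rng):
    """Generic multinomial resampling (hypothetical helper, not the thesis scheme).

    If ESS < threshold * n, draw n particle indices with probability
    proportional to weight and give every survivor the mean weight, which
    preserves the total weight. Otherwise return copies unchanged.
    """
    n = len(particles)
    if effective_sample_size(weights) >= threshold * n:
        return list(particles), list(weights)
    indices = rng.choices(range(n), weights=weights, k=n)
    mean_weight = sum(weights) / n
    return [particles[i] for i in indices], [mean_weight] * n

# Usage: one particle carries all the weight, so resampling is triggered.
rng = random.Random(0)
new_particles, new_weights = resample_if_needed(
    ["a", "b", "c", "d"], [1.0, 0.0, 0.0, 0.0], 0.5, rng)
print(new_particles, new_weights)
```

In this usage example only particle "a" has positive weight, so every resampled particle is a copy of it, each carrying weight 0.25. With coalescent histories, the particles may have been built to different stages, which is one reason the abstract warns that resampling must be applied with care.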

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Oxford college:
St Edmund Hall
Role:
Author
Division:
MPLS
Department:
Statistics
Role:
Author

Contributors

Division:
MPLS
Department:
Statistics
Role:
Supervisor


Publication date:
2008
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
