Thesis

Biobank-scale ancestral recombination graphs: inference and applications to the analysis of complex traits

Abstract:

Across living species, DNA is transmitted from generation to generation via the processes of inheritance, mutation, and recombination. The history of these processes can be recorded using genome-wide gene genealogies. Accurate inference of gene genealogies from genetic data has the potential to facilitate a wide range of analyses, but is computationally challenging. In this thesis, we introduce a scalable method, called ARG-Needle, that uses genotype hashing and a coalescent hidden Markov model to infer genome-wide genealogies from sequencing or genotyping array data in modern biobanks. We develop strategies that utilise the inferred genome-wide genealogies within linear mixed models to perform association and other analyses of biomedical traits.

We validate the accuracy and scalability of ARG-Needle through extensive coalescent simulations, and use ARG-Needle to build genome-wide genealogies from genotypes of 337,464 UK Biobank individuals. We perform genealogy-based association analysis of 7 complex traits, detecting more rare and ultra-rare signals (N = 133, frequency range 0.0004%–0.1%) than genotype imputation from ∼65,000 sequenced haplotypes (N = 65). We validate these signals using exome sequencing data from 138,039 individuals. ARG-Needle associations strongly tag (average r = 0.72) underlying sequencing variants that are enriched for missense (2.3×) and loss-of-function (4.5×) variation. Compared to imputation, inferred genealogies also capture additional signals for higher frequency variants. These results demonstrate that biobank-scale inference of gene genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.

Authors


Division:
MPLS
Department:
Statistics
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Role:
Supervisor


Funder identifier:
http://dx.doi.org/10.13039/501100014748
Grant:
Clarendon Scholarship
Programme:
Clarendon Scholarship
Funder identifier:
http://dx.doi.org/10.13039/501100000781
Grant:
ARGPHENO 850869
Programme:
ERC Starting Grant no. ARGPHENO 850869


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Deposit date:
2023-06-05
