Thesis

Efficient analysis of microbial whole-genome sequence data using de Bruijn graphs

Abstract:

Antimicrobial resistance (AMR) is a persistent and growing threat to global health. Whole-genome sequencing (WGS) has the potential to dramatically improve our ability to detect, understand, and monitor AMR. However, the diversity and complexity of microbes make the analysis and interpretation of their genomes challenging. In this thesis, I explore applications of de Bruijn graphs (DBGs) to the analysis of these data.

First, I present a tool, Mykrobe predictor, that uses DBGs to rapidly identify species and AMR from WGS data. I show that it is accurate, flexible, and efficient.

Next, I explore an extension of Mykrobe predictor to long-read sequencing of direct clinical samples of Mycobacterium tuberculosis. In doing so, I show that one could reduce the turn-around time for susceptibility testing of an M. tuberculosis isolate from 2 weeks to 12 hours.

Finally, I explore the challenges of DNA search in very large collections (millions) of microbial data sets. In particular, I address the super-linear scaling of existing k-mer indexing tools and present a novel representation and implementation of a probabilistic coloured de Bruijn graph, the “Coloured Bloom Graph” (CBG). I demonstrate its scalability by building a CBG of all publicly accessible microbial WGS data (almost half a million samples) and using it to run millisecond searches of these data.
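To illustrate the general idea behind a probabilistic, coloured k-mer index, the following is a minimal Python sketch. It is not the CBG representation or implementation described in the thesis. It assumes the simplest possible design: one small Bloom filter per sample (the sample name acts as the "colour"), with a query reported as present in a sample when all of its k-mers are found in that sample's filter. The names `BloomFilter`, `build_index`, and `search`, the filter size, and the number of hash functions are all hypothetical choices made for illustration, and reverse complements are ignored.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative assumptions, not tuned values."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_index(samples, k):
    """Map each sample name (its 'colour') to a Bloom filter of its k-mers."""
    index = {}
    for name, seq in samples.items():
        bloom = BloomFilter()
        for kmer in kmers(seq, k):
            bloom.add(kmer)
        index[name] = bloom
    return index


def search(index, query, k):
    """Return the sorted names of samples whose filter holds every query k-mer."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:  # query shorter than k: nothing to look up
        return []
    return sorted(
        name for name, bloom in index.items()
        if all(kmer in bloom for kmer in query_kmers)
    )


if __name__ == "__main__":
    # Toy sequences invented for this example.
    samples = {"sample_a": "ACGTACGTGACC", "sample_b": "TTTTGGGGCCCC"}
    index = build_index(samples, k=5)
    print(search(index, "GTACGTG", k=5))
```

Because Bloom filters admit false positives but no false negatives, a sample that truly contains the query is always returned, while an unrelated sample may occasionally be reported as well. A naive design like this one must consult every sample's filter for each query, so it is only a way to see the basic mechanism, not a route to the scalability reported above.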

Authors


Division:
MSD
Role:
Author

Contributors

Role:
Supervisor
Role:
Supervisor



Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


UUID:
uuid:b4dea8ec-f4ba-4bcc-ba00-8a1eaa515741
Deposit date:
2019-04-11
