Journal article

Haplotype estimation using sequencing reads.

Abstract:
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygous genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved.
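As an illustration of what "phase-informative" means in the abstract, the following minimal Python sketch checks whether a read (or read pair) spans two or more heterozygous sites. It is not the SHAPEIT2 implementation: the function names, the closed-interval coordinate convention, and the example positions are all assumptions made for illustration, and the sketch ignores base quality scores, which the method described above does use.

```python
from bisect import bisect_left, bisect_right


def het_sites_covered(read_start, read_end, het_positions):
    """Return the heterozygous positions inside the closed interval
    [read_start, read_end]. `het_positions` must be sorted ascending.
    (Hypothetical helper; coordinates are illustrative.)"""
    lo = bisect_left(het_positions, read_start)
    hi = bisect_right(het_positions, read_end)
    return het_positions[lo:hi]


def is_phase_informative(segments, het_positions):
    """Return True if the aligned segments of one read (or of both mates
    of a read pair) jointly cover at least two distinct heterozygous sites.
    `segments` is a list of (start, end) tuples."""
    covered = set()
    for start, end in segments:
        covered.update(het_sites_covered(start, end, het_positions))
    return len(covered) >= 2


if __name__ == "__main__":
    het = [100, 250, 900]  # assumed heterozygous site positions
    print(is_phase_informative([(90, 180)], het))              # one site only
    print(is_phase_informative([(90, 180), (850, 950)], het))  # mates cover two sites
```

Treating a read pair as a list of segments reflects why insert size matters in the abstract: the two mates of a pair can link heterozygous sites that neither mate covers on its own.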
Publication status:
Published

Access Document

Publisher copy:
10.1016/j.ajhg.2013.09.002

Authors

Institution:
University of Oxford
Division:
MSD
Department:
NDM
Role:
Author


Journal:
American journal of human genetics
Volume:
93
Issue:
4
Pages:
687-696
Publication date:
2013-10-01
DOI:
10.1016/j.ajhg.2013.09.002
EISSN:
1537-6605
ISSN:
0002-9297


Language:
English
Pubs id:
pubs:434125
UUID:
uuid:d543946e-cf16-4ad0-a0a2-35a13b0f7150
Local pid:
pubs:434125
Source identifiers:
434125
Deposit date:
2014-02-07