A framework for variation discovery and genotyping using next-generation DNA sequencing data
Open Access
- 10 April 2011
- journal article
- research article
- Published by Springer Nature in Nature Genetics
- Vol. 43 (5), 491-498
- https://doi.org/10.1038/ng.806
Abstract
Mark DePristo and colleagues report an analytical framework to discover and genotype variation using whole exome and genome resequencing data from next-generation sequencing technologies. They apply these methods to low-pass population sequencing data from the 1000 Genomes Project.
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
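To make step (iii), base quality score recalibration, more concrete, the following is a minimal sketch of the underlying idea only. It is not the Genome Analysis Toolkit implementation, which conditions on several covariates beyond the reported quality. The sketch assumes each base observation has already been reduced to a pair of (reported Phred quality, whether it mismatches the reference at a site not known to be variant). The function names and the add-one smoothing are illustrative assumptions.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality Q to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred-scaled quality: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def empirical_qualities(observations):
    """Estimate an empirical quality for each reported quality bin.

    observations: iterable of (reported_quality, is_mismatch) pairs.
    Hypothetical input format chosen for this sketch.
    Returns a dict mapping reported quality -> empirical Phred quality.
    """
    # reported quality -> [mismatch count, total bases observed]
    counts = defaultdict(lambda: [0, 0])
    for reported_q, is_mismatch in observations:
        counts[reported_q][1] += 1
        if is_mismatch:
            counts[reported_q][0] += 1

    # Add-one smoothing (an assumption of this sketch) avoids log10(0)
    # when a bin contains no mismatches.
    return {
        q: error_prob_to_phred((mismatches + 1) / (total + 1))
        for q, (mismatches, total) in counts.items()
    }


if __name__ == "__main__":
    # 999 bases reported at Q30, of which 9 mismatch the reference.
    obs = [(30, True)] * 9 + [(30, False)] * 990
    for q, emp in empirical_qualities(obs).items():
        print(f"reported Q{q} -> empirical Q{emp:.1f}")
```

In this toy input the bases are reported at Q30 (a claimed error rate of 1 in 1,000), but the smoothed mismatch rate is 10 in 1,000, so the empirical quality comes out at about Q20. A recalibration step would lower the reported qualities to match.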