Mapping short DNA sequencing reads and calling variants using mapping quality scores
- 19 August 2008
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 18 (11), 1851-1858
- https://doi.org/10.1101/gr.078212.108
Abstract
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and using these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ, which can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
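The abstract defines mapping quality as the confidence that a read truly comes from its aligned position. It also describes genotype calls that combine mapping qualities, base error probabilities, and sampling of the two haplotypes. The Python sketch below illustrates both ideas in a deliberately simplified form; it is not MAQ's implementation.

The sketch rests on several assumptions that do not come from the text above:
- Qualities are Phred-scaled, Q = -10 log10 P(error).
- A candidate position's likelihood is the product of error probabilities at its mismatched bases, and all candidate hits are enumerated.
- Mapping quality is capped at a hypothetical 60.
- Each base's quality is capped at its read's mapping quality.
- Genotypes get a uniform prior, and MAQ's empirical correlated-error model is omitted.

All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def phred_to_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)

def prob_to_phred(p, cap=60):
    """Phred-scale an error probability; the cap (hypothetical) avoids infinity when p == 0."""
    if p <= 0:
        return cap
    return min(cap, int(round(-10 * math.log10(p))))

def hit_likelihood(mismatch_quals):
    """Sketch assumption: the likelihood of the read at one candidate position
    is the product of the error probabilities of its mismatched bases."""
    lik = 1.0
    for q in mismatch_quals:
        lik *= phred_to_prob(q)
    return lik

def mapping_quality(hits, best):
    """Phred-scaled probability that the read does NOT come from hits[best].
    `hits` holds one list of mismatch base qualities per candidate position."""
    liks = [hit_likelihood(h) for h in hits]
    posterior = liks[best] / sum(liks)
    return prob_to_phred(1.0 - posterior)

def genotype_posteriors(pileup):
    """Posterior over the 10 unordered diploid genotypes at one site, uniform prior.
    `pileup` is a list of (base, base_quality, mapping_quality) tuples."""
    logliks = {}
    for g in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for base, bq, mq in pileup:
            # Assumption: a base cannot be more reliable than the read's mapping.
            e = phred_to_prob(min(bq, mq))
            # Each of the two chromosome copies is sampled with probability 1/2;
            # an error is assumed equally likely to produce any of the other 3 bases.
            p = sum(0.5 * ((1 - e) if base == a else e / 3) for a in g)
            ll += math.log(p)
        logliks[g] = ll
    # Normalise in log space for numerical stability.
    m = max(logliks.values())
    weights = {g: math.exp(ll - m) for g, ll in logliks.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}
```

Consider a read with a perfect hit and a second hit that differs by one Q30 mismatch. Its mapping quality is about 30 (`mapping_quality([[], [30]], 0)`), because the second position is roughly 1000 times less likely. A read with two equally good hits gets about 3, and a read with mapping quality 0 contributes no information to the genotype posterior.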