Incorporating sequence quality data into alignment improves DNA read mapping
Open Access
- 27 January 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 38 (7), e100
- https://doi.org/10.1093/nar/gkq010
Abstract
New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2%. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the proportion of correctly mapped reads from 49 to 66%. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct or are highly polymorphic.
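The abstract does not give the scoring formula, so the block below is only an illustration of one standard way to fold a per-base error probability into a substitution score. It treats the score matrix as log-odds with scale λ, averages the likelihood ratio over the possible true read bases, and takes the log again. It is a sketch under stated assumptions, not necessarily the paper's exact method: the +1/-1 matrix (`simple_score`) is hypothetical, base frequencies are assumed uniform, and a sequencing error is assumed to turn the true base into each of the other three bases with equal probability. The names `phred_to_error`, `find_lambda` and `quality_aware_score` are illustrative, not from the paper.

```python
import math

BASES = "ACGT"


def simple_score(a: str, b: str) -> int:
    """Hypothetical +1/-1 DNA substitution matrix, used only for illustration."""
    return 1 if a == b else -1


def phred_to_error(q: float) -> float:
    """Phred quality q -> probability that the called base is wrong: 10^(-q/10)."""
    return 10.0 ** (-q / 10.0)


def find_lambda(score=simple_score, lo=1e-6, hi=10.0, iters=200) -> float:
    """Bisection for the lambda > 0 with sum_ab f_a f_b exp(lambda*S(a,b)) = 1.

    Assumes uniform base frequencies f = 1/4 and a matrix with negative expected
    score, so g(lo) < 0 < g(hi) and the positive root lies between them.
    """
    def g(lam: float) -> float:
        return sum(math.exp(lam * score(a, b)) for a in BASES for b in BASES) / 16.0 - 1.0

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def quality_aware_score(read_base: str, ref_base: str, error_prob: float,
                        lam: float, score=simple_score) -> float:
    """Score a read base against a reference base, averaging over the true read base.

    Assumed error model: the true base equals the called base with probability
    1 - e, and is each of the other three bases with probability e/3.
    """
    total = 0.0
    for true_base in BASES:
        p = (1.0 - error_prob) if true_base == read_base else error_prob / 3.0
        total += p * math.exp(lam * score(true_base, ref_base))
    return math.log(total) / lam


if __name__ == "__main__":
    lam = find_lambda()
    for q in (40, 20, 10):
        e = phred_to_error(q)
        match = quality_aware_score("A", "A", e, lam)
        mismatch = quality_aware_score("A", "C", e, lam)
        print(f"Q{q}: match {match:.3f}, mismatch {mismatch:.3f}")
```

Under these assumptions, an error probability of 0 recovers the original matrix, and an error probability of 3/4 (a base that carries no information) scores 0 against every reference base. Low-quality positions therefore neither reward nor penalize an alignment much, while high-quality mismatches still count as likely real sequence differences.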