PANDAseq: paired-end assembler for illumina sequences
Open Access
- 14 February 2012
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 13 (1), 31
- https://doi.org/10.1186/1471-2105-13-31
Abstract
Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.
Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.
Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
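As an illustration of the general idea described in the abstract (merging an overlapping read pair while using quality scores to resolve mismatches and uncalled bases), the following is a minimal sketch. It is not PANDAseq's algorithm or scoring scheme: the overlap is chosen by a simple lowest-mismatch-fraction rule, and the function names, the `min_overlap` and `max_mismatch_fraction` parameters, and the Phred+33 quality encoding are all assumptions made for this example.

```python
# Illustrative only: a naive quality-aware merge, NOT PANDAseq's method.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def phred(qual, offset=33):
    """Decode a quality string into integer scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def assemble_pair(fwd, fwd_qual, rev, rev_qual,
                  min_overlap=10, max_mismatch_fraction=0.2):
    """Merge a forward read with its mate; return None if no overlap fits."""
    rc = reverse_complement(rev)
    rc_q = phred(rev_qual)[::-1]  # qualities follow the reversed read
    fq = phred(fwd_qual)

    # Try every overlap length, longest first, and keep the one with the
    # lowest fraction of mismatches. Uncalled bases (N) are not counted.
    best = None
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        pairs = zip(fwd[-overlap:], rc[:overlap])
        mismatches = sum(1 for x, y in pairs
                         if x != y and "N" not in (x, y))
        fraction = mismatches / overlap
        if best is None or fraction < best[0]:
            best = (fraction, overlap)
    if best is None or best[0] > max_mismatch_fraction:
        return None

    overlap = best[1]
    start = len(fwd) - overlap
    merged = list(fwd[:start])
    for i in range(overlap):
        x, qx = fwd[start + i], fq[start + i]
        y, qy = rc[i], rc_q[i]
        if x == "N":
            merged.append(y)            # take the called base, if any
        elif y == "N" or x == y:
            merged.append(x)
        else:
            merged.append(x if qx >= qy else y)  # trust the higher quality
    merged.extend(rc[overlap:])
    return "".join(merged)


if __name__ == "__main__":
    # Hypothetical 22-base template; the two reads overlap by 10 bases.
    # The forward read has one low-quality miscall, the mate one uncalled base.
    fwd = "ACGTTGCAAGCTAAGG"
    fwd_qual = "I" * 12 + "#" + "I" * 3
    rev = reverse_complement("CAANCTTAGGCATCCG")
    rev_qual = "III#IIIIIIIIIIII"[::-1]
    print(assemble_pair(fwd, fwd_qual, rev, rev_qual))
```

In this toy pair, the miscalled base in the forward read is replaced by the higher-quality base from its mate, and the uncalled base in the mate is filled from the forward read. A production assembler would score candidate overlaps statistically rather than by raw mismatch counts.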