PANDAseq: paired-end assembler for Illumina sequences

Open Access

14 February 2012

journal article
Published by Springer Nature in BMC Bioinformatics

Vol. 13 (1), 31
https://doi.org/10.1186/1471-2105-13-31

Abstract

Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; using quality information, many errors could be corrected to obtain higher sequence yields.

Results: PANDAseq assembles paired-end reads rapidly and corrects most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks used real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and incorporated fewer errors than alternative methods.

Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly, with negligible loss of "good" sequence.
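The abstract describes the general task: merge two reads that overlap, and use per-base quality scores to resolve mismatches and uncalled (N) bases inside the overlap. The Python sketch below illustrates that idea under simplifying assumptions. It is not the algorithm described in the paper and not PANDAseq's code: the function names `merge_pair` and `reverse_complement`, the parameters `min_overlap` and `max_mismatch_frac`, the "longest acceptable overlap" rule, and the consensus quality rules are all hypothetical choices made for illustration. Qualities are assumed to be already decoded to integer Phred scores, and the amplicon is assumed to be longer than each read (no read-through into adapters).

```python
from typing import List, Optional, Tuple

# Complement table; "N" (an uncalled base) complements to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(
    fwd: str,
    fwd_q: List[int],
    rev: str,
    rev_q: List[int],
    min_overlap: int = 10,
    max_mismatch_frac: float = 0.2,
) -> Optional[Tuple[str, List[int]]]:
    """Merge an overlapping read pair into one sequence with Phred qualities.

    Hypothetical, simplified model (not PANDAseq's scoring). Returns None
    if no overlap of at least min_overlap bases passes the mismatch threshold.
    """
    # Put the reverse read on the same strand as the forward read.
    rc = reverse_complement(rev)
    rc_q = rev_q[::-1]

    # Try the longest overlap first; accept the first one whose mismatch
    # rate (ignoring positions where either base is uncalled) is acceptable.
    overlap = None
    for ov in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        pairs = zip(fwd[len(fwd) - ov:], rc[:ov])
        mismatches = sum(1 for a, b in pairs if a != b and "N" not in (a, b))
        if mismatches <= max_mismatch_frac * ov:
            overlap = ov
            break
    if overlap is None:
        return None

    # Non-overlapping prefix comes straight from the forward read.
    start = len(fwd) - overlap
    seq, qual = list(fwd[:start]), list(fwd_q[:start])
    for i in range(overlap):
        a, qa = fwd[start + i], fwd_q[start + i]
        b, qb = rc[i], rc_q[i]
        if a == "N":      # uncalled in forward read: use the reverse base
            base, q = b, qb
        elif b == "N":    # uncalled in reverse read: use the forward base
            base, q = a, qa
        elif a == b:      # agreement: keep the more confident score
            base, q = a, max(qa, qb)
        else:             # mismatch: trust the higher-quality call
            base = a if qa >= qb else b
            q = abs(qa - qb)
        seq.append(base)
        qual.append(q)
    # Append the part of the reverse read that extends past the forward read.
    seq.extend(rc[overlap:])
    qual.extend(rc_q[overlap:])
    return "".join(seq), qual
```

For example, with a 30-base amplicon, a 20-base forward read covering its start, and a 20-base reverse read covering its end, the reads overlap by 10 bases. If the forward read carries a low-quality miscall inside the overlap, `merge_pair` keeps the high-quality base from the reverse read and still recovers the amplicon. Scanning from the longest overlap down is only a convenient heuristic for this sketch; a real assembler would score candidate overlaps more carefully.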
