PEAR: a fast and accurate Illumina Paired-End reAd mergeR

Open Access

18 October 2013

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 30 (5), 614-620
https://doi.org/10.1093/bioinformatics/btt593

Abstract

Motivation: The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either the fragment size is shorter than a single-end read, or longer than twice the size of a single-end read, most state-of-the-art mergers fail to generate reliable results. Therefore, a robust tool is needed to merge paired-end reads that exhibit varying overlap lengths because of varying target fragment lengths.

Results: We present the PEAR software for merging raw Illumina paired-end reads from target fragments of varying length. The program evaluates all possible paired-end read overlaps and does not require the target fragment size as input. It also implements a statistical test for minimizing false-positive results. Tests on simulated and empirical data show that PEAR consistently generates highly accurate merged paired-end reads. A highly optimized implementation allows for merging millions of paired-end reads within a few minutes on a standard desktop computer. On multi-core architectures, the parallel version of PEAR shows linear speedups compared with the sequential version of PEAR.

Availability and implementation: PEAR is implemented in C and uses POSIX threads. It is freely available at http://www.exelixis-lab.org/web/software/pear.

Contact: Tomas.Flouri@h-its.org
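
The abstract states that PEAR evaluates all possible paired-end read overlaps without needing the fragment size as input. The following is a minimal sketch of that idea only, not PEAR's implementation: the abstract does not describe the scoring scheme, so a plain match/mismatch count is swapped in here, and quality scores and the statistical test are left out. The names `reverse_complement`, `best_overlap`, `min_overlap`, `match` and `mismatch` are hypothetical, chosen for illustration.

```python
# Illustrative sketch only: plain match/mismatch scoring is an assumption,
# not the scoring or statistical test used by PEAR.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_overlap(forward, reverse, min_overlap=10, match=1, mismatch=-1):
    """Try every placement of the reverse read against the forward read.

    Returns (merged, offset, score) for the best-scoring placement,
    or None if no placement overlaps by at least min_overlap bases.
    """
    rc = reverse_complement(reverse)
    best = None
    # offset = forward-read coordinate at which rc[0] is placed.
    # offset >= 0: rc starts inside the forward read (the usual case).
    # offset < 0:  rc starts before the forward read, which happens when
    #              the fragment is shorter than a single read.
    for offset in range(min_overlap - len(rc), len(forward) - min_overlap + 1):
        start = max(0, offset)
        end = min(len(forward), offset + len(rc))
        if end - start < min_overlap:
            continue
        score = sum(
            match if forward[i] == rc[i - offset] else mismatch
            for i in range(start, end)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    if best is None:
        return None

    score, offset = best
    frag_end = offset + len(rc)  # where the fragment ends, in forward coordinates
    if frag_end <= len(forward):
        # The whole fragment lies inside the forward read.
        merged = forward[:frag_end]
    else:
        # Keep the forward read's bases in the overlap, then append the rest of rc.
        merged = forward + rc[len(forward) - offset:]
    return merged, offset, score


# Hypothetical 30-base fragment sequenced with 20-base reads from both ends.
fragment = "ACGTTGCAAGGCTTAGCCATGGATCCGTAA"
forward = fragment[:20]
reverse = reverse_complement(fragment[-20:])
print(best_overlap(forward, reverse, min_overlap=5))
```

With two 20-base reads from a 30-base fragment, the true overlap is 2 × 20 − 30 = 10 bases, so the sketch should recover the fragment at offset 10. Because every offset is scored, including negative ones, the fragment length never has to be supplied; a real merger would additionally need a way to reject chance overlaps, which is the role the abstract assigns to the statistical test.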
