Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM)
Open Access
- 19 July 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (18), 2518-2528
- https://doi.org/10.1093/bioinformatics/btr427
Abstract
Motivation: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously.
Results: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription–polymerase chain reaction (RT–PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.
Availability: The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine (http://cbil.upenn.edu/RUM).
Contact: ggrant@pcbi.upenn.edu; epierce@mail.med.upenn.edu
Supplementary Information: The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.
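The abstract states that accuracy was measured at the base and junction levels against simulated reads whose true origin is known. The sketch below illustrates one way such metrics could be computed; it is not the authors' evaluation code. The data representation (one genome coordinate per read base, `None` for an unaligned base), the function names and the `min_intron` threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed representation: one genome coordinate per read base,
# with None marking a base that the aligner left unaligned.
Alignment = List[Optional[int]]


def base_level_accuracy(
    truth: Dict[str, Alignment], predicted: Dict[str, Alignment]
) -> Tuple[float, float]:
    """Return (fraction of all simulated bases aligned correctly,
    fraction of aligned bases placed at the wrong coordinate)."""
    total = aligned = correct = 0
    for read_id, true_pos in truth.items():
        # A read missing from the aligner output counts as fully unaligned.
        pred_pos = predicted.get(read_id, [None] * len(true_pos))
        for t, p in zip(true_pos, pred_pos):
            total += 1
            if p is not None:
                aligned += 1
                if p == t:
                    correct += 1
    accuracy = correct / total if total else 0.0
    false_rate = (aligned - correct) / aligned if aligned else 0.0
    return accuracy, false_rate


def junctions(alignment: Alignment, min_intron: int = 20) -> Set[Tuple[int, int]]:
    """Collect (last exon base, next exon base) pairs wherever consecutive
    aligned bases are more than min_intron apart (hypothetical threshold).
    Unaligned bases are skipped rather than treated as gaps."""
    found: Set[Tuple[int, int]] = set()
    prev: Optional[int] = None
    for pos in alignment:
        if pos is None:
            continue
        if prev is not None and pos - prev > min_intron:
            found.add((prev, pos))
        prev = pos
    return found


# Toy data: r1 spans a junction and loses its last base; r2 has one misplaced base.
truth = {"r1": [100, 101, 102, 500, 501], "r2": [10, 11, 12]}
predicted = {"r1": [100, 101, 102, 500, None], "r2": [10, 11, 13]}

accuracy, false_rate = base_level_accuracy(truth, predicted)
true_junctions = set().union(*(junctions(a) for a in truth.values()))
called_junctions = set().union(*(junctions(a) for a in predicted.values()))
```

Junction-level false positives and false negatives would then follow from the set differences `called_junctions - true_junctions` and `true_junctions - called_junctions`.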