STAR: ultrafast universal RNA-seq aligner
- 25 October 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 29 (1), 15-21
- https://doi.org/10.1093/bioinformatics/bts635
Abstract
Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.
Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80–90% success rate, corroborating the high precision of the STAR mapping strategy.
Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
Contact: dobin@cshl.edu.
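The seed search named in the Results can be illustrated with a toy example. The sketch below is not STAR's C++ implementation: it assumes a naively built suffix array over a short made-up genome, exact matching only, and hypothetical helper names (`build_suffix_array`, `maximal_mappable_prefix`, `seed_search`). It shows the general idea only: find the longest prefix of the read that occurs in the genome, then restart the search from the first unmapped base, so that a spliced read naturally breaks into seeds at the junction. Clustering, stitching, scoring and mismatch handling are left out.

```python
def build_suffix_array(text):
    """Start positions of all suffixes in sorted order (naive; toy inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def _char_at(genome, pos, offset):
    """Character `offset` bases into the suffix at `pos`; '' past the end sorts first."""
    i = pos + offset
    return genome[i] if i < len(genome) else ""


def _narrow(genome, sa, lo, hi, offset, ch):
    """Within sa[lo:hi], binary-search the sub-interval whose char at `offset` equals `ch`."""
    a, b = lo, hi
    while a < b:  # lower bound
        m = (a + b) // 2
        if _char_at(genome, sa[m], offset) < ch:
            a = m + 1
        else:
            b = m
    start, b = a, hi
    while a < b:  # upper bound
        m = (a + b) // 2
        if _char_at(genome, sa[m], offset) <= ch:
            a = m + 1
        else:
            b = m
    return start, a


def maximal_mappable_prefix(genome, sa, query):
    """Return (length, positions) for the longest prefix of `query` found exactly in `genome`."""
    lo, hi, length = 0, len(sa), 0
    while length < len(query):
        new_lo, new_hi = _narrow(genome, sa, lo, hi, length, query[length])
        if new_lo == new_hi:  # no suffix extends the match by this base
            break
        lo, hi, length = new_lo, new_hi, length + 1
    return length, (sorted(sa[lo:hi]) if length else [])


def seed_search(genome, sa, read):
    """Sequentially split `read` into seeds: (read_offset, length, genome_positions)."""
    seeds, start = [], 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(genome, sa, read[start:])
        if length == 0:
            start += 1  # assumption of this sketch: skip a base that maps nowhere
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy genome: two made-up exons separated by a 14-base intron (GT...AG).
genome = "GATTACAGG" + "GTAAGTTTTTTCAG" + "CCTGAACTG"
sa = build_suffix_array(genome)
read = "TTACAGG" + "CCTGAAC"  # spans the exon-exon junction
print(seed_search(genome, sa, read))
# [(0, 7, [2]), (7, 7, [23])]
```

The first seed ends at genome position 8 and the second begins at position 23, so the gap between them (positions 9 to 22) corresponds to the toy intron. No annotation of splice sites was needed to find it, which is the sense in which such a search can detect junctions de novo.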