SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes
Open Access
- 3 May 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 28 (14), 1823-1829
- https://doi.org/10.1093/bioinformatics/bts252
Abstract
Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA), where millions of sequences are already publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements.
Results: In this study, we present the SILVA Incremental Aligner (SINA), used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high-throughput performance demands. SINA was evaluated against the commonly used high-throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3%, 97.6% and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9% and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences, respectively. SINA achieved higher accuracy than PyNAST and mothur in all performed benchmarks.
Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, a user manual and a tutorial. SINA is made available under a personal use license.
Contact: epruesse@mpi-bremen.de
Supplementary information: Supplementary data are available at Bioinformatics online.
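The abstract names k-mer searching as one half of SINA's approach. It is assumed here, not stated above, that such a search would pick the reference sequences closest to a query before the partial order alignment step. What follows is a minimal Python sketch of a generic k-mer search under that assumption. It ranks reference sequences by the number of distinct exact k-mers they share with the query, using an inverted index. The function names (`kmers`, `build_index`, `search`), the gap symbols stripped, the value of k and the scoring rule are all hypothetical illustrations, not SINA's implementation. The POA stage is not shown.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers of a sequence after removing gap characters."""
    # Assumption: '-' and '.' are the gap symbols; strip them so aligned
    # reference rows and unaligned query reads are compared on the same footing.
    s = seq.upper().replace("-", "").replace(".", "")
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Hypothetical inverted index: k-mer -> names of references containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def search(query: str, index: dict[str, set[str]], k: int, n: int) -> list[tuple[str, int]]:
    """Return up to n references ranked by the number of distinct k-mers shared with the query."""
    hits: Counter[str] = Counter()
    for km in kmers(query, k):
        # .get avoids inserting empty entries into the defaultdict.
        for name in index.get(km, ()):
            hits[name] += 1
    # Highest shared-k-mer count first; ties broken by name for deterministic output.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    refs = {
        "refA": "AACCGGTTAACC",
        "refB": "AACC--GGAAAA",
        "refC": "TTTTGGGGCCCC",
    }
    index = build_index(refs, k=4)
    print(search("AACCGGTT", index, k=4, n=3))  # [('refA', 5), ('refB', 3)]
```

An inverted index is used so that each query touches only the references that share at least one k-mer with it, rather than scanning every reference. This matters when the reference MSA holds thousands of sequences. References with no shared k-mer (here `refC`) do not appear in the result at all.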