Murlet: a practical multiple alignment tool for structural RNA sequences
Open Access
- 25 April 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (13), 1588-1598
- https://doi.org/10.1093/bioinformatics/btm146
Abstract
Motivation: Structural RNA genes exhibit unique evolutionary patterns that are designed to conserve their secondary structures; these patterns should be taken into account while constructing accurate multiple alignments of RNA genes. The Sankoff algorithm is a natural alignment algorithm that includes the effect of base-pair covariation in the alignment model. However, the extremely high computational cost of the Sankoff algorithm precludes its application to most RNA sequences. Results: We propose an efficient algorithm for the multiple alignment of structural RNA sequences. Our algorithm is a variant of the Sankoff algorithm, and it uses an efficient scoring system that reduces the time and space requirements considerably without compromising on the alignment quality. First, our algorithm computes the match probability matrix that measures the alignability of each position pair between sequences as well as the base pairing probability matrix for each sequence. These probabilities are then combined to score the alignment using the Sankoff algorithm. By itself, our algorithm does not predict the consensus secondary structure of the alignment but uses external programs for the prediction. We demonstrate that both the alignment quality and the accuracy of the consensus secondary structure prediction from our alignment are the highest among the other programs examined. We also demonstrate that our algorithm can align relatively long RNA sequences such as the eukaryotic-type signal recognition particle RNA that is ∼300 nt in length; multiple alignment of such sequences has not been possible by using other Sankoff-based algorithms. The algorithm is implemented in the software named ‘Murlet’. Availability: The C++ source code of the Murlet software and the test dataset used in this study are available at http://www.ncrna.org/papers/Murlet/ Contact:kiryu-h@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online.Keywords
This publication has 34 references indexed in Scilit:
- Identification and Classification of Conserved RNA Secondary Structures in the Human GenomePLoS Computational Biology, 2006
- Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAsNature, 2002
- Secondary Structure Prediction for Aligned RNA SequencesJournal of Molecular Biology, 2002
- Dynalign: an algorithm for finding the secondary structure common to two RNA sequencesJournal of Molecular Biology, 2002
- Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structureJournal of Molecular Biology, 1999
- Dynamic Programming Alignment AccuracyJournal of Computational Biology, 1998
- A reliable sequence alignment method based on probabilities of residue correspondencesProtein Engineering, Design and Selection, 1995
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choiceNucleic Acids Research, 1994
- The equilibrium partition function and base pair binding probabilities for RNA secondary structureBiopolymers, 1990
- A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisonsJournal of Molecular Biology, 1987