RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment
Open Access
- 30 May 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (15), 1883-1891
- https://doi.org/10.1093/bioinformatics/btm272
Abstract
Motivation: Non-coding RNA genes and RNA structural regulatory motifs play important roles in gene regulation and other cellular functions. They are often characterized by specific secondary structures that are critical to their functions and are often conserved in phylogenetically or functionally related sequences. Predicting common RNA secondary structures in multiple unaligned sequences remains a challenge in bioinformatics research.
Methods and Results: We present a new sampling-based algorithm to predict common RNA secondary structures in multiple unaligned sequences. Our algorithm finds the common structure between two sequences by probabilistically sampling aligned stems based on stem conservation, which is calculated from intrasequence base pairing probabilities and intersequence base alignment probabilities. It iteratively updates these probabilities based on the sampled structures and then recalculates stem conservation using the updated probabilities. The iterative process terminates upon convergence of the sampled structures. We extend the algorithm to multiple sequences by a consistency-based method, which iteratively incorporates and reinforces consistent structure information from pairwise comparisons into consensus structures. The algorithm has no limitation on predicting pseudoknots. In extensive testing on real sequence data, our algorithm outperformed other leading RNA structure prediction methods in both sensitivity and specificity at a reasonably fast speed. It also generated better structural alignments than other programs for sequences across a wide range of identities, and these alignments more accurately represent RNA secondary structure conservation.
Availability: The algorithm is implemented in a C program, RNA Sampler, which is available at http://ural.wustl.edu/software.html
Contact: xingxu@ural.wustl.edu and stormo@genetics.wustl.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
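The pairwise sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual conservation formula, so the product score below, the dictionary-based probability tables, and the names `stem_conservation` and `sample_aligned_stem` are all assumptions made for illustration; this is not the RNA Sampler implementation, which is a C program.

```python
import random

def stem_conservation(stem_a, stem_b, pair_prob_a, pair_prob_b, align_prob):
    """Hypothetical conservation score for two equal-length stems.

    Each stem is a list of (i, j) base pairs. The score averages, over
    matched base pairs, the product of the two intrasequence pairing
    probabilities and the two intersequence alignment probabilities.
    This formula is an assumption, not the one used in the paper.
    """
    if not stem_a or len(stem_a) != len(stem_b):
        return 0.0
    total = 0.0
    for (i, j), (k, l) in zip(stem_a, stem_b):
        total += (pair_prob_a.get((i, j), 0.0)
                  * pair_prob_b.get((k, l), 0.0)
                  * align_prob.get((i, k), 0.0)
                  * align_prob.get((j, l), 0.0))
    return total / len(stem_a)

def sample_aligned_stem(candidates, scores, rng):
    """Draw one candidate stem pair with probability proportional to its score."""
    if sum(scores) <= 0.0:
        return None  # nothing conserved enough to sample
    return rng.choices(candidates, weights=scores, k=1)[0]

# Toy data (invented for illustration): one stem in each sequence.
pair_prob_a = {(0, 5): 0.5}
pair_prob_b = {(1, 6): 0.5}
align_prob = {(0, 1): 1.0, (5, 6): 1.0}

conserved = ([(0, 5)], [(1, 6)])    # supported by all three tables
unsupported = ([(0, 4)], [(1, 5)])  # no pairing probability, so score 0

candidates = [unsupported, conserved]
scores = [stem_conservation(a, b, pair_prob_a, pair_prob_b, align_prob)
          for a, b in candidates]
picked = sample_aligned_stem(candidates, scores, random.Random(0))
```

In a full iteration, the sampled stems would feed back into the pairing and alignment probabilities before the scores are recomputed; that update step is omitted here because the abstract does not specify it.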