Prediction of RNA secondary structure using generalized centroid estimators
- 18 December 2008
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (4), 465-473
- https://doi.org/10.1093/bioinformatics/btn601
Abstract
Recent studies have shown that methods for predicting RNA secondary structures by posterior decoding of base-pairing probabilities have an advantage in prediction accuracy over the conventionally used minimum free energy methods. However, the objective functions maximized in the posterior decoding of previous studies leave room for improvement with respect to the accuracy measures for secondary structures. We propose novel estimators that improve the accuracy of RNA secondary structure prediction. The proposed estimators maximize an objective function that is the weighted sum of the expected number of true positives and the expected number of true negatives of base pairs. They are also improved versions of the estimators used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and those of previous works, and show theoretically that the previous estimators include additional, unnecessary terms in the evaluation measures with respect to accuracy. Furthermore, computational experiments confirm the theoretical analysis, showing improved empirical accuracy. The proposed estimators are extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics. Supporting information and the CentroidFold software are available online at: http://www.ncrna.org/software/centroidfold/.
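As a minimal sketch of how such an estimator can be decoded, assume the weighted sum is γ·TP + TN with a weight γ > 0 (the symbol is our labeling), and that a matrix of base-pairing probabilities p_ij is already available, for example from a McCaskill-style partition function (computing it is outside this sketch). Because E[TP] = Σ_{(i,j)∈y} p_ij and E[TN] = Σ_{(i,j)∉y} (1 − p_ij), the expected gain equals Σ_{(i,j)∈y} [(γ+1)·p_ij − 1] plus a constant that does not depend on the predicted structure y. Only pairs with p_ij > 1/(γ+1) can therefore increase the gain. The Python below maximizes that sum over nested (pseudoknot-free) structures with a Nussinov-style dynamic program. The function name `gamma_centroid` and the toy probabilities are hypothetical; this is an illustration, not the CentroidFold implementation.

```python
from typing import List, Tuple


def gamma_centroid(bpp: List[List[float]], gamma: float) -> List[Tuple[int, int]]:
    """Hypothetical sketch: choose a nested structure y maximizing
    sum over (i, j) in y of (gamma + 1) * bpp[i][j] - 1.

    bpp[i][j] (i < j) is assumed to be the posterior probability that
    bases i and j pair, computed elsewhere.
    """
    n = len(bpp)
    # S[i][j]: best gain on the subsequence i..j (inclusive).
    S = [[0.0] * n for _ in range(n)]

    def s(i: int, j: int) -> float:
        # Empty or single-base intervals contain no pairs.
        return S[i][j] if i < j else 0.0

    def w(i: int, j: int) -> float:
        # Per-pair gain: positive only when bpp[i][j] > 1 / (gamma + 1).
        return (gamma + 1.0) * bpp[i][j] - 1.0

    # Fill the table by increasing interval length (Nussinov-style recursion).
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            best = max(s(i + 1, j), s(i, j - 1))  # leave i or j unpaired
            if w(i, j) > 0:
                best = max(best, s(i + 1, j - 1) + w(i, j))  # pair i with j
            for k in range(i + 1, j - 1):
                best = max(best, s(i, k) + s(k + 1, j))  # split into two parts
            S[i][j] = best

    # Trace back one optimal structure; candidates are recomputed with the
    # same float operations as above, so the equality tests are exact.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        target = S[i][j]
        if target == s(i + 1, j):
            stack.append((i + 1, j))
        elif target == s(i, j - 1):
            stack.append((i, j - 1))
        elif w(i, j) > 0 and target == s(i + 1, j - 1) + w(i, j):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j - 1):
                if target == s(i, k) + s(k + 1, j):
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)


# Toy 4-base example; the probabilities are invented for illustration.
p = [[0.0] * 4 for _ in range(4)]
p[0][3], p[1][2] = 0.9, 0.3
print(gamma_centroid(p, 1.0))  # [(0, 3)]
print(gamma_centroid(p, 4.0))  # [(0, 3), (1, 2)]
```

With γ = 1 the threshold 1/(γ+1) is 1/2, so the sketch returns exactly the pairs with probability above one half, the centroid criterion. Larger γ admits lower-probability pairs, trading precision for sensitivity.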
References (partial list; 23 indexed in total):
- Centroid estimation in discrete high-dimensional spaces with applications in biology. Proceedings of the National Academy of Sciences, 2008.
- Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics, 2007.
- CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 2006.
- RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA, 2005.
- ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research, 2005.
- Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Research, 2004.
- Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 2004.
- Secondary Structure Prediction for Aligned RNA Sequences. Journal of Molecular Biology, 2002.
- Biological Sequence Analysis. Cambridge University Press (CUP), 1998.
- Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie / Chemical Monthly, 1994.