Prediction of RNA secondary structure using generalized centroid estimators

18 December 2008

journal article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 25 (4), 465-473
https://doi.org/10.1093/bioinformatics/btn601

Abstract

Recent studies have shown that methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities have an advantage in prediction accuracy over the conventionally used minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures. We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of true positives and the expected number of true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics. Supporting information and the CentroidFold software are available online at: http://www.ncrna.org/software/centroidfold/.
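The objective described in the abstract can be illustrated with a minimal sketch. It is not the CentroidFold implementation, and it rests on several assumptions: the weight is a single parameter `gamma` placed on the true positives (with weight 1 on the true negatives), the base-pairing probabilities are already given as a matrix `p`, and the maximization is done with a Nussinov-style dynamic program over nested structures. Under those assumptions, the expected gain of a structure is, up to a constant, the sum of `(gamma + 1) * p[i][j] - 1` over its base pairs, so only pairs with a positive per-pair gain are worth predicting. The names `pair_gain`, `fold`, and `min_loop` are hypothetical.

```python
def pair_gain(p_ij, gamma):
    # Contribution of predicting pair (i, j) to
    # gamma * E[true positives] + E[true negatives], up to a constant:
    # gamma * p_ij - (1 - p_ij) = (gamma + 1) * p_ij - 1.
    return (gamma + 1.0) * p_ij - 1.0


def fold(p, gamma=1.0, min_loop=3):
    """Return nested base pairs maximizing the summed pair gain.

    p        : n x n matrix of base-pairing probabilities (p[i][j], i < j)
    gamma    : assumed weight on true positives relative to true negatives
    min_loop : assumed minimum number of unpaired bases inside a hairpin
    """
    n = len(p)
    M = [[0.0] * n for _ in range(n)]

    def get(i, j):
        # Empty intervals contribute nothing.
        return M[i][j] if i <= j else 0.0

    # Fill the table by increasing interval length.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(get(i + 1, j), get(i, j - 1))
            if span > min_loop:
                best = max(best, get(i + 1, j - 1) + pair_gain(p[i][j], gamma))
            for k in range(i + 1, j):
                best = max(best, M[i][k] + M[k + 1][j])
            M[i][j] = best

    # Trace back one optimal structure, preferring unpaired positions on ties.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if M[i][j] == get(i + 1, j):
            stack.append((i + 1, j))
        elif M[i][j] == get(i, j - 1):
            stack.append((i, j - 1))
        elif j - i > min_loop and M[i][j] == get(i + 1, j - 1) + pair_gain(p[i][j], gamma):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if M[i][j] == M[i][k] + M[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)


if __name__ == "__main__":
    # Toy probabilities (made up for illustration only).
    p = [[0.0] * 6 for _ in range(6)]
    p[0][5], p[1][4] = 0.9, 0.8
    print(fold(p, gamma=1.0, min_loop=1))  # both pairs have positive gain
    print(fold(p, gamma=0.2, min_loop=1))  # only the 0.9 pair has positive gain
```

In this sketch a larger `gamma` rewards true positives more heavily and so predicts more base pairs, while a smaller `gamma` keeps only the most probable ones.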

This publication has 23 references indexed in Scilit:

Centroid estimation in discrete high-dimensional spaces with applications in biology
Proceedings of the National Academy of Sciences, 2008
Efficient parameter estimation for RNA secondary structure prediction
Bioinformatics, 2007
CONTRAfold: RNA secondary structure prediction without physics-based models
Bioinformatics, 2006
RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble
RNA, 2005
ProbCons: Probabilistic consistency-based multiple sequence alignment
Genome Research, 2005
Rfam: annotating non-coding RNAs in complete genomes
Nucleic Acids Research, 2004
Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction
BMC Bioinformatics, 2004
Secondary Structure Prediction for Aligned RNA Sequences
Journal of Molecular Biology, 2002
Biological Sequence Analysis
Published by Cambridge University Press (CUP), 1998
Fast folding and comparison of RNA secondary structures
Monatshefte für Chemie / Chemical Monthly, 1994

Cited by 202 articles