Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals
- 1 March 2004
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 11 (2-3), 377-394
- https://doi.org/10.1089/1066527041410418
Abstract
We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many maximum entropy models (MEMs) are specified by simply changing the set of constraints. Such models can be utilized to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models out-perform previous probabilistic models in the discrimination of human 5′ (donor) and 3′ (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.
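As a rough illustration of the idea in the abstract, the sketch below fits a maximum entropy distribution over all sequences of a fixed short length, subject to marginal constraints estimated from training sequences, and then scores a sequence by a log-odds ratio against a decoy model. It is not the authors' implementation: the fitting procedure used here is plain iterative proportional fitting started from the uniform distribution, and the names `fit_maxent`, `marginal`, `log_odds`, the probability `floor`, the toy training set, and the uniform decoy model are all assumptions made for this example.

```python
import math
from collections import defaultdict
from itertools import combinations, product

ALPHABET = "ACGT"


def marginal(dist, positions):
    """Marginal of a distribution over full sequences at the given positions."""
    out = defaultdict(float)
    for seq, prob in dist.items():
        out[tuple(seq[i] for i in positions)] += prob
    return out


def fit_maxent(seqs, position_sets, n_sweeps=100):
    """Fit a distribution matching the empirical marginals on each position set.

    Assumption: iterative proportional fitting from a uniform start, with no
    pseudocounts, so unseen sub-patterns get probability zero.
    """
    length = len(seqs[0])
    n = len(seqs)

    # Empirical distribution of the training sequences.
    empirical = defaultdict(float)
    for s in seqs:
        empirical[s] += 1.0 / n

    # Target marginals: one table per constrained set of positions.
    targets = {pos: marginal(empirical, pos) for pos in position_sets}

    # Start from the uniform distribution, which has maximum entropy
    # when there are no constraints at all.
    space = ["".join(t) for t in product(ALPHABET, repeat=length)]
    model = {s: 1.0 / len(space) for s in space}

    for _ in range(n_sweeps):
        for pos in position_sets:
            current = marginal(model, pos)
            # Rescale every sequence so this marginal matches its target.
            for s in space:
                key = tuple(s[i] for i in pos)
                if current[key] > 0.0:
                    model[s] *= targets[pos].get(key, 0.0) / current[key]
    return model


def log_odds(seq, p_signal, p_decoy, floor=1e-12):
    """Log2 ratio of signal to decoy probability; floor avoids log(0)."""
    return math.log2(max(p_signal[seq], floor) / max(p_decoy[seq], floor))


# Toy usage (hypothetical data, far smaller than any real splice site set).
train = ["AAG", "AGG", "CAG", "AAT"]
first_order = [(i,) for i in range(3)]        # single-position constraints
pairwise = list(combinations(range(3), 2))    # all pairs, adjacent or not

m1 = fit_maxent(train, first_order)
m2 = fit_maxent(train, pairwise)
decoy = {s: 1.0 / len(m1) for s in m1}        # assumed uniform decoy model

print(log_odds("AAG", m1, decoy), log_odds("AAG", m2, decoy))
```

Changing only `position_sets` yields a different model, which mirrors the abstract's point that many models are specified by changing the constraint set. With single-position constraints the fitted distribution reduces to a product of per-position frequencies; adding pair constraints lets the model capture dependencies between positions. Enumerating every sequence is only practical for short motifs.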