Modeling splice sites with Bayes networks

Open Access

1 February 2000

Research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 16 (2), 152-158
https://doi.org/10.1093/bioinformatics/16.2.152

Abstract

Motivation: The main goal of this paper is to develop accurate probabilistic models for important functional regions in DNA sequences (e.g. splice junctions, which mark the beginning and end of introns in human DNA). These models can subsequently be used to improve the performance of gene-finding systems. The models built here attempt to capture long-distance dependencies between non-adjacent bases.

Results: An efficient modeling method is described that models biological data more accurately than a first-order Markov model without increasing the number of parameters. Intuitively, keeping the number of parameters small helps a learning system avoid overfitting. Several experiments with the model are presented, showing a small improvement in average accuracy compared with a simple Markov model. These experiments suggest that single long-distance dependencies do not help the recognition problem, thus confirming several previous studies that used more heuristic modeling techniques.

Availability: The software is available for download and as a web resource at http://www.ai.uic.edu/software

Contact: kasif@eecs.uic.edu
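The abstract does not spell out the network structure. One natural reading of "long-distance dependencies ... without increasing the number of parameters" is a tree-structured Bayes network in which every position has at most one parent, exactly as in a first-order Markov chain, except that the parent may be a non-adjacent base. The sketch below assumes that structure and, as an illustrative choice, picks each parent with a Chow-Liu style maximum spanning tree over pairwise mutual information. The structure-learning method, the Laplace pseudocount, and every function name (`markov_parents`, `tree_parents`, `fit`, `log_prob`, `free_parameters`) are hypothetical and are not taken from the paper or its software.

```python
import math
from collections import Counter

# Hypothetical sketch: the alphabet, pseudocount and all function names are
# illustrative assumptions, not the authors' implementation.
ALPHABET = "ACGT"
PSEUDOCOUNT = 1.0  # Laplace smoothing (assumed)


def markov_parents(length):
    """First-order Markov chain: position k depends only on position k-1."""
    return [None] + list(range(length - 1))


def mutual_information(seqs, i, j):
    """Empirical mutual information (in nats) between positions i and j."""
    n = len(seqs)
    joint = Counter((s[i], s[j]) for s in seqs)
    left = Counter(s[i] for s in seqs)
    right = Counter(s[j] for s in seqs)
    return sum(
        (c / n) * math.log((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


def tree_parents(seqs):
    """Maximum-weight spanning tree over pairwise mutual information,
    grown with Prim's algorithm from position 0 (an assumed choice).
    Each position gets at most one parent, which may be non-adjacent."""
    length = len(seqs[0])
    parents = [None] * length
    in_tree = {0}
    while len(in_tree) < length:
        _, i, j = max(
            (mutual_information(seqs, i, j), i, j)
            for i in in_tree
            for j in range(length)
            if j not in in_tree
        )
        parents[j] = i
        in_tree.add(j)
    return parents


def fit(seqs, parents):
    """Estimate P(x_k | x_parent(k)), or P(x_k) for the root, with pseudocounts."""
    model = []
    for k, p in enumerate(parents):
        contexts = ALPHABET if p is not None else [None]
        counts = {ctx: {b: PSEUDOCOUNT for b in ALPHABET} for ctx in contexts}
        for s in seqs:
            counts[s[p] if p is not None else None][s[k]] += 1
        model.append({
            ctx: {b: c / sum(row.values()) for b, c in row.items()}
            for ctx, row in counts.items()
        })
    return model


def log_prob(model, parents, seq):
    """Log-likelihood of one aligned window under the network."""
    return sum(
        math.log(model[k][seq[p] if p is not None else None][seq[k]])
        for k, p in enumerate(parents)
    )


def free_parameters(parents):
    """Root: 3 free probabilities; every other node: 4 contexts x 3 = 12."""
    return sum(3 if p is None else 12 for p in parents)


if __name__ == "__main__":
    # Toy data (invented): position 3 copies position 0, a long-range dependency.
    train = [a + b + c + a for a in ALPHABET for b in ALPHABET for c in ALPHABET]
    chain, tree = markov_parents(4), tree_parents(train)
    for name, parents in (("Markov", chain), ("Tree", tree)):
        model = fit(train, parents)
        total = sum(log_prob(model, parents, s) for s in train)
        print(name, parents, free_parameters(parents), round(total, 2))
```

Because both structures give every non-root position exactly one parent, `free_parameters` returns the same count for the chain and the tree, while the tree can place an edge between distant positions such as 0 and 3. For recognition, a common setup (assumed here, not confirmed by the abstract) fits one model to true splice-site windows and another to decoy windows, then scores a candidate by the difference of their `log_prob` values.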
