Modeling splicing sites with pairwise correlations

Abstract
Motivation: A new method for finding subtle patterns in sequences is introduced. It approximates the multiple correlations among residues with pairwise correlations, at a learning cost of O(m^2 n), where n is the number of training sequences, each of length m. The method is suited to modeling splicing sites in human DNA, which have been reported to have higher-order dependencies.
Results: In computational experiments, our model predicted human acceptor sites more accurately than previously reported Markov models.
Availability: The C++ source code is available on request from the authors.
Contact: m-arita@aist.go.jp
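The O(m^2 n) learning cost can be illustrated with a minimal sketch. It assumes that the pairwise statistics are joint nucleotide counts at every position pair (i, j) of aligned, fixed-length training sequences, and it uses mutual information as one possible measure of pairwise correlation. The function names `baseIndex`, `pairwiseCounts` and `mutualInformation` are hypothetical and are not the authors' implementation. The nested loops visit m(m-1)/2 position pairs once per training sequence, which gives the quadratic dependence on m and the linear dependence on n.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: map a nucleotide to an index 0..3; other symbols are rejected.
int baseIndex(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("unexpected nucleotide");
    }
}

// 4 x 4 joint count table for one position pair; value-initialized to zeros.
using PairCounts = std::array<std::array<double, 4>, 4>;

// Assumed statistics: joint counts for every pair i < j, stored in an m x m table.
// Loops: n sequences x m(m-1)/2 position pairs -> O(m^2 n) work.
std::vector<std::vector<PairCounts>> pairwiseCounts(const std::vector<std::string>& seqs) {
    const std::size_t m = seqs.empty() ? 0 : seqs[0].size();
    std::vector<std::vector<PairCounts>> counts(m, std::vector<PairCounts>(m, PairCounts{}));
    for (const std::string& s : seqs) {
        if (s.size() != m) throw std::invalid_argument("sequences must share length m");
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = i + 1; j < m; ++j)
                counts[i][j][baseIndex(s[i])][baseIndex(s[j])] += 1.0;
    }
    return counts;
}

// Mutual information (in bits) between two positions, from their joint counts.
double mutualInformation(const PairCounts& c) {
    double total = 0.0;
    std::array<double, 4> row{}, col{};  // marginal counts
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b) {
            total += c[a][b];
            row[a] += c[a][b];
            col[b] += c[a][b];
        }
    if (total == 0.0) return 0.0;
    double mi = 0.0;
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b)
            if (c[a][b] > 0.0) {
                const double pab = c[a][b] / total;
                mi += pab * std::log2(pab / ((row[a] / total) * (col[b] / total)));
            }
    return mi;
}
```

For the four aligned sequences AC, GT, AC, GT, the two positions are perfectly correlated and `mutualInformation(pairwiseCounts(seqs)[0][1])` evaluates to 1 bit. For AC, AT, GC, GT they are independent and it evaluates to 0.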