Abstract
MOTIVATION: Hidden Markov models can efficiently and automatically build statistical representations of related sequences. Unfortunately, training sets are frequently biased toward one subgroup of sequences, leading to an insufficiently general model. This work evaluates sequence weighting methods based on the maximum-discrimination idea. RESULTS: One good method scales sequence weights by an exponential that ranges between 0.1 for the best-scoring sequence and 1.0 for the worst. Experiments with a curated data set show that while training with one or two sequences performed worse than single-sequence Probabilistic Smith-Waterman (PSW), training with five or ten sequences reduced errors by 20% and 51%, respectively. This new version of the SAM HMM suite outperforms HMMer (17% reduction over PSW for 10 training sequences), Meta-MEME (28% reduction), and unweighted SAM (31% reduction). AVAILABILITY: A WWW server, as well as information on obtaining the Sequence Alignment and Modeling (SAM) software suite and additional data from this work, can be found at http://www.cse.ucsc.edu/research/compbio/sam.html
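To make the weighting scheme concrete, the sketch below shows one plausible reading of it: weights interpolated exponentially between 0.1 for the best-scoring training sequence and 1.0 for the worst, so that sequences already well explained by the model are down-weighted. The function name and the exact interpolation formula are assumptions for illustration, not the paper's actual implementation.

```python
def discrimination_weights(scores, lo=0.1, hi=1.0):
    """Hypothetical sketch of exponential sequence weighting.

    `scores` are model scores for the training sequences (higher = better
    fit). The returned weight is `lo` (0.1) for the best-scoring sequence
    and `hi` (1.0) for the worst, varying exponentially in between so that
    over-represented, well-scoring sequences contribute less to training.
    """
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        # All sequences score equally: weight them uniformly.
        return [hi] * len(scores)
    weights = []
    for s in scores:
        # Normalize score to t in [0, 1]: 0 for the worst, 1 for the best.
        t = (s - s_min) / (s_max - s_min)
        # Exponential interpolation: hi at t=0, lo at t=1.
        weights.append(hi * (lo / hi) ** t)
    return weights
```

For example, with scores `[0.0, 1.0, 2.0]` the worst sequence gets weight 1.0, the best gets 0.1, and the middle one falls geometrically between the two.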