Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER

Open Access

15 April 2005

Published by Springer Nature in BMC Bioinformatics

Vol. 6 (1), 99
https://doi.org/10.1186/1471-2105-6-99

Abstract

Background: Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet, the critical features for successful modelling are not fully known. In the present work we approached this by using two of the most popular HMM packages: SAM and HMMER. The programs' abilities to build models and score sequences were compared on a SCOP/Pfam based test set. The comparison was done separately for local and global HMM scoring.

Results: Using default settings, SAM was overall more sensitive. SAM's model estimation was superior, while HMMER's model scoring was more accurate. Critical features for model building were then analysed by comparing the two packages' algorithmic choices and parameters. The weighting between prior probabilities and multiple alignment counts held the primary explanation why SAM's model building was superior. Our analysis suggests that HMMER gives too much weight to the sequence counts. SAM's emission prior probabilities were also shown to be more sensitive. The relative sequence weighting schemes are different in the two packages but performed equivalently.

Conclusion: SAM model estimation was more sensitive, while HMMER model scoring was more accurate. By combining the best algorithmic features from both packages the accuracy was substantially improved compared to their default performance.
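
The balance between prior probabilities and alignment counts described in the abstract can be illustrated with a minimal sketch. It assumes a single Dirichlet-style prior over a toy four-letter alphabet and a simple scaling factor on the observed counts; the function name `posterior_emissions` and the parameters `prior_weight` and `count_scale` are hypothetical and are not taken from SAM or HMMER, whose actual priors and weighting schemes are more elaborate.

```python
# Illustrative sketch only: one Dirichlet-style prior on a toy alphabet.
# Names and parameters are hypothetical, not SAM or HMMER APIs.

def posterior_emissions(counts, prior, prior_weight, count_scale=1.0):
    """Posterior-mean emission estimate for one match column.

    p(a) = (s * c_a + A * q_a) / (s * N + A)
    where c_a are observed counts, N their sum, q_a the prior
    probabilities (assumed to sum to 1), A the prior weight and
    s a scaling factor applied to the counts.
    """
    scaled = {a: count_scale * counts.get(a, 0.0) for a in prior}
    total = sum(scaled.values()) + prior_weight
    return {a: (scaled[a] + prior_weight * prior[a]) / total for a in prior}


prior = {a: 0.25 for a in "ACGT"}   # uniform toy prior
counts = {"A": 8.0, "C": 2.0}       # weighted counts from one alignment column

# Full weight on the counts: unobserved letters get very little probability.
heavy_counts = posterior_emissions(counts, prior, prior_weight=1.0)

# Counts scaled down: the prior contributes more to the estimate.
light_counts = posterior_emissions(counts, prior, prior_weight=1.0, count_scale=0.2)

for name, est in (("counts dominate", heavy_counts),
                  ("prior matters more", light_counts)):
    print(name, {a: round(p, 3) for a, p in est.items()})
```

Scaling the counts down shifts probability toward letters that were never observed in the column, which is the kind of trade-off the abstract refers to when it suggests that too much weight on the sequence counts can hurt model building.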


Cited by 49 articles