MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information

Open Access

26 August 2006

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 34 (16), 4364-4374
https://doi.org/10.1093/nar/gkl514

Abstract

We have developed MUMMALS, a program that constructs multiple protein sequence alignments using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without relying on explicit structure predictions. The parameters of these models were estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves the best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, as judged by statistical tests, although the average improvement is small, on the order of several percent; (ii) a large collection (>10 000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimating alignment parameters and for performance tests; (iii) reference-independent evaluation of alignment quality, based on structure superpositions guided by the sequence alignment, correlates well with reference-dependent evaluation that compares sequence-based alignments to structure-based reference alignments.
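The abstract names probabilistic consistency as the step that combines pairwise information but gives no formula. As a hedged illustration, the sketch below assumes the ProbCons-style consistency transformation, P'(x, y) = (1/N) Σ_z P(x, z) P(z, y), where each P is a matrix of pairwise posterior residue-match probabilities and P(x, x) is the identity. In MUMMALS those posteriors would come from the multi-match-state pair HMMs, which are not modeled here. The function names (`consistency_transform`, `matmul`, etc.) and the toy posterior matrices are hypothetical, not MUMMALS code or output.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    """Posterior of a sequence aligned to itself: residue i matches residue i."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # a is |x| x |z|, b is |z| x |y|; the result is |x| x |y|.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def consistency_transform(
    posteriors: Dict[Tuple[int, int], Matrix], lengths: List[int]
) -> Dict[Tuple[int, int], Matrix]:
    """One round of a ProbCons-style consistency transformation (assumed here).

    posteriors[(x, y)][i][j], for x < y, is the pairwise posterior probability
    that residue i of sequence x is aligned to residue j of sequence y.
    Returns P'(x, y) = (1/N) * sum over all z of P(x, z) @ P(z, y).
    """
    n = len(lengths)

    def post(x: int, y: int) -> Matrix:
        if x == y:
            return identity(lengths[x])
        if x < y:
            return posteriors[(x, y)]
        return transpose(posteriors[(y, x)])

    result: Dict[Tuple[int, int], Matrix] = {}
    for x in range(n):
        for y in range(x + 1, n):
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n):
                # Evidence for x_i ~ y_j routed through every residue of z
                # (z = x and z = y contribute the direct posterior).
                prod = matmul(post(x, z), post(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        acc[i][j] += prod[i][j]
            result[(x, y)] = [[v / n for v in row] for row in acc]
    return result


# Hypothetical toy input: the direct 0-1 posteriors are weak (0.5), but
# sequence 2 aligns confidently to both, so consistency raises the 0-1 diagonal.
toy = {
    (0, 1): [[0.5, 0.0], [0.0, 0.5]],
    (0, 2): identity(2),
    (1, 2): identity(2),
}
updated = consistency_transform(toy, [2, 2, 2])
print(updated[(0, 1)])  # about 0.667 on the diagonal, 0.0 off it
```

Summing over every sequence z, including x and y themselves, lets confident alignments to a third sequence reinforce a weak direct signal, which is why the 0-1 posterior rises from 0.5 to about 0.67 in the toy case. A practical aligner would typically use sparse posterior matrices and then build the multiple alignment progressively from the transformed scores.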


This publication has 58 references indexed in Scilit:

MAFFT version 5: improvement in accuracy of multiple sequence alignment
Nucleic Acids Research, 2005
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments
Journal of Molecular Biology, 2004
MUSCLE: multiple sequence alignment with high accuracy and high throughput
Nucleic Acids Research, 2004
Pairwise sequence alignment below the twilight zone
Journal of Molecular Biology, 2001
T-Coffee: a novel method for fast and accurate multiple sequence alignment
Journal of Molecular Biology, 2000
Protein secondary structure prediction based on position-specific scoring matrices
Journal of Molecular Biology, 1999
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Research, 1997
An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families
Journal of Molecular Biology, 1996
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
Nucleic Acids Research, 1994
Molecular recognition
Journal of Molecular Biology, 1991

Cited by 93 articles