Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection

Open Access

13 August 2006

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 34 (17), e112
https://doi.org/10.1093/nar/gkl480

Abstract

The accuracy of a homology model based on the structure of a distant relative or other topologically equivalent protein is primarily limited by the quality of the alignment. Here we describe a systematic approach for sequence-to-structure alignment, called ‘K*Sync’, in which alignments are generated by dynamic programming using a scoring function that combines information on many protein features, including a novel measure of how obligate a sequence region is to the protein fold. By systematically varying the weights on the different features that contribute to the alignment score, we generate very large ensembles of diverse alignments, each optimal under a particular constellation of weights. We investigate a variety of approaches to select the best models from the ensemble, including consensus of the alignments, a hydrophobic burial measure, low- and high-resolution energy functions, and combinations of these evaluation methods. The effect on model quality and selection resulting from loop modeling and backbone optimization is also studied. The performance of the method on a benchmark set is reported and shows the approach to be effective at both generating and selecting accurate alignments. The method serves as the foundation of the homology modeling module in the Robetta server.
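The ensemble idea in the abstract can be illustrated with a small sketch. This is not the K*Sync implementation: the two feature functions (`identity`, `hydrophobic_match`), the weight grid, the linear gap penalty, and the pair-frequency consensus measure are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of running dynamic programming once per weight setting, collecting the distinct optimal alignments, and ranking them by agreement with the rest of the ensemble.

```python
from collections import Counter
from itertools import product

HYDROPHOBIC = set("AVILMFWC")  # assumed toy residue classification

def identity(q, t, i, j):
    """Toy feature: reward identical residues."""
    return 1.0 if q[i] == t[j] else -1.0

def hydrophobic_match(q, t, i, j):
    """Toy feature: reward residues of the same hydrophobic class."""
    return 1.0 if (q[i] in HYDROPHOBIC) == (t[j] in HYDROPHOBIC) else -1.0

def align(query, template, features, weights, gap=-1.0):
    """Global alignment by dynamic programming (Needleman-Wunsch style).

    The match score is a weighted sum of feature scores. Returns the
    aligned (query_index, template_index) pairs as a tuple.
    """
    n, m = len(query), len(template)

    def score(i, j):
        return sum(w * f(query, template, i, j) for w, f in zip(weights, features))

    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(i - 1, j - 1),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal move.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return tuple(reversed(pairs))

def generate_ensemble(query, template, features, weight_grid, gaps):
    """Run the alignment once per parameter setting.

    Returns a dict mapping each distinct alignment to the list of
    (weights, gap) settings under which it was optimal.
    """
    ensemble = {}
    for weights in product(weight_grid, repeat=len(features)):
        for gap in gaps:
            aln = align(query, template, features, weights, gap)
            ensemble.setdefault(aln, []).append((weights, gap))
    return ensemble

def consensus_rank(ensemble):
    """Rank distinct alignments by mean support of their aligned pairs.

    Support of a pair is the number of distinct alignments containing it
    (an assumed consensus measure, used here only for illustration).
    """
    alignments = list(ensemble)
    counts = Counter(pair for aln in alignments for pair in aln)

    def support(aln):
        return sum(counts[p] for p in aln) / (len(aln) or 1)

    return sorted(alignments, key=support, reverse=True)

if __name__ == "__main__":
    ens = generate_ensemble("MKVLAAGIV", "MKVAAGLIV",
                            [identity, hydrophobic_match],
                            weight_grid=(0.0, 0.5, 1.0), gaps=(-0.5, -1.0, -2.0))
    ranked = consensus_rank(ens)
    print(len(ens), "distinct alignments; top-ranked:", ranked[0])
```

In a real pipeline each alignment would go on to model building, and the consensus score would be one of several selection criteria alongside the burial and energy measures the abstract mentions.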

Cited by 125 articles