Enriching the sequence substitution matrix by structural information

22 October 2003

Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 54 (1), 41-48
https://doi.org/10.1002/prot.10474

Abstract

A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence of unknown structure and function, and a template sequence whose structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas threading protocols use asymmetric matrices that test the fitness of the probe sequence to the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring functions to produce a single (mixed) scoring table. By fitting a single parameter (the relative contribution of the BLOSUM 50 matrix and the THOM2 threading energy table), we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity but very similar structures). On a difficult test set of 176 homologous pairs with no signal of sequence similarity, the mixed model detects between 40 and 100% more protein pairs than pure threading does. Surprisingly, when the percentage of sequence identity is low, the linear combination of the two models performs better than either threading or sequence alignment alone. Finally, we suggest that further enriching substitution matrices by combining more structural descriptors, such as exposed surface area or secondary structure, is expected to enhance the signal as well. Proteins 2003.
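To make the mixing idea concrete, here is a minimal Python sketch. The abstract does not give the exact functional form, so several things here are assumptions: the convex weight `lam`, the sign flip that turns the threading energy (lower is better) into a reward, the two-state "buried"/"exposed" template environment, the toy table values, and all helper names. The real BLOSUM 50 and THOM2 tables are not reproduced. The mixed per-pair score is fed into a Smith-Waterman local alignment with a linear gap penalty.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables; NOT the real BLOSUM 50 or THOM2 values.
# Symmetric sequence similarity (higher is better); stored once per unordered pair.
TOY_SIMILARITY: Dict[Tuple[str, str], float] = {
    ("A", "A"): 4.0, ("A", "L"): -1.0, ("A", "K"): -1.0,
    ("L", "L"): 5.0, ("L", "K"): -3.0,
    ("K", "K"): 5.0,
}

# Asymmetric threading energy: a probe residue placed into a template
# structural environment (hypothetical two-state labels). Lower energy = better fit.
TOY_ENERGY: Dict[Tuple[str, str], float] = {
    ("A", "buried"): -0.5, ("A", "exposed"): 0.0,
    ("L", "buried"): -1.5, ("L", "exposed"): 0.8,
    ("K", "buried"): 1.2, ("K", "exposed"): -0.6,
}


def similarity(a: str, b: str) -> float:
    """Look up the symmetric table in either order (0.0 if the pair is missing)."""
    return TOY_SIMILARITY.get((a, b), TOY_SIMILARITY.get((b, a), 0.0))


def mixed_score(probe_aa: str, template_aa: str, template_env: str, lam: float) -> float:
    """Assumed mixing rule: (1 - lam) * similarity - lam * energy.

    The energy is negated so both terms reward a good match.
    lam = 0 gives pure sequence scoring; lam = 1 gives pure threading scoring.
    """
    seq_term = similarity(probe_aa, template_aa)
    thread_term = -TOY_ENERGY[(probe_aa, template_env)]
    return (1.0 - lam) * seq_term + lam * thread_term


def local_align_score(probe: str, template: str, envs: List[str],
                      lam: float, gap: float = 2.0) -> float:
    """Smith-Waterman local alignment score using the mixed table.

    `envs[j]` is the (hypothetical) structural environment of template site j.
    Gaps cost a constant `gap` per position (linear gap penalty).
    """
    rows, cols = len(probe) + 1, len(template) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + mixed_score(probe[i - 1], template[j - 1],
                                                 envs[j - 1], lam)
            # Local alignment: scores never drop below zero.
            h[i][j] = max(0.0, diag, h[i - 1][j] - gap, h[i][j - 1] - gap)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    envs = ["buried", "buried", "exposed"]
    for lam in (0.0, 0.5, 1.0):
        print(lam, local_align_score("LAK", "LAK", envs, lam))
```

Setting `lam = 0` recovers a pure sequence alignment and `lam = 1` a pure threading score. The paper fits this single parameter on data, while the sketch simply leaves it as a free argument. The asymmetry of the threading term shows up in `mixed_score`: swapping probe and template residues changes the score because the environment belongs to the template site.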

References

A novel fold recognition method using composite predicted secondary structures
Proteins: Structure, Function, and Bioinformatics, 2002
Within the twilight zone: a sensitive profile-profile comparison tool based on information theory
Journal of Molecular Biology, 2002
Protein Recognition by Sequence‐to‐Structure Fitness: Bridging Efficiency and Capacity of Threading Models
Wiley, 2002
Linear programming optimization and a double statistical filter for protein threading protocols
Proteins: Structure, Function, and Bioinformatics, 2001
GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences
Journal of Molecular Biology, 1999
Protein structure alignment by incremental combinatorial extension (CE) of the optimal path
Protein Engineering, Design and Selection, 1998
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Research, 1997
A Surface of Minimum Area Metric for the Structural Comparison of Proteins
Journal of Molecular Biology, 1996
Amino acid substitution matrices from protein blocks
Proceedings of the National Academy of Sciences, 1992
Identification of common molecular subsequences
Journal of Molecular Biology, 1981