Recognition of analogous and homologous protein folds: assessment of prediction success and associated alignment accuracy using empirical substitution matrices

Abstract
Fold recognition methods aim to use the information in known protein structures (the targets) to predict whether the sequence of a protein of unknown structure (the probe) will adopt a known fold. This paper highlights that the structural similarities sought by these methods fall into two types: remote homologues and analogues. Homologues are the result of divergent evolution and often share a common function. We define remote homologues as those that are not easily detected by sequence comparison methods alone. Analogues do not share a common ancestor and generally do not share a common function. Several sets of empirical matrices for residue substitution, secondary structure conservation and residue accessibility conservation have previously been derived from aligned pairs of remote homologues and analogues (Russell et al., J. Mol. Biol., 1997, 269, 423-439). Here we introduce a fold recognition method, FOLDFIT, that uses these matrices to match the sequences, secondary structures and residue accessibilities of the probe and target. The approach is evaluated on distinct datasets of analogous and remotely homologous folds. The accuracy of FOLDFIT with the different matrices on the two datasets is compared with that of another fold recognition method (THREADER) and with searches that use mutation matrices without any structural information. FOLDFIT identifies the correct fold at top rank for 12 of 18 remotely homologous folds and for five of nine analogous folds. The average alignment accuracies for residue and secondary structure equivalencing are much higher for homologous folds (residues approximately 42%, secondary structures approximately 78%) than for analogous folds (approximately 12% and 47%, respectively). Sequence searches alone can succeed for several homologues in the test sets but nearly always fail for the analogues. These results suggest that the recognition of analogous and remotely homologous folds should be assessed separately. This study has implications for the development and comparative evaluation of fold recognition algorithms.
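To make the matching idea concrete, the following is a minimal illustrative sketch, not FOLDFIT itself. It assumes that each aligned probe/target position is scored as a weighted sum of a residue substitution score, a secondary structure conservation score and an accessibility conservation score, and that probe and target are aligned by plain global (Needleman-Wunsch) dynamic programming with a linear gap penalty. The score tables, equal weights, gap penalty and all function names (`position_score`, `align`, `alignment_accuracy`) are hypothetical placeholders; they are not the empirical matrices of Russell et al. (1997) or the paper's actual alignment parameters. Alignment accuracy is taken here as the percentage of reference residue equivalences that the predicted alignment reproduces.

```python
from typing import Dict, List, Tuple

# A profile is (sequence, secondary structure H/E/C, accessibility b/e), equal-length strings.
Profile = Tuple[str, str, str]

# Hypothetical toy tables; NOT the empirical matrices derived by Russell et al. (1997).
SS_SCORE: Dict[Tuple[str, str], float] = {
    ("H", "H"): 2.0, ("E", "E"): 2.0, ("C", "C"): 1.0,
    ("H", "E"): -2.0, ("E", "H"): -2.0,
    ("H", "C"): -1.0, ("C", "H"): -1.0,
    ("E", "C"): -1.0, ("C", "E"): -1.0,
}
ACC_SCORE: Dict[Tuple[str, str], float] = {
    ("b", "b"): 1.0, ("e", "e"): 1.0, ("b", "e"): -1.0, ("e", "b"): -1.0,
}


def residue_score(a: str, b: str) -> float:
    # Toy identity-based stand-in for an empirical residue substitution matrix.
    return 2.0 if a == b else -1.0


def position_score(probe: Profile, target: Profile, i: int, j: int,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of residue, secondary structure and accessibility scores (assumed form)."""
    w_res, w_ss, w_acc = weights
    seq_p, ss_p, acc_p = probe
    seq_t, ss_t, acc_t = target
    return (w_res * residue_score(seq_p[i], seq_t[j])
            + w_ss * SS_SCORE[(ss_p[i], ss_t[j])]
            + w_acc * ACC_SCORE[(acc_p[i], acc_t[j])])


def align(probe: Profile, target: Profile,
          gap: float = -2.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment with a linear gap penalty; returns (score, equivalenced index pairs)."""
    n, m = len(probe[0]), len(target[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + position_score(probe, target, i - 1, j - 1),
                          F[i - 1][j] + gap,   # gap in the target
                          F[i][j - 1] + gap)   # gap in the probe
    # Trace back from the bottom-right cell, preferring a match on ties.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + position_score(probe, target, i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


def alignment_accuracy(predicted: List[Tuple[int, int]],
                       reference: List[Tuple[int, int]]) -> float:
    """Percentage of reference (probe, target) equivalences reproduced by the prediction."""
    if not reference:
        return 0.0
    return 100.0 * len(set(predicted) & set(reference)) / len(reference)
```

For example, `align(("ACE", "HHC", "bbe"), ("ACDE", "HHCC", "bbee"))` equivalences A, C and E with their counterparts and opens one gap opposite the target's D. Because the probe's structure is unknown, its secondary structure and accessibility strings in this sketch stand in for predicted values, whereas the target's would come from its known structure.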