Cluster analysis of amino acid indices for prediction of protein structure and function

Abstract
The relationships among 222 published indices representing various physicochemical and biochemical properties of amino acid residues have been investigated by hierarchical cluster analysis. The clustering result is illustrated by a minimum spanning tree, which is conveniently divided into four regions: α and turn propensities, β propensity, hydrophobicity, and other physicochemical properties such as the bulkiness of amino acid residues. In addition, several subclasses of hydrophobicity scales have been identified: preference for the protein interior or exterior, accessible surface area, surrounding hydrophobicity, and other mostly experimental scales, including transfer free energies, partition coefficients, HPLC parameters and polarity. Representative amino acid indices are identified in each of these groups. The collection of amino acid indices is a useful resource for empirical analyses that correlate sequence information with the structural and functional properties of proteins. As an example, the collection is searched for the indices that best reproduce the amino acid mutation data matrix.
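To make the idea of a minimum spanning tree over amino acid indices concrete, the following is a minimal sketch, not the authors' procedure. It assumes that each index is a list of 20 per-residue values, that the distance between two indices is 1 − |r| (r being the Pearson correlation coefficient, so strongly anti-correlated scales count as close), and that the tree is built with Prim's algorithm. The index names and values in `toy` are hypothetical and are not entries from the published collection.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distance(x, y):
    # Assumed distance: 1 - |r|, so a scale and its sign-reversed copy coincide.
    return 1.0 - abs(pearson(x, y))


def minimum_spanning_tree(indices):
    """Prim's algorithm on the complete graph of indices.

    `indices` maps an index name to its 20 per-residue values.
    Returns a list of (u, v, d) edges in the order they were added.
    """
    names = list(indices)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Pick the shortest edge joining the tree to a vertex outside it.
        for u in in_tree:
            for v in names:
                if v in in_tree:
                    continue
                d = distance(indices[u], indices[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical toy indices, 20 values each (one per residue); illustrative only.
base = [float(i) for i in range(20)]
toy = {
    "hydro_A": base,
    "hydro_B": [2 * v + 1 for v in base],         # linear rescaling, r = 1
    "polar_C": [-v for v in base],                # sign-reversed scale, r = -1
    "bulk_D": [float((7 * i) % 20) for i in range(20)],  # weakly correlated
}

for u, v, d in minimum_spanning_tree(toy):
    print(f"{u} -- {v}: {d:.3f}")
```

The three rescaled or sign-reversed copies are joined at distance close to zero, and the weakly correlated `bulk_D` attaches last. Removing the longest edges of such a tree splits the indices into groups, which is one simple way to read off clusters; this single-linkage view is an illustrative choice here, not necessarily the linkage used in the study.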