Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels

Abstract
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure‐based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root The algorithm encoded in STAMP (Structural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij′ provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij′ values and thus provide a reproducible resource for studies of residue conservation within structural motifs.