Prediction of a common structural domain in aminoacyl-tRNA synthetases through use of a new pattern-directed inference system

Abstract
The aminoacyl-tRNA synthetases share a common function but show little evidence of a common structural relationship. Outside an 11 amino acid stretch called the "signature sequence", no global primary sequence similarity exists. The signature sequence matches 4-11 amino acids in several aminoacyl-tRNA synthetases. High-resolution X-ray data are available for two of these enzymes and reveal that their signature sequence regions are small segments of a common mononucleotide binding fold-like structure. A new methodology for the analysis of dissimilar primary sequences supports the expectation that all of the signature sequence regions form a common structure. In our analysis, two complex pattern descriptors were constructed to describe the synthetase mononucleotide binding fold. These were compared to primary sequences annotated with predicted secondary structures and hydropathy profiles. Regions in 8 out of 12 (67%) heterologous aminoacyl-tRNA synthetase groups (where each group is specific for the same amino acid) match the first descriptor, and 7 of these (58% of the 12 groups) also match the second descriptor. In contrast, only 4 regions in a set of 54 control proteins (7.4%) match the first descriptor, and only 2 regions (3.7%) match both. Alignment of these 8 regions to the descriptor (1) positions all known signature sequence regions as the first loop of a mononucleotide binding fold-like structure, (2) extends the previous alignments by roughly 40 additional amino acids, and (3) identifies potential sites in 3 out of 6 heterologous aminoacyl-tRNA synthetases with no previous alignments. Potential sites are also proposed for two additional heterologous synthetases on the basis of matches to less specific descriptors.
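To illustrate the general idea of comparing a pattern descriptor to a sequence annotated with predicted secondary structure and hydropathy, the following is a minimal sketch in Python. It is not the inference system used in this work. The `Element` class, the `find_matches` function, the length ranges, the hydropathy threshold, and the toy sequence are all hypothetical, and the Kyte-Doolittle scale is assumed only as one common choice of hydropathy scale. The sketch treats a descriptor as an ordered list of secondary-structure elements (strand `E`, loop `L`, helix `H`), each with a permitted length range and an optional minimum mean hydropathy.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Kyte-Doolittle hydropathy values (assumed here as the hydropathy scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

@dataclass
class Element:
    """One descriptor element (hypothetical representation)."""
    ss: str                                  # 'E' strand, 'L' loop, 'H' helix
    min_len: int
    max_len: int
    min_hydropathy: Optional[float] = None   # required mean hydropathy, if any

def find_matches(sequence: str, ss: str,
                 descriptor: List[Element]) -> List[Tuple[int, int]]:
    """Return (start, end) spans where the annotated sequence fits the descriptor."""
    if len(sequence) != len(ss):
        raise ValueError("sequence and secondary-structure strings differ in length")
    # Each element becomes one regex group over the secondary-structure string.
    pattern = re.compile(
        "".join(f"({e.ss}{{{e.min_len},{e.max_len}}})" for e in descriptor)
    )
    hits = []
    for m in pattern.finditer(ss):
        ok = True
        for i, e in enumerate(descriptor, start=1):
            if e.min_hydropathy is None:
                continue
            segment = sequence[m.start(i):m.end(i)]
            mean = sum(KD[a] for a in segment) / len(segment)
            if mean < e.min_hydropathy:
                ok = False
                break
        # Simplification: only the first regex decomposition of each region is tested.
        if ok:
            hits.append((m.start(), m.end()))
    return hits

# Toy strand-loop-helix-loop-strand descriptor; all numbers are illustrative.
descriptor = [
    Element("E", 3, 7, min_hydropathy=1.0),  # hydrophobic first strand
    Element("L", 4, 11),                     # first loop
    Element("H", 8, 20),
    Element("L", 1, 5),
    Element("E", 3, 7),
]

# Invented sequence and predicted secondary structure, built segment by segment.
ss = "LL" + "EEEEE" + "LLLLLL" + "HHHHHHHHH" + "LL" + "EEEEE"
candidate = "MK" + "IVVLG" + "PGHSGK" + "TTLAEALAK" + "NG" + "VKVVL"
control = "MK" + "DKDEG" + "PGHSGK" + "TTLAEALAK" + "NG" + "VKVVL"

candidate_hits = find_matches(candidate, ss, descriptor)
control_hits = find_matches(control, ss, descriptor)
```

In this toy case the candidate matches because its first strand is hydrophobic, while the control has the same predicted secondary structure but fails the hydropathy constraint. A real descriptor would need a richer element vocabulary and a search over alternative decompositions, both of which this sketch omits.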