• 1 January 1986
    • journal article
    • research article
    • Vol. 303 (13), 541-6
Abstract
The sequences of related proteins show an alternation of conserved and variable regions. This is generally seen as a reflection of 3D constraints on 1D structures. Although the exact meaning of such constraints remains elusive, conserved regions can be extracted from protein chains and used to align them. We developed a program that performs this task efficiently. The program constructs symbolic motifs that fit a target subsequence present in every chain without requiring any insertion or deletion. A motif can, however, be partly obscured by substitutions where it occurs in a sequence. Formally, the motifs consist of amino acid symbols separated (and virtually preceded and followed) by a variable number of wild-card symbols. A wild-card, which matches any amino acid of the chains (without adding to the score), represents a variable site within a conserved region. Different motifs are built progressively by replacing a wild-card with an amino acid symbol within or beside preexisting motifs. Only those motifs that combine a high matching score over all chains with a low deviation between the extreme scores over individual chains are selected to make the next generation. Starting with a null motif, the construction ends when no new amino acid can be introduced into the current motifs. A surviving motif is then considered valid if it maps unambiguously to a unique region in every sequence, and the motif with the highest score is finally selected. The construction of new motifs is then reinitiated for the left and right parts of the sequences, after these have been split by the previously selected motif. (ABSTRACT TRUNCATED AT 250 WORDS)
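The scoring and validity criteria described above can be illustrated with a minimal sketch. It is not the authors' program: the abstract does not define the matching score, so the sketch assumes the score is simply the number of motif amino acids found at the corresponding positions of a gap-free window. The names `WILDCARD`, `score_at`, `best_hit`, and `evaluate` are hypothetical and chosen only for illustration.

```python
WILDCARD = "."  # assumed symbol for a wild-card site; matches anything, adds no score


def score_at(motif, seq, start):
    """Count motif amino acids that agree with seq in the gap-free window at start."""
    return sum(
        1
        for i, symbol in enumerate(motif)
        if symbol != WILDCARD and seq[start + i] == symbol
    )


def best_hit(motif, seq):
    """Return (best score, list of start positions reaching it) for one chain."""
    n_windows = len(seq) - len(motif) + 1
    if n_windows <= 0:
        return 0, []  # chain shorter than the motif: no placement possible
    scores = [score_at(motif, seq, s) for s in range(n_windows)]
    top = max(scores)
    return top, [s for s, v in enumerate(scores) if v == top]


def evaluate(motif, seqs):
    """Summarise a motif over all chains.

    total  : sum of the best per-chain scores (to be high)
    spread : difference between the extreme per-chain scores (to be low)
    unique : True if the best placement is unambiguous in every chain
    """
    hits = [best_hit(motif, s) for s in seqs]
    per_chain = [score for score, _ in hits]
    return {
        "total": sum(per_chain),
        "spread": max(per_chain) - min(per_chain),
        "unique": all(len(starts) == 1 for _, starts in hits),
    }


if __name__ == "__main__":
    chains = ["AAGWCHKAA", "TTGYCHRTT", "AAGWSHKAA"]  # made-up toy sequences
    print(evaluate("G.CH", chains))
```

In the toy run the third chain carries a substitution (S in place of C), so the motif still maps to one position there but with a lower score, which is what the spread term is meant to capture. Motif growth and the recursive left/right split are omitted from the sketch.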