Statistical Modeling, Phylogenetic Analysis and Structure Prediction of a Protein Splicing Domain Common to Inteins and Hedgehog Proteins

Abstract
Inteins, introns spliced at the protein level, and the hedgehog family of proteins involved in eukaryotic development both undergo autocatalytic proteolysis. Here, a specific and sensitive hidden Markov model (HMM) of a protein splicing domain shared by inteins and hedgehog proteins was trained and used for further analysis. The HMM characterizes the common features of this domain, including the position where a site-specific DNA endonuclease domain is inserted in the majority of inteins. The HMM was used to identify several new putative inteins, such as the one in the Methanococcus jannaschii klbA protein, and to generate a multiple sequence alignment of sequences possessing this domain. Phylogenetic analysis suggests that hedgehog proteins evolved from inteins. Secondary and tertiary structure predictions suggest that the domain has a structure similar to a β-sandwich. Similarities between the serine protease cleavage mechanism and the protein splicing reaction mechanism are discussed. Examination of the locations of inteins indicates that they are not inserted randomly in an extein, but are often inserted at functionally important positions in the host proteins. A specific and sensitive HMM for a domain present in klbA proteins identified several additional bacterial and archaeal family members, and analysis of the intein insertion site suggests residues that may be functionally important. This domain may play a role in the formation of surface-associated protein complexes.

Key words: hidden Markov model, intein, hedgehog, endonuclease, klbA domain, protein splicing.