Non-canonical inteins

Abstract
Previous analyses have shown that inteins (protein splicing elements) employ two structural organizations: the 'canonical' Nintein-Dod-inteinC found in dozens of inteins and a 'non-canonical' Nintein-inteinC described in two inteins, where Nintein at the N-terminus and inteinC at the C-terminus are conserved domains involved in self-splicing and Dod is the Dod DNA endonuclease (DNase). In this study, four non-canonical inteins, each with unique structural features, have been identified using alignment-based Hidden Markov Models. A Nintein-inteinC intein, carrying an unprecedented replacement of the N-terminal catalytic Cys(Ser) by Ala, is described in a putative ATPase encoded by Methanococcus jannaschii. Three replicative proteins of Synechocystis spp. contain inteins with the organizations: (i) Nintein-X-inteinC/Dod, where X is an uncharacterized domain and the Dod DNase is located in an alternative open reading frame (ORF), embedded between two novel CG and YK domains; (ii) Nintein-HN-inteinC, where HN stands for a phage-like DNase from the EX1H-HX3H family; (iii) Nintein>||<inteinC, where >||< indicates that the intein domains are associated with a disrupted host protein encoded by two spatially separated ORFs. The expression of some of these newly identified inteins may affect the intein hosts. The variety of structural forms of inteins could have evolved through invasion of self-splicing proteases by different mobile DNases or the departure of mobile DNases from canonical inteins.