An Evolutionary Analysis of the Helix-Hairpin-Helix Superfamily of DNA Repair Glycosylases

Abstract
The helix-hairpin-helix (HhH) superfamily of base excision repair DNA glycosylases is composed of multiple phylogenetically diverse enzymes that are capable of excising varying spectra of oxidatively and methyl-damaged bases. Although these DNA repair glycosylases have been widely studied through genetic, biochemical, and biophysical approaches, the evolutionary relationships of different HhH homologs and the extent to which they are conserved across phylogeny remain enigmatic. We provide an evolutionary framework for this pervasive and versatile superfamily of DNA glycosylases. Six HhH gene families (named AlkA: alkyladenine glycosylase; MpgII: N-methylpurine glycosylase II; MutY/Mig: A/G-specific adenine glycosylase/mismatch glycosylase; Nth: endonuclease III; OggI: 8-oxoguanine glycosylase I; and OggII: 8-oxoguanine glycosylase II) are identified through phylogenetic analysis of 234 homologs found in 94 genomes (16 archaea, 64 bacteria, and 14 eukaryotes). The number of homologs in each gene family varies from 117 in the Nth family (nearly every genome surveyed harbors at least one Nth homolog) to only five in the divergent OggII family (all from archaeal genomes). Sequences from all three domains of life are included in four of the six gene families, suggesting that the HhH superfamily diversified very early in evolution. The phylogeny provides evidence for multiple lineage-specific gene duplication events, most of which involve eukaryotic homologs in the Nth and AlkA gene families. We observe extensive variation in the number of HhH superfamily glycosylase genes present in different genomes, possibly reflecting major differences among species in the mechanisms and pathways by which damaged bases are repaired and/or disparities in the basic rates and spectra of mutation experienced by different genomes.