Taxonomic hierarchy of HLA class I allele sequences

Abstract
The markedly high levels of polymorphism present in the classical class I loci of the human major histocompatibility complex have been implicated in infectious and immune disease recognition. The large numbers of alleles present at these loci have, however, limited efforts to verify associations between individual alleles and specific diseases. As an approach to reducing allele diversity to hierarchical, evolutionarily related groups, we performed phylogenetic analyses of the available complete sequences of HLA-A, -B and -C alleles (n = 216 alleles) using three approaches: maximum parsimony, distance-based minimum evolution and maximum likelihood. Full nucleotide and amino acid sequences were considered, as well as abridged sequences from the hypervariable peptide binding region (PBR), which is known to interact in vivo with HLA-presented foreign peptide. The consensus analyses revealed robust clusters of HLA-C alleles (n = 36) that were concordant between the full-sequence and PBR analyses. HLA-A alleles (n = 60) assorted into 12 groups based on full nucleotide and amino acid sequences; these groups, with few exceptions, recapitulated serological groupings, but the patterns were largely discordant with the clusters prescribed by PBR sequences. HLA-B, which has the most alleles (n = 120) and which, unlike HLA-A and -C, is thought to be subject to frequent recombinational exchange, showed limited phylogenetic structure, consistent with recent selection-driven retention of maximum heterozygosity and population diversity. The allele categories recognized here offer an explicit phylogenetic criterion for grouping alleles that is potentially relevant for epidemiologic associations, for inferring the origin of MHC genome organization, and for comparing functional constraints on peptide presentation among HLA alleles.