Population Stratification of a Common APOBEC Gene Deletion Polymorphism

Abstract
The APOBEC3 gene family plays a role in innate cellular immunity, inhibiting retroviral infection, hepatitis B virus propagation, and the retrotransposition of endogenous elements. We present a detailed sequence and population genetic analysis of a 29.5-kb common human deletion polymorphism that removes the APOBEC3B gene. We developed a PCR-based genotyping assay, characterized 1,277 human diversity samples, and found that the frequency of the deletion allele varies significantly among major continental groups (global FST = 0.2843). The deletion is rare in Africans and Europeans (frequencies of 0.9% and 6%, respectively), more common in East Asians and Amerindians (36.9% and 57.7%), and almost fixed in Oceanic populations (92.9%). Despite a worldwide frequency of 22.5%, analysis of data from the International HapMap Project reveals that no single existing tag single nucleotide polymorphism (SNP) can serve as a surrogate for the deletion variant, emphasizing that without careful analysis its phenotypic impact may be overlooked in association studies. Application of haplotype-based tests for selection revealed potential pitfalls in the direct application of existing methods to the analysis of genomic structural variation. These data emphasize the importance of directly genotyping structural variation in association studies and of accurately resolving variant breakpoints before proceeding with more detailed population-genetic analysis.

Author Summary
Several recent studies have demonstrated that deletions, duplications, and inversions contribute a substantial fraction of the total variation present in the human genome. In this study, we provide a comprehensive population-genetic analysis of a single deletion previously identified by comparing the genome of a single individual against the human genome reference sequence. Complete genomic sequence spanning the deleted region was obtained, allowing us to define the deletion breakpoints and develop a direct genotyping assay. Analysis showed that the deletion removes a member of a gene family involved in the innate immune response against viral pathogens. We genotyped samples from a human diversity panel and found drastic differences in the frequency of the deletion around the world. Using data from the HapMap project and existing analysis techniques, we illustrate the importance of directly genotyping this type of variation and of clearly defining its boundaries. Without this level of detail, the potential functional importance of such variation may be missed.
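
To make the notion of frequency differentiation among groups concrete, the following minimal Python sketch computes a simple Wright-style FST, (HT - HS) / HT, from the five continental deletion frequencies quoted in the abstract. It rests on assumptions that are not taken from the study: the groups are weighted equally (per-group sample sizes are not given here), the function names `heterozygosity` and `wright_fst` are hypothetical, and the estimator behind the reported global FST of 0.2843 is not stated in this section. The number it prints is therefore illustrative only and will differ from the reported value.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def wright_fst(freqs, weights=None):
    """Simple FST = (H_T - H_S) / H_T from subpopulation allele frequencies.

    freqs   -- allele frequency in each subpopulation
    weights -- optional relative weights (e.g. sample sizes);
               equal weighting is assumed when omitted
    """
    if weights is None:
        weights = [1.0] * len(freqs)
    total = sum(weights)
    w = [x / total for x in weights]

    # Heterozygosity expected if all groups were pooled into one population
    p_bar = sum(wi * p for wi, p in zip(w, freqs))
    h_t = heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure

    # Weighted mean of the within-group expected heterozygosities
    h_s = sum(wi * heterozygosity(p) for wi, p in zip(w, freqs))
    return (h_t - h_s) / h_t


# Deletion-allele frequencies as quoted in the abstract
deletion_freq = {
    "African": 0.009,
    "European": 0.06,
    "East Asian": 0.369,
    "Amerindian": 0.577,
    "Oceanic": 0.929,
}

fst = wright_fst(list(deletion_freq.values()))
print(f"Equal-weight FST across five groups: {fst:.3f}")
```

Passing per-group sample sizes as `weights` would bring the pooled frequency closer to the worldwide figure, but reproducing the published estimate would still require the per-population genotype counts and the estimator used in the study.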