The Genomic Distribution and Local Context of Coincident SNPs in Human and Chimpanzee

Abstract
We have previously shown that there is an excess of sites that are polymorphic at orthologous positions in humans and chimpanzees and that this is most likely due to cryptic variation in the mutation rate. We suggested that this might be a consequence of complex context effects, since we found significant heterogeneity in triplet frequencies around coincident single nucleotide polymorphism (SNP) sites. Here, we show that the heterogeneity in triplet frequencies is not specifically associated with coincident SNPs but is instead driven by base composition bias around CpG dinucleotides. As a result, we suggest that cryptic variation in the mutation rate is truly cryptic, in the sense that the mutation rate does not appear to depend on any specific primary sequence context. We further propose that the patterns around CpG dinucleotides are driven by the mutability of CpG dinucleotides in different DNA contexts. We also show that the genomic distribution of coincident SNPs is nonuniform and that there are some subtle differences between the distributions of single and coincident SNPs. Finally, we identify regions that contain high numbers of coincident SNPs and suggest that one in particular, a region containing the gene PRIM2, may be under balancing selection.