Bias of Selection on Human Copy-Number Variants

Abstract
Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotype differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to encode genes that are characteristic of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Additionally, human CNVs are unusually enriched in those protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer genes associated with Mendelian disease than expected. Although mouse CNVs also exhibit a significant elevation in synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs suggest either that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or that they were preferentially fixed as a consequence of the larger effective population size of wild mice.
It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variations in the human population.

Until recently, it was thought that most inherited human diversity results from genetic variation at single nucleotide sites. However, recent studies discovered many larger-scale differences, involving the duplication or deletion of thousands of bases. Do these large-scale differences contribute greatly to characteristics of human individuals, or are they of little consequence? For clues to solve this mystery, the authors looked to the signatures of adaptive evolution written into the DNA. They reasoned that if large-scale DNA differences are beneficial, they should be enriched in genes, particularly those involved in fighting infection and sensing our environment. The authors discovered such enrichments, indicating that some large-scale sequence differences have been advantageous during the last approximately 100,000 years of human history. By contrast, modern laboratory mice exhibit few signs of beneficial large-scale DNA differences, perhaps because advantageous sequences have swept rapidly through their ancestral populations. Some large-scale variations in human genomes thus appear to be a legacy of past evolutionary challenges to our species.