Global discovery of primate-specific genes in the human genome

Abstract
The genomic basis of primate phenotypic uniqueness remains obscure, despite increasing genome and transcriptome sequence data availability. Although factors such as segmental duplications and positive selection have received much attention as potential drivers of primate phenotypes, single-copy primate-specific genes are poorly characterized. To discover such genes genomewide, we screened a catalog of 38,037 human transcriptional units (TUs), compiled from EST and cDNA sequences in conjunction with the FANTOM3 transcriptome project. We identified 131 TUs from transcribed sequences residing within primate-specific insertions in 9-species sequence alignments and outside of segmental duplications. Exons of 120 (92%) of the TUs contained interspersed repeats, indicating that repeat insertions may have contributed to primate-specific gene genesis. Fifty-nine (46%) primate-specific TUs may encode proteins. Although primate-specific TU transcript lengths were comparable to known human gene mRNA lengths overall, 92 (70%) primate-specific TUs were single-exon. Thirty-two (24%) primate-specific TUs were localized to subtelomeric and pericentromeric regions. Forty (31%) of the TUs were nested in introns of known genes, indicating that primate-specific TUs may arise within older, protein-coding regions. Primate-specific TUs were preferentially expressed in reproductive organs and tissues (P < 0.011), consistent with the expectation that emergence of new, lineage-specific genes may accompany speciation or reproduction. Of the 33 primate-specific TUs with human Affymetrix microarray probe support, 21 were differentially expressed in human teratozoospermia. In addition to elucidating the likely functional relevance of primate-specific TUs to reproduction, we present a set of primate-specific genes for future functional studies, and we implicate nonduplicated pericentromeric and subtelomeric regions in gene genesis.