Unique genes in giant viruses: Regular substitution pattern and anomalously short size

Abstract
Large DNA viruses, including the giant mimivirus with its 1.2-Mb genome, contain numerous orphan genes, which have no homologs in sequence databases, as well as genes whose homologs occur only in closely related members of the same viral family. Because these genes lack recognizable relatives, their functions and evolutionary origins remain obscure. We examined the sequence features and evolutionary rates of viral family-specific genes in three lineages of nucleo-cytoplasmic large DNA viruses (NCLDVs). First, we showed that the proportion of family-specific genes does not correlate with the rate of sequence divergence. Second, position-dependent nucleotide statistics were similar between family-specific genes and the remaining genes in the same genomes. Third, we showed that the synonymous-to-nonsynonymous substitution ratios in these viruses are comparable to those estimated for vertebrate proteomes. Thus, the vast majority of family-specific genes do not exhibit an accelerated evolutionary rate and are therefore likely to encode functional polypeptides. On the other hand, family-specific proteins exhibit several distinct properties: (1) they are shorter, (2) a larger fraction of them are predicted transmembrane proteins, and (3) they are enriched in low-complexity sequences. These results suggest that family-specific genes do not originate from recent horizontal gene transfer. We propose that their characteristic features are consequences of the specific evolutionary forces that shape viral gene repertoires in the context of a parasitic lifestyle.
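The abstract does not state which position-dependent nucleotide statistics were compared. One plausible reading, assumed here for illustration and not taken from the authors' pipeline, is nucleotide composition at each of the three codon positions. The minimal Python sketch below computes those frequencies (and GC content per position) for a single coding sequence or for a pooled gene set, so that family-specific genes could be compared with the remaining genes of a genome. The function names are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def codon_position_composition(cds: str) -> list[dict[str, float]]:
    """Nucleotide frequencies at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; a trailing partial codon
    is ignored, and non-ACGT symbols (e.g. N) are excluded from the totals.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    profiles = []
    for pos in range(3):
        counts = Counter(seq[pos:usable:3])  # every third base from `pos`
        total = sum(counts[n] for n in NUCLEOTIDES)
        profiles.append(
            {n: (counts[n] / total if total else 0.0) for n in NUCLEOTIDES}
        )
    return profiles


def gc_by_position(cds: str) -> list[float]:
    """GC fraction at codon positions 1, 2 and 3 (often written GC1, GC2, GC3)."""
    return [p["G"] + p["C"] for p in codon_position_composition(cds)]


def pooled_gc_by_position(genes: list[str]) -> list[float]:
    """GC1/GC2/GC3 for a gene set, pooling codons across all genes.

    Each gene is trimmed to whole codons before concatenation so that
    the reading frame is preserved across gene boundaries.
    """
    trimmed = "".join(g.upper()[: len(g) - len(g) % 3] for g in genes)
    return gc_by_position(trimmed)
```

In a comparison of this kind, one would call `pooled_gc_by_position` once on the family-specific genes and once on the remaining genes of the same genome, then test whether the per-position profiles differ. Pooling codons, rather than averaging per-gene values, weights long genes more heavily; which weighting is appropriate depends on the question, and the abstract does not specify it.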