Gene number expansion and contraction in vertebrate genomes with respect to invertebrate genomes

Abstract
Where did vertebrate genes come from? Here we address this question by analyzing eight completely sequenced land vertebrate genomes and six completely sequenced invertebrate genomes. Approximately 70% of the vertebrate genes can be found in the six invertebrate genomes with the standard homology search criteria (denoted as V.MCL), another ∼6% can be found with relaxed search criteria, and an additional ∼2% can be found in sequenced fungal and bacterial genomes. Thus, a substantial proportion of vertebrate genes (∼22%) cannot be found in the nonvertebrate genomes studied (denoted as Vonly). Interestingly, genes in Vonly are predominantly singletons, while the majority of genes in the other three groups belong to gene families. The proteins of Vonly tend to evolve faster than those of V.MCL. Surprisingly, in many cases the family sizes in V.MCL are only as large as or even smaller than their counterparts in the invertebrates, contrary to the general perception of a larger family size in vertebrates. Interestingly, in comparison with the family size in invertebrates, vertebrate gene families involved in regulation, signal transduction, transcription, protein transport, and protein modification tend to be expanded, whereas those involved in metabolic processes tend to be contracted. Furthermore, for almost all of the functional categories with family size expansion in vertebrates, the number of gene types (i.e., the number of singletons plus the number of gene families) tends to be over-represented in Vonly, but under-represented in V.MCL. Our study suggests that gene function is a major determinant of gene family size.