Comparative Analysis of Amino Acid Usage and Protein Length Distribution Between Alternatively and Non-alternatively Spliced Genes Across Six Eukaryotic Genomes

Abstract
Alternative splicing has been found in nearly all metazoan organisms as a mechanism for increasing the diversity of gene products. However, the origin and evolution of alternatively spliced genes remain poorly understood. Studying the differences between alternatively and non-alternatively spliced genes may help clarify how alternatively spliced genes evolved. The aim of this research was to compare amino acid usage and protein length distribution between alternatively and non-alternatively spliced genes across six nearly complete eukaryotic genomes: human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), fruit fly (Drosophila melanogaster), nematode (Caenorhabditis elegans), and bovine (Bos taurus). Our results suggest the following: (1) Across the six species, alternatively and non-alternatively spliced genes show very similar amino acid usage, both for all genes and for highly expressed genes alone; in every case, the highly expressed genes prefer the amino acids A, E, G, K, L, P, S, V, R, T, and D. (2) Both for all genes and for highly expressed genes alone, the average length of the protein products of alternatively spliced genes is significantly greater than that of non-alternatively spliced genes. Despite this difference in average length, the protein length distributions of the two groups of genes are very similar in all six species. Based on these results, we propose that alternatively spliced genes may have originated from non-alternatively spliced ones through events such as DNA mutations or gene fusion.
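The two comparisons summarized in the abstract, amino acid usage and protein length, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names (`amino_acid_usage`, `mean_length`) and the toy sequences are hypothetical placeholders, and no significance test is included because the abstract does not state which test was applied.

```python
from collections import Counter
from statistics import mean

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_usage(proteins):
    """Return the fraction of each standard amino acid, pooled over all sequences."""
    counts = Counter()
    for seq in proteins:
        # Ignore anything that is not a standard residue (e.g. X or *).
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_length(proteins):
    """Return the average protein length in residues."""
    return mean(len(seq) for seq in proteins)


# Hypothetical toy sequences, NOT data from the study.
alt_spliced = ["MKTAYIAKQR", "MSEELLKKAGG"]
non_alt_spliced = ["MKTAY", "MSEELK"]

alt_usage = amino_acid_usage(alt_spliced)
non_alt_usage = amino_acid_usage(non_alt_spliced)

# Amino acid whose usage differs most between the two groups.
largest_gap = max(AMINO_ACIDS, key=lambda aa: abs(alt_usage[aa] - non_alt_usage[aa]))

print("mean length (alternatively spliced):", mean_length(alt_spliced))
print("mean length (non-alternatively spliced):", mean_length(non_alt_spliced))
print("largest usage difference at:", largest_gap)
```

In practice, the two input lists would be the protein products of each gene group for one species, and the same calculation would be repeated for the subset of highly expressed genes.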