Genome-Wide Survey for Biologically Functional Pseudogenes
Open Access
- 5 May 2006
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 2 (5), e46
- https://doi.org/10.1371/journal.pcbi.0020046
Abstract
According to current estimates there exist about 20,000 pseudogenes in a mammalian genome. The vast majority of these are disabled and nonfunctional copies of protein-coding genes which, therefore, evolve neutrally. Recent findings that a Makorin1 pseudogene, residing on mouse Chromosome 5, is, indeed, in vivo vital and also evolutionarily preserved, encouraged us to conduct a genome-wide survey for other functional pseudogenes in human, mouse, and chimpanzee. We identify to our knowledge the first examples of conserved pseudogenes common to human and mouse, originating from one duplication predating the human–mouse species split and having evolved as pseudogenes since the species split. Functionality is one possible way to explain the apparently contradictory properties of such pseudogene pairs, i.e., high conservation and ancient origin. The hypothesis of functionality is tested by comparing expression evidence and synteny of the candidates with proper test sets. The tests suggest potential biological function. Our candidate set includes a small set of long-lived pseudogenes whose unknown potential function is retained since before the human–mouse species split, and also a larger group of primate-specific ones found from human–chimpanzee searches. Two processed sequences are notable, their conservation since the human–mouse split being as high as most protein-coding genes; one is derived from the protein Ataxin 7-like 3 (ATX7NL3), and one from the Spinocerebellar ataxia type 1 protein (ATX1). Our approach is comparative and can be applied to any pair of species. It is implemented by a semi-automated pipeline based on cross-species BLAST comparisons and maximum-likelihood phylogeny estimations. To separate pseudogenes from protein-coding genes, we use standard methods, utilizing in-frame disablements, as well as a probabilistic filter based on Ka/Ks ratios. 
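The in-frame disablement criterion mentioned at the end of the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a gapped pairwise alignment of a parent coding sequence and a candidate pseudogene is already available, and the function name `find_disablements`, its inputs, and its return format are hypothetical names chosen for illustration. It counts gap runs whose length is not a multiple of three (frameshifts) and stop codons that appear in the candidate before the parent's final codon (premature stops).

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_disablements(parent_aln: str, pseudo_aln: str) -> dict:
    """Report frameshifts and premature stops in a candidate pseudogene.

    Both arguments are rows of one pairwise nucleotide alignment, with '-'
    marking gaps. The parent row is assumed to start in frame at its first
    non-gap base (an assumption of this sketch).
    """
    if len(parent_aln) != len(pseudo_aln):
        raise ValueError("aligned sequences must have equal length")
    parent_aln, pseudo_aln = parent_aln.upper(), pseudo_aln.upper()

    # A gap run in either row whose length is not a multiple of 3
    # shifts the reading frame of the candidate relative to the parent.
    frameshifts = sum(
        1
        for row in (parent_aln, pseudo_aln)
        for run in re.findall(r"-+", row)
        if len(run) % 3 != 0
    )

    # Read the candidate codon by codon in the parent's frame, skipping
    # columns where the parent has a gap (insertions in the candidate).
    n_codons = sum(base != "-" for base in parent_aln) // 3
    premature_stops = []
    codon, parent_pos = [], 0
    for p_base, q_base in zip(parent_aln, pseudo_aln):
        if p_base == "-":
            continue
        codon.append(q_base)
        parent_pos += 1
        if parent_pos % 3 == 0:
            codon_index = parent_pos // 3 - 1  # 0-based index in the parent
            is_last = codon_index == n_codons - 1
            if "".join(codon) in STOP_CODONS and not is_last:
                premature_stops.append(codon_index)
            codon = []

    return {"frameshifts": frameshifts, "premature_stops": premature_stops}


if __name__ == "__main__":
    parent = "ATGAAACCCGGGTAA"
    print(find_disablements(parent, "ATGTAACCCGGGTAA"))  # stop replaces codon 1
    print(find_disablements(parent, "ATGAA-CCCGGGTAA"))  # single-base deletion
```

A sequence with no such disablements is not necessarily protein-coding, which is presumably why the abstract pairs this criterion with a Ka/Ks-based filter; that filter is not sketched here.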
Svensson, Arvestad, and Lagergren conducted a genome-wide survey and analysis of human pseudogenes, i.e., gene copies that have lost their protein-coding ability, with the aim of discovering biologically functional ones. Their main motivation was a 2002 Nature paper revealing in vivo functionality for a mouse Makorin pseudogene, Makorin1-p1. Their work is in line with extensive research in recent years concerning noncoding RNA (ncRNA). The method consists of a BLAST-based pipeline augmented by modern maximum-likelihood phylogeny estimations. They found several examples of unknown genes and present in silico tests favoring the hypothesis that these are functional pseudogenes. In the result set, there are two examples from the Ataxin family, a poorly characterized gene family which, however, includes a number of genes related to neurodegenerative disorders. A discovery of new members in this gene family should be of great interest to experimentalists in the field. To the best of the authors' knowledge, functional pseudogenes have never been observed in humans. The results suggest, however, that while functional pseudogenes are relatively rare on a long evolutionary timescale, they nevertheless exist. These deserve attention, of course, similar to any other previously uncharacterised gene.