Analysis of singleton ORFans in fully sequenced microbial genomes
- 3 September 2003
- journal article
- research article
- Published by Wiley in Proteins: Structure, Function, and Bioinformatics
- Vol. 53 (2), 241-251
- https://doi.org/10.1002/prot.10423
Abstract
Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans. Proteins 2003.