The COG database: a tool for genome-scale analysis of protein functions and evolution
Open Access
- 1 January 2000
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 28 (1), 33-36
- https://doi.org/10.1093/nar/28.1.33
Abstract
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
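The abstract names the construction criterion (consistency of genome-specific best hits) without spelling out the procedure. The Python sketch below shows one simplified reading of that idea, under assumptions that are not taken from the abstract: proteins are identified as hypothetical `(genome, name)` tuples, similarity scores come from some all-against-all comparison supplied as a dictionary, best hits are required to be reciprocal, and clusters are seeded by triangles of mutually best hits from three different genomes, then merged when they share two members. The function names `best_hits`, `consistent_triangles` and `merge_triangles` are illustrative only and do not correspond to the COG or COGNITOR software.

```python
from itertools import combinations


def best_hits(scores):
    """Map (query, target_genome) -> best-scoring target in that genome.

    `scores` maps (query, target) -> similarity score; each protein is a
    hypothetical (genome, name) tuple. Within-genome pairs are ignored.
    """
    best = {}
    for (query, target), score in scores.items():
        if query[0] == target[0]:
            continue  # only between-genome hits are considered
        key = (query, target[0])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}


def consistent_triangles(best):
    """Find triples from three genomes whose members are all mutual best hits."""
    proteins = sorted({query for query, _ in best})

    def mutual(a, b):
        return best.get((a, b[0])) == b and best.get((b, a[0])) == a

    triangles = []
    for a, b, c in combinations(proteins, 3):
        if len({a[0], b[0], c[0]}) < 3:
            continue  # assumption: a triangle must span three genomes
        if mutual(a, b) and mutual(b, c) and mutual(a, c):
            triangles.append({a, b, c})
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two members) into larger clusters."""
    clusters = [set(t) for t in triangles]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if len(clusters[i] & clusters[j]) >= 2:
                clusters[i] |= clusters[j]
                del clusters[j]
                merged = True
                break  # indices changed, so restart the scan
    return clusters


# Toy input: made-up scores for three proteins from three made-up genomes.
toy = {}
for (p, q), s in {
    (("gA", "a1"), ("gB", "b1")): 90,
    (("gA", "a1"), ("gC", "c1")): 80,
    (("gB", "b1"), ("gC", "c1")): 85,
}.items():
    toy[(p, q)] = s
    toy[(q, p)] = s  # treat scores as symmetric for simplicity

toy_clusters = merge_triangles(consistent_triangles(best_hits(toy)))
print([sorted(cluster) for cluster in toy_clusters])
```

The triangle scan is cubic in the number of proteins, so the sketch is meant only to make the criterion concrete on toy data, not to scale to complete genomes.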