eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations
Open Access
- 7 November 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 38 (suppl_1), D190-D195
- https://doi.org/10.1093/nar/gkp951
Abstract
The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a 2-fold increase relative to the previous version. The pipeline yielded 224 847 OGs, including 9724 extended versions of the original COG and KOG. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2 242 035 proteins (built from 2 590 259 proteins) and provides a broad functional description for at least 1 966 709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de.
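As an illustration of the two steps named in the abstract (reciprocal best matches, then triangular linkage clustering), the following is a minimal Python sketch and not the eggNOG implementation. It assumes similarity hits are already available as `(query, subject, score)` tuples; the function names, the `genome_of` lookup and the toy data are all hypothetical. It also ignores everything the abstract does not describe, such as the handling of paralogs or score thresholds.

```python
from collections import defaultdict
from itertools import combinations


def best_hits(hits, genome_of):
    """Keep, for each (query, target genome), the highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if genome_of[query] == genome_of[subject]:
            continue  # only cross-genome hits are considered in this sketch
        key = (query, genome_of[subject])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: subject for key, (subject, _) in best.items()}


def reciprocal_pairs(best, genome_of):
    """Pairs of proteins that are each other's best hit across two genomes."""
    pairs = set()
    for (query, _), subject in best.items():
        if best.get((subject, genome_of[query])) == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def find_triangles(pairs):
    """Triples of proteins in which every pair is a reciprocal best hit."""
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    triangles = set()
    for pair in pairs:
        a, b = tuple(pair)
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share an edge, using a small union-find."""
    tris = list(triangles)
    parent = list(range(len(tris)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edge_owner = {}
    for idx, tri in enumerate(tris):
        for edge in combinations(sorted(tri), 2):
            if edge in edge_owner:
                parent[find(idx)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = idx

    merged = defaultdict(set)
    for idx, tri in enumerate(tris):
        merged[find(idx)] |= tri
    return {frozenset(members) for members in merged.values()}


def build_groups(hits, genome_of):
    best = best_hits(hits, genome_of)
    pairs = reciprocal_pairs(best, genome_of)
    return merge_triangles(find_triangles(pairs))


# Toy data (hypothetical): four genomes A-D, scores are symmetric.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "d1": "D"}
symmetric = [("a1", "b1", 100), ("a1", "c1", 90), ("b1", "c1", 95),
             ("b1", "d1", 80), ("c1", "d1", 85), ("a2", "b2", 70)]
hits = [(q, s, sc) for q, s, sc in symmetric] + [(s, q, sc) for q, s, sc in symmetric]
hits.append(("a1", "b2", 30))  # weaker hit, loses to a1 -> b1

groups = build_groups(hits, genome_of)
print([sorted(g) for g in groups])  # [['a1', 'b1', 'c1', 'd1']]
```

In the toy data, the triangles (a1, b1, c1) and (b1, c1, d1) share the edge b1-c1 and are merged into one group, while the isolated reciprocal pair a2-b2 forms no triangle and is left ungrouped by this simplified procedure.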