Orthology prediction methods: A quality assessment using curated protein families
Open Access
- 19 August 2011
- Vol. 33 (10), 769-780
- https://doi.org/10.1002/bies.201100062
Abstract
The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community.