Orthology prediction methods: A quality assessment using curated protein families

Open Access

19 August 2011

journal article
research article
Published by Wiley in BioEssays

Vol. 33 (10), 769-780
https://doi.org/10.1002/bies.201100062

Abstract

The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests are therefore required to evaluate the accuracy of their predictions and to explore biases caused by biological and technical factors. We used 70 manually curated families to analyze the performance of five public methods in Metazoa, assessing the strengths and weaknesses of the methods and quantifying the impact of biological and technical challenges. In the latter part of the analysis, genome annotation emerged as the largest single factor, affecting up to 30% of the performance. In general, most methods performed well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, a task that is of utmost importance for many fields of biology and should be tackled by a broad scientific community.
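
The distinction between finding the right orthologous group and recovering its exact gene content can be illustrated with a minimal sketch. It is not the scoring protocol used in the article: the function name `compare_group`, the returned fields, and the gene identifiers are all hypothetical, and the comparison is a plain set difference between a curated reference group and a predicted group.

```python
def compare_group(reference, predicted):
    """Compare one predicted orthologous group with its curated reference.

    Illustrative only: the field names and the idea of an "exact" match
    are assumptions, not the benchmark's published scoring scheme.
    """
    reference, predicted = set(reference), set(predicted)
    missing = reference - predicted  # curated genes the prediction left out
    extra = predicted - reference    # predicted genes absent from the curated group
    return {
        "shared": len(reference & predicted),
        "missing": len(missing),
        "extra": len(extra),
        "exact": not missing and not extra,
    }


# Hypothetical gene identifiers, for illustration only.
reference = {"human_A", "mouse_A", "fly_A", "worm_A"}
predicted = {"human_A", "mouse_A", "fly_A", "fly_B"}

result = compare_group(reference, predicted)
print(result)
```

In this toy case the prediction overlaps the curated group in three genes, so the group itself is recognizable, yet one gene is missing and one is spurious, so the gene count is not exact.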
