Efficiently Computing the Robinson-Foulds Metric
- 1 July 2007
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 14 (6), 724-735
- https://doi.org/10.1089/cmb.2007.r012
Abstract
The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day's algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, in sublinear time and with high probability, a (1 + epsilon) approximation of the true RF metric. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We complement our algorithm by presenting an efficient embedding procedure, thereby resolving an open issue from the preliminary version of this paper. We have also improved the performance of Day's (exact) algorithm in practice by using techniques discovered while implementing our approximation scheme. Indeed, we give a unified framework for edge-based tree algorithms in which implementation tradeoffs are clear. Finally, we present detailed experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach. Our new implementation, FastRF, is available as an open-source tool for phylogenetic analysis.Keywords
This publication has 12 references indexed in Scilit:
- A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds MetricLecture Notes in Computer Science, 2006
- The Splits in the Neighborhood of a TreeAnnals of Combinatorics, 2004
- Database-friendly random projections: Johnson-Lindenstrauss with binary coinsJournal of Computer and System Sciences, 2003
- A Linear-Time Majority Tree AlgorithmLecture Notes in Computer Science, 2003
- Subtree Transfer Operations and Their Induced Metrics on Evolutionary TreesAnnals of Combinatorics, 2001
- Algorithmic applications of low-distortion geometric embeddingsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2001
- The geometry of graphs and some of its algorithmic applicationsCombinatorica, 1995
- The Discovery and Importance of Multiple Islands of Most-Parsimonious TreesSystematic Zoology, 1991
- Optimal algorithms for comparing trees with labeled leavesJournal of Classification, 1985
- Extensions of Lipschitz mappings into a Hilbert spacePublished by American Mathematical Society (AMS) ,1984