Duplication-Based Measures of Difference Between Gene and Species Trees

Abstract
Within the duplication-based framework for comparing gene and species trees, the concepts of "duplication" and "loss" are reformulated in set-theoretic terms. Several related tree dissimilarity measures are suggested, and the relations between them are analyzed. For any node in the species tree, the number of gene duplications for which it is a "non-child" loss coincides with the number of times the node's parent is intermediate between the mapping images of a gene node and its parent. This implies that the total number of losses equals the number of intermediate nodes plus the number of one-side duplications and thus provides an alternative proof of a conjecture made by Mirkin, Muchnik, and Smith (1995). A second formula proven here involves crossings (incompatible gene-species node pairs): the number of losses equals the number of crossings plus the number of duplications.
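
To make the set-theoretic reading of "duplication" concrete, the following is a minimal sketch rather than the construction used in the paper. It assumes that trees are given as nested tuples whose leaves are species names, that every species in the gene tree occurs in the species tree, that a gene node is mapped to the smallest species-tree cluster containing the species below it, and that a gene node counts as a duplication when its image coincides with the image of at least one of its children. The names clusters, lca_map, and count_duplications are hypothetical and chosen for illustration only; losses, intermediate nodes, and crossings are not computed here.

```python
def clusters(tree):
    """Return the leaf set (cluster) of every node of a nested-tuple tree."""
    out = []

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        out.append(leaves)
        return leaves

    visit(tree)
    return out


def lca_map(gene_cluster, species_clusters):
    """Smallest species-tree cluster containing every species in gene_cluster.

    Assumption: all species in gene_cluster occur in the species tree,
    so at least one containing cluster (the root) exists.
    """
    return min((c for c in species_clusters if gene_cluster <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene nodes whose image equals the image of one of their children."""
    species_clusters = clusters(species_tree)
    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca_map(leaves, species_clusters)
        results = [visit(child) for child in node]
        leaves = frozenset().union(*(r[0] for r in results))
        image = lca_map(leaves, species_clusters)
        if any(child_image == image for _, child_image in results):
            count += 1
        return leaves, image

    visit(gene_tree)
    return count


species = (("a", "b"), "c")
gene = (("a", "c"), "b")
print(count_duplications(gene, species))  # prints 1
```

In this toy example only the root of the gene tree is a duplication: the gene node over a and c already maps to the root of the species tree, so its parent maps to the same cluster.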