Abstract
Comparative systematic studies may use different sets of data or different sets of taxa to evaluate the quality of phylogenetic data and phylogenetic hypotheses based on levels of homoplasy as implied by the length of minimum length trees. Comparisons involving diverse arrays of characters and taxa require an index whose values are comparable across data sets of different sizes. Examination of the number of steps per character (NSC) on minimum length trees for 28 data sets taken from the literature revealed a highly significant positive correlation between NSC and the number of taxa and, concomitantly, a highly significant negative correlation between the consistency index (CI) of Kluge and Farris (1969) and the number of taxa. Theoretical expectations from a study of the number of steps on random and minimum length trees (Archie and Felsenstein, 1989) agree with this finding. Computer simulations using randomly selected subsets of taxa and characters from the Drosophila data set of Throckmorton (1968) gave a similar result. The latter two studies also revealed a negative relationship between the CI and the number of characters in a study. These findings imply that the CI is not an appropriate index of homoplasy for comparative taxonomic studies. A new index, the homoplasy excess ratio (HER), is introduced that takes into account the expected increase in overall homoplasy levels with increasing numbers of taxa in systematic studies. The properties of HER are examined for the 28 data sets and, in conjunction with the simulations using the Drosophila data set, HER is shown to be more appropriate than CI for comparative taxonomic studies that measure the average level of homoplasy in data sets involving different groups of taxa or different characters. Because HER is computationally intensive to calculate, two estimates are derived and examined. These estimates, the random expected homoplasy excess ratio (REHER) and the homoplasy excess ratio minimum (HERM), can be easily calculated from the formulas of Archie and Felsenstein (1989) and from intrinsic properties of the data matrix, respectively. HERM is shown to be a better estimator of HER than REHER, and a linear regression equation is derived to estimate HER from HERM.
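The abstract names the two indices without stating their formulas. As a minimal sketch, assuming the commonly cited definitions (CI as the minimum possible number of steps divided by the observed tree length, and HER as the observed reduction in length relative to randomized data, scaled by the maximum possible reduction), the two can be compared as follows. The function and parameter names are hypothetical, and the input values are illustrative, not taken from the 28 data sets.

```python
def consistency_index(min_steps: float, observed_steps: float) -> float:
    """CI, assumed here to be (minimum possible steps) / (observed tree length)."""
    if observed_steps <= 0:
        raise ValueError("observed_steps must be positive")
    return min_steps / observed_steps


def homoplasy_excess_ratio(mean_random_steps: float,
                           observed_steps: float,
                           min_steps: float) -> float:
    """HER, assumed here to be (MEANNS - L) / (MEANNS - MINL).

    mean_random_steps (MEANNS): mean length of minimum length trees
        found for randomized versions of the data matrix.
    observed_steps (L): length of the minimum length tree for the real data.
    min_steps (MINL): minimum possible number of steps for the matrix.
    """
    denominator = mean_random_steps - min_steps
    if denominator <= 0:
        raise ValueError("mean_random_steps must exceed min_steps")
    return (mean_random_steps - observed_steps) / denominator


if __name__ == "__main__":
    # Illustrative numbers only.
    print(consistency_index(50, 100))
    print(homoplasy_excess_ratio(200, 100, 50))
```

Under these assumed definitions, HER is 1 when the data show no homoplasy (L equals MINL) and 0 when the observed tree is no shorter than trees from randomized data. The computational cost mentioned above comes from estimating MEANNS, which requires finding minimum length trees for many randomized matrices.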