Hierarchical Classification of Community Data

Abstract
The application of hierarchical classification to ecological community data is examined, using a variety of classification techniques and test data sets. Problems discussed include the choice of a conceptual space in which points representing samples or species or both are located; the effects of random noise and nonlinearity; the degree to which clusters are natural to a data set or are imposed by the clustering technique; the choice of criteria for locating divisions; clustering strategies (non-hierarchical vs. hierarchical, divisive vs. agglomerative and polythetic vs. monothetic); the presentation of results of various clustering techniques; and methods for evaluating and comparing clustering techniques and their results. Five hierarchical clustering techniques are compared: complete linkage clustering, the unweighted pair group method using arithmetic averages, minimization of within-group dispersion, 2-way indicator species analysis, and partitioning of an ordination space (using detrended correspondence analysis, a modification of reciprocal averaging). The first 3 techniques are agglomerative and the last 2 are divisive. Data sets for tests include simulated data sets in 1 to 4 dimensions (some incorporating noise of 3 kinds), and field data varying in number of samples, noise level and number and length of community gradients. Two-way indicator species analysis usually performs best, but there are cases in which other techniques may be complementary or superior. Theoretical requirements and test results are discussed to show why clustering of ecological community data is usually best approached by a divisive strategy. This conclusion is important because a divisive analysis may be stopped after a limited number of divisions, and so needs less computation than an agglomerative analysis. The 2 divisive techniques discussed here have computational requirements that rise only linearly with the amount of data, making analysis of large data sets practical.
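As an illustration of the agglomerative strategy named above, the following is a minimal sketch of complete linkage clustering and the unweighted pair group method using arithmetic averages. It is not the implementation used in the study. The function name `agglomerate`, the Euclidean distance measure and the small table of species abundances are assumptions introduced for the example only.

```python
from itertools import combinations
from math import dist  # Euclidean distance; an assumed choice of measure


def agglomerate(samples, method="average"):
    """Merge samples pairwise until one cluster remains.

    method="complete": distance between clusters is the largest
    sample-to-sample distance (complete linkage).
    method="average": distance is the mean of all sample-to-sample
    distances (unweighted pair group method, arithmetic averages).
    Returns the merge history as (members_a, members_b, distance).
    """
    clusters = {i: (i,) for i in range(len(samples))}

    def between(a, b):
        d = [dist(samples[i], samples[j]) for i in a for j in b]
        return max(d) if method == "complete" else sum(d) / len(d)

    merges = []
    next_id = len(samples)
    while len(clusters) > 1:
        # Naive search over all cluster pairs for the closest pair.
        ka, kb = min(
            combinations(clusters, 2),
            key=lambda p: between(clusters[p[0]], clusters[p[1]]),
        )
        d = between(clusters[ka], clusters[kb])
        a, b = clusters.pop(ka), clusters.pop(kb)
        merges.append((a, b, d))
        clusters[next_id] = a + b
        next_id += 1
    return merges


# Hypothetical abundances of 3 species in 5 samples (illustration only).
samples = [(9, 1, 0), (8, 2, 0), (1, 8, 1), (0, 9, 2), (0, 1, 9)]

upgma = agglomerate(samples, method="average")
complete = agglomerate(samples, method="complete")

for a, b, d in upgma:
    print(a, b, round(d, 2))
```

An agglomerative analysis of n samples must carry out all n - 1 merges before the top-level groups are known, and this naive version compares every pair of clusters at each step. A divisive analysis reaches the top-level groups first and may stop there.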