Classification and Prediction of Macroinvertebrate Assemblages from Running Waters in Victoria, Australia

Abstract
We constructed predictive models using 2 macroinvertebrate data sets (one at species level, one at family level) from bankside habitats at 49 undisturbed reference sites in 6 Victorian river basins; data were accumulated over 4 to 6 sampling occasions. Classification (by unweighted pair-group arithmetic averaging with the Bray-Curtis association measure) identified 3 site groups at the species level and 4 at the family level. Stepwise discriminant analysis showed that a subset of 5 of 22 environmental variables provided maximum discrimination between the 3 species site groups: conductivity, altitude, substrate heterogeneity, distance of a site from source, and longitude. Four variables discriminated between the 4 family site groups: conductivity, catchment area upstream of site, mean annual discharge, and latitude. From the discriminant analysis, it was possible to predict the group into which an unknown site (specified only by measurements of the 4 or 5 variables just noted) would be placed, and thus the probabilities of occurrence of taxa at that site. To test predictive ability, 4 sites were removed at random from the 2 data sets and the classification and discriminant models were recalculated; this process was repeated 5 times. The identity and number of taxa observed at each of these sites were compared with those predicted with a probability of occurrence >50%, and the results were expressed as a ratio of numbers observed to numbers expected (O/E). This ratio varied from 0.75 to 1.05 at the species level and from 0.83 to 1.12 at the family level, indicating that the fauna conformed with expectation (O/E near 1.0). To test such predictive models on independent data, O/E ratios were also calculated for family data collected in spring at 18 sites from a basin not used in the original models. Two new discriminant models, based on single sets of samples taken in spring from the reference sites, were constructed for this purpose. O/E ratios varied from 0.09 to 1.01 for the 18 sites and were inversely correlated (r = -0.4 to -0.8) with a range of water quality variables, the values of which increased as water quality deteriorated. The O/E ratio could thus be considered a sensitive measure of disturbance.
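
The O/E calculation summarised in the abstract can be illustrated with a short sketch. It is not the authors' implementation and rests on assumptions the abstract does not state: that a taxon's probability of occurrence at a site is the sum, over site groups, of the site's probability of group membership multiplied by the taxon's frequency of occurrence in that group, and that E is the sum of those probabilities for taxa exceeding the 50% threshold. The function names, group labels, and all numerical values are hypothetical.

```python
def occurrence_probabilities(group_probs, taxon_freqs):
    """Weight each taxon's per-group frequency by the site's group-membership probability.

    group_probs: {group: probability that the site belongs to the group}
    taxon_freqs: {group: {taxon: fraction of reference sites in the group with the taxon}}
    """
    taxa = {taxon for freqs in taxon_freqs.values() for taxon in freqs}
    return {
        taxon: sum(p * taxon_freqs[group].get(taxon, 0.0)
                   for group, p in group_probs.items())
        for taxon in taxa
    }


def observed_expected_ratio(group_probs, taxon_freqs, observed_taxa, threshold=0.5):
    """Return O/E using only taxa predicted with probability above the threshold."""
    probs = occurrence_probabilities(group_probs, taxon_freqs)
    predicted = {taxon: p for taxon, p in probs.items() if p > threshold}
    expected = sum(predicted.values())  # E: assumed to be the sum of probabilities
    observed = sum(1 for taxon in predicted if taxon in observed_taxa)  # O
    return observed / expected if expected else float("nan")


# Hypothetical test site: membership probabilities as a discriminant model might give them.
group_probs = {"A": 0.8, "B": 0.2}
# Hypothetical frequencies of occurrence of each family within each reference group.
taxon_freqs = {
    "A": {"Baetidae": 1.0, "Leptophlebiidae": 0.9, "Chironomidae": 1.0, "Gripopterygidae": 0.5},
    "B": {"Baetidae": 0.5, "Chironomidae": 1.0, "Oligochaeta": 0.9},
}
# Hypothetical families actually collected at the test site.
observed_taxa = {"Baetidae", "Chironomidae", "Oligochaeta"}

ratio = observed_expected_ratio(group_probs, taxon_freqs, observed_taxa)
print(f"O/E = {ratio:.2f}")
```

In this invented example three families exceed the threshold and two of them are observed, so the ratio falls below 1.0; Oligochaeta is collected but not counted because its predicted probability is below 50%.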