Abstract
Diversity is the hard currency of ecologists. Various statistics have been developed for summarizing the diversity of an ecological community. A commonly adopted summary statistic is the Shannon-Wiener index: $H = -\sum_i p_i \ln p_i$, where $p_i$ is the frequency of the $i$th species. In addition, species richness (the number of different species) often is reported, and recent work emphasizes the importance of accurate estimates of species richness when ecological communities and processes that affect the composition of communities and the function of ecosystems are described (5). The significance of diversity is often inferred by comparing communities characterized from different environments. Typically, such comparisons rely on standard measures of overlap, including the percentage of species shared by two communities or similarity indices. One of the indices used is Sorensen's index: $S = S_{12}/(0.5(S_1 + S_2))$, where $S_{12}$ is the number of species common to both sites and $S_i$ is the number of species found at site $i$.

A limitation of traditional statistics for describing and comparing diversity is that species (or operational taxonomic units [OTUs]) are defined inconsistently. For instance, Kroes et al. (6) defined an OTU as a 16S ribosomal DNA (rDNA) sequence group in which sequences differed by less than 1%. By contrast, the definition of McCaig et al. (11) included sequences that were less than 3% different, and other studies have used 5% as the magic number. The lack of consensus limits the comparative utility of statistics based solely on identification of species (or OTUs). A second, and perhaps more important, limitation of the standard statistics of diversity is that OTUs are counted equivalently even though some may be highly divergent and phylogenetically unique, whereas others may be part of a closely related group of species and are therefore phylogenetically redundant (4).
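As a concrete illustration of the two indices defined above, the following minimal Python sketch computes the Shannon-Wiener index from OTU counts and Sorensen's index from the OTU lists of two sites. The OTU labels, the sample data, and the function names `shannon_index` and `sorensen_index` are hypothetical and chosen only for illustration; the sketch also assumes that OTUs have already been assigned under some fixed similarity threshold.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) from per-species counts."""
    total = sum(counts)
    # Species with zero count contribute nothing and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def sorensen_index(site1, site2):
    """Sorensen's index S = S12 / (0.5 * (S1 + S2)) from two lists of species labels."""
    s1, s2 = set(site1), set(site2)
    shared = len(s1 & s2)  # S12: species found at both sites
    return shared / (0.5 * (len(s1) + len(s2)))


# Hypothetical samples: one OTU label per sequenced clone.
site_a = ["otu1", "otu1", "otu2", "otu3", "otu3", "otu3"]
site_b = ["otu2", "otu3", "otu4", "otu4"]

h_a = shannon_index(Counter(site_a).values())
h_b = shannon_index(Counter(site_b).values())
s_ab = sorensen_index(site_a, site_b)

print(f"H(site A) = {h_a:.3f}")
print(f"H(site B) = {h_b:.3f}")
print(f"Sorensen S(A, B) = {s_ab:.3f}")
```

Both functions see only OTU labels, so their values depend on how OTUs were delimited; sequences that differ by 1% or by 5% are treated identically once they carry different labels.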
The contrast can be illustrated by comparing two hypothetical communities in which the numbers of species, the richness profiles of species, and the rarefaction profiles are identical but which differ in the magnitude of phylogenetic diversity (i.e., the degree of divergence among the sampled sequences). Standard ecological statistics of diversity would miss the genetic difference between the two communities, and ecologists would most likely consider the two communities equally diverse when, in fact, one community harbors more genetic diversity (or disparity) than the other. Because genetic variation and phenotypic variance often are positively correlated in populations of animals (12), plants (7), and microbes (15), descriptions of microbial communities based on DNA data should include information about diversity and disparity. This is especially important in light of studies demonstrating an association between ecosystem function and community diversity (14, 28).

In this review I introduce various statistics borrowed from population genetics and systematics for describing and comparing the diversity evident from samples of gene sequences. I briefly introduce the statistics and methodological underpinnings of tests for differences between communities, and I use the methods to analyze well-described microbial communities. I show that information gained from analysis of DNA sequences provides the basis for statistical analysis of communities in ways that advance inferences about the processes that may govern the compositions and functions of microbial communities. Furthermore, the advocated analytical approaches make it possible to accomplish broad comparisons of ecological communities. The methods of analysis explored in this paper are meant to be complementary to other methods, such as the robust estimation of richness advocated by Hughes et al. (5) and approaches for estimating functional properties of bacteria from phylogenetic inference (16).
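The contrast between two hypothetical communities described earlier can be made concrete with a small Python sketch. It builds two toy communities with identical richness and identical Shannon-Wiener index and then compares them by mean pairwise sequence divergence. The sequences are invented and deliberately exaggerated, each distinct sequence is assumed to be its own OTU, and the uncorrected proportion of differing sites (p-distance) is used here only as a simple stand-in for divergence; it is not presented as one of the statistics developed in this review.

```python
import math
from collections import Counter
from itertools import combinations


def shannon_index(counts):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) from per-species counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_distance(seqs):
    """Average p-distance over all unordered pairs of sampled sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned sequences; every distinct sequence is treated as one OTU.
shallow = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAG", "ACGTACGTAT"]  # close relatives
deep = ["ACGTACGTAC", "TGCATGCATG", "CATGCATGCA", "GTACGTACGT"]     # divergent lineages

for name, community in (("shallow", shallow), ("deep", deep)):
    richness = len(set(community))
    h = shannon_index(Counter(community).values())
    mpd = mean_pairwise_distance(community)
    print(f"{name}: richness={richness}, H={h:.3f}, mean p-distance={mpd:.3f}")
```

Richness and the Shannon-Wiener index cannot separate the two toy communities, whereas the sequence-based measure does, which is the kind of information that counts of OTUs alone discard.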