Abstract
Few paleontological studies of species distribution in time and space have adequately considered the effects of sample size. Most species occur very infrequently, and therefore sample size effects may be large relative to the faunal patterns reported. Examination of 10 carefully compiled large data sets (each more than 1,000 occurrences) reveals that the species-occurrence frequency distribution of each fits the log series distribution well, so sample size effects can be predicted. Results show that, if the materials used in assembling a large data set are resampled, as many as 25% of the species will not be found a second time even if both samples are of the same size. If the two samples are of unequal size, then the larger sample may have as many as 70% unique species and the smaller sample no unique species. The implications of these values are important to studies of species richness, origination, and extinction patterns, and biogeographic phenomena such as endemism or province boundaries. I provide graphs showing the predicted sample size effects for a range of data set sizes, species richness values, and relative data sizes. For data sets that do not fit the log series distribution well, I provide example calculations and equations that can be used without a large computer. If these graphs or equations are not used, then I suggest that species that occur infrequently be eliminated from consideration. Studies in which sample size effects are not considered should include sample size information in sufficient detail that other workers might make their own evaluation of observed faunal patterns.
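As an illustration of why a log series fit makes sample size effects predictable, the following minimal sketch uses the standard form of Fisher's log series, in which the expected number of species with n occurrences is alpha * x^n / n, with x = N / (N + alpha) and S = alpha * ln(1 + N / alpha). It is not the paper's own procedure. The function names are hypothetical, and the values S = 100 species and N = 1,000 occurrences are assumed for illustration only; they are not taken from the data sets examined.

```python
import math


def fisher_alpha(num_species, num_occurrences):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection.

    Hypothetical helper: S is the species count, N the occurrence count.
    """
    S, N = num_species, num_occurrences
    if not 0 < S < N:
        raise ValueError("need 0 < S < N for a log series fit")

    def excess(alpha):
        # Increasing in alpha; the root is the fitted alpha.
        return alpha * math.log1p(N / alpha) - S

    lo, hi = 1e-9, 1.0
    while excess(hi) < 0:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_species_with_n(alpha, num_occurrences, n):
    """Expected number of species occurring exactly n times: alpha * x**n / n."""
    x = num_occurrences / (num_occurrences + alpha)
    return alpha * x ** n / n


if __name__ == "__main__":
    S, N = 100, 1000  # assumed, illustrative values only
    alpha = fisher_alpha(S, N)
    singles = expected_species_with_n(alpha, N, 1)
    print(f"alpha = {alpha:.2f}")
    print(f"expected single-occurrence species = {singles:.1f} of {S}")
```

Under these assumptions, the species expected to occur only once or twice make up a substantial share of the total, and these are the species most likely to be missed when the same material is resampled.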