On the Estimation of the Number of Classes in a Population
Open Access
- 1 December 1949
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Mathematical Statistics
- Vol. 20 (4), 572-579
- https://doi.org/10.1214/aoms/1177729949
Abstract
This paper deals with the following problem: Suppose a population of known size $N$ is subdivided into an unknown number of mutually exclusive classes. It is assumed that the class in which an element is contained may be determined, but that the classes are not ordered. Let us draw a random sample of $n$ elements without replacement from the population. The problem is to estimate the total number $K$ of classes which subdivide the population on the basis of the sample results and our knowledge of the population size. There is exactly one real-valued statistic $S$ which is an unbiased estimate of $K$ when the sample size $n$ is not less than the maximum number $q$ of elements contained in any class. The restriction placed upon $q$ is unimportant for many practical problems where either there is a reasonably low bound for $q$ or those classes containing more than $n$ elements are known. An unbiased estimate does not exist when there is no such knowledge. Since the unbiased estimate can be very unreasonable, modifications of $S$ are considered. The statistic $T' = \begin{cases} S' = N - \frac{N(N - 1)}{n(n - 1)} x_2, & \text{if } S' \geq \sum_{i=1}^{n} x_i, \\ \sum_{i=1}^{n} x_i, & \text{if } S' < \sum_{i=1}^{n} x_i, \end{cases}$ where $x_i$ is the number of classes containing $i$ elements in the sample, is the most suitable estimate, in comparison with three other statistics, for a hypothetical population. The case where each element in the population has an equal and independent chance of coming into the sample is used as a model for some sampling procedures and also as an approximation to the case of random sampling.
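As an illustration of the statistic $T'$ described in the abstract, the following minimal Python sketch computes it from a list of observed class labels. It assumes the coefficient of $x_2$ is $N(N-1)/(n(n-1))$ and that $T'$ is the larger of $S'$ and the number of distinct classes observed, $\sum_i x_i$. The function name `goodman_t_prime` and its parameters are hypothetical, chosen for this example only, and are not taken from the paper.

```python
from collections import Counter


def goodman_t_prime(sample_labels, population_size):
    """Sketch of the truncated estimate T' of the number of classes K.

    sample_labels: class label of each of the n sampled elements
                   (sample assumed drawn without replacement).
    population_size: the known population size N.
    """
    n = len(sample_labels)
    N = population_size
    if n < 2:
        raise ValueError("need at least 2 sampled elements (n(n-1) is a divisor)")
    if n > N:
        raise ValueError("sample size cannot exceed population size")

    # class label -> number of times that class appears in the sample
    class_counts = Counter(sample_labels)
    # i -> x_i, the number of classes appearing exactly i times in the sample
    x = Counter(class_counts.values())

    # sum_i x_i: the number of distinct classes actually observed
    observed_classes = len(class_counts)

    # S' = N - N(N-1) / (n(n-1)) * x_2  (assumed form of the coefficient)
    s_prime = N - (N * (N - 1)) / (n * (n - 1)) * x.get(2, 0)

    # T' never falls below the number of classes already seen in the sample
    return max(s_prime, observed_classes)


if __name__ == "__main__":
    # One class seen twice: S' = 10 - (90 / 12) * 1 = 2.5, below the 3 classes seen
    print(goodman_t_prime(["a", "b", "b", "c"], population_size=10))
    # No class seen twice: S' = N, so the estimate is the population size
    print(goodman_t_prime(["a", "b", "c", "d"], population_size=10))
```

The two calls show both branches: in the first, $S'$ falls below the number of classes observed, so the observed count is returned; in the second, $x_2 = 0$ and the estimate equals $N$, which illustrates how strongly the statistic depends on $x_2$ alone.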