Clustering a large number of compounds. 3. The limits of classification

Abstract
Clustering is normally used to group items that are similar. Here, to obtain a diverse sample from the 230,000 compounds in the National Cancer Institute Repository, we instead cluster in order to select compounds that differ from the rest, with the aim of optimizing screening for new leads. With these constraints, our approach yielded many singleton clusters. We interpret these results as evidence for a limit to classification, contrary to the customary view of chemistry as a study of classes of compounds.