Current approaches to classification and clump-finding at the Cambridge Language Research Unit

Abstract
Computer programs for automatic classification are needed in many fields. Work on suitable procedures for handling large bodies of object/property descriptions has been in progress at the Cambridge Language Research Unit for some years. This paper describes the current series of general-purpose programs developed there. In these programs, classes or “clumps” of objects are obtained from a similarity matrix by a simple iterative scan of the universe of objects, which distributes them in such a way that an appropriate cohesion function is minimized. The clump-finding process itself is embedded in an overall package in which the information given by a classification is manipulated in a variety of ways. The current applications of the programs, especially to information retrieval, are described.
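
As an illustration of the kind of iterative scan summarized above, the following is a minimal sketch and not the Unit's own program. It assumes a symmetric similarity matrix held as a list of lists, and it uses a hypothetical cohesion function: the total similarity across the boundary between a candidate clump A and its complement B, divided by the product of the total similarities within A and within B. The names `cohesion` and `find_clump`, the seed set, and the pass limit are all assumptions introduced for the example.

```python
from typing import List, Set

Matrix = List[List[float]]


def cohesion(sim: Matrix, members: Set[int]) -> float:
    """Hypothetical cohesion function (an assumption, not the paper's definition):
    cross-boundary similarity divided by the product of within-side similarities.
    Lower values indicate a better separated clump."""
    n = len(sim)
    inside = [i for i in range(n) if i in members]
    outside = [i for i in range(n) if i not in members]
    s_ab = sum(sim[i][j] for i in inside for j in outside)
    s_aa = sum(sim[i][j] for i in inside for j in inside if i < j)
    s_bb = sum(sim[i][j] for i in outside for j in outside if i < j)
    denom = s_aa * s_bb
    return float("inf") if denom == 0 else s_ab / denom


def find_clump(sim: Matrix, seed: Set[int], max_passes: int = 100) -> Set[int]:
    """Scan the universe repeatedly, moving one object at a time across the
    boundary whenever the move lowers the cohesion value. Stops when a full
    pass makes no change (a local minimum) or after max_passes passes."""
    n = len(sim)
    members = set(seed)
    best = cohesion(sim, members)
    for _ in range(max_passes):
        changed = False
        for obj in range(n):
            trial = members ^ {obj}  # move obj to the other side
            if not trial or len(trial) == n:
                continue  # keep both sides non-empty
            score = cohesion(sim, trial)
            if score < best:
                members, best = trial, score
                changed = True
        if not changed:
            break
    return members


# Toy universe of six objects: two groups of three, with similarity 1.0
# within a group and 0.1 between groups (illustrative values only).
sim = [[0.0 if i == j else (1.0 if i // 3 == j // 3 else 0.1)
        for j in range(6)] for i in range(6)]
print(sorted(find_clump(sim, {0})))  # -> [0, 1, 2]
```

A scan of this kind only reaches a local minimum of the chosen function, so the clump found can depend on the starting set and on the order in which objects are examined.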