Hierarchic document classification using Ward's clustering method

1 January 1986

conference paper
Published by Association for Computing Machinery (ACM)

p. 149-156
https://doi.org/10.1145/253168.253200

Abstract

In this paper, we discuss the application of a recent hierarchic clustering algorithm to the automatic classification of files of documents. Whereas most hierarchic clustering algorithms generate and update an inter-object dissimilarity matrix, this new algorithm is based upon a series of nearest neighbor searches. Such an approach is appropriate to several clustering methods, including Ward's method, which has been shown to perform well in experimental studies of hierarchic document clustering. We describe heuristics that can increase the efficiency of the new algorithm when it is used to cluster three document collections by Ward's method.
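
The idea of clustering by repeated nearest neighbor searches, rather than by maintaining a full dissimilarity matrix, can be illustrated with a minimal sketch. The abstract does not specify the algorithm's details, so the code below assumes a nearest-neighbor-chain formulation: follow nearest neighbors until two clusters are each other's nearest neighbor, then merge them. It also assumes Ward's dissimilarity in its centroid form, d(A, B) = (|A||B| / (|A| + |B|)) * ||c_A - c_B||^2. The function names `ward_dissimilarity` and `ward_nn_chain` are hypothetical, and the efficiency heuristics mentioned in the abstract are not reproduced here.

```python
def ward_dissimilarity(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.

    Each cluster is a (centroid, size) pair.
    """
    (ca, na), (cb, nb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def ward_nn_chain(points):
    """Agglomerate points by Ward's criterion using a nearest-neighbor chain.

    Returns a list of merges (id_a, id_b, dissimilarity, new_id).
    No inter-object dissimilarity matrix is stored; only centroids and sizes.
    """
    clusters = {i: (tuple(float(v) for v in p), 1) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    chain = []  # each element is the nearest neighbor of the one before it

    while len(clusters) > 1:
        if not chain:
            chain.append(min(clusters))  # arbitrary but deterministic start

        while True:
            tip = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None

            # Nearest neighbor search for the cluster at the tip of the chain.
            best, best_d = None, float("inf")
            for cid, cluster in clusters.items():
                if cid == tip:
                    continue
                d = ward_dissimilarity(clusters[tip], cluster)
                # On ties, prefer the previous chain element so the chain ends.
                if d < best_d or (d == best_d and cid == prev):
                    best, best_d = cid, d

            if best == prev:
                break  # tip and prev are reciprocal nearest neighbors
            chain.append(best)

        # Merge the reciprocal pair and drop both from the chain.
        chain.pop()
        chain.pop()
        (ca, na), (cb, nb) = clusters.pop(tip), clusters.pop(prev)
        n = na + nb
        centroid = tuple((na * x + nb * y) / n for x, y in zip(ca, cb))
        clusters[next_id] = (centroid, n)
        merges.append((min(tip, prev), max(tip, prev), best_d, next_id))
        next_id += 1

    return merges


if __name__ == "__main__":
    # Toy data standing in for document vectors (an illustrative assumption).
    docs = [(0, 0), (0, 1), (10, 0), (10, 1)]
    for merge in ward_nn_chain(docs):
        print(merge)
```

In this sketch the part of the chain that survives a merge is kept rather than rebuilt. That is safe for Ward's method because merging two clusters never brings the merged cluster closer to a third cluster than the nearer of its two parts was, so the earlier nearest neighbor links remain valid.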

Keywords

NEAREST NEIGHBOR SEARCH
HIERARCHIC DOCUMENT CLASSIFICATION
HIERARCHIC DOCUMENT CLUSTERING
HIERARCHIC CLUSTERING ALGORITHM
INTER-OBJECT DISSIMILARITY MATRIX
EXPERIMENTAL STUDY
DOCUMENT COLLECTION
CLUSTERING METHOD
NEW ALGORITHM
AUTOMATIC CLASSIFICATION
DOCUMENT CLUSTERING
HIERARCHICAL CLUSTERING

Cited by 30 articles