On feature distributional clustering for text categorization

1 September 2001

conference paper
Published by Association for Computing Machinery (ACM)

p. 146-153
https://doi.org/10.1145/383952.383976

Abstract

We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high performance text categorization that can outperform other recent methods in terms of categorization accuracy and representation efficiency. Comparing the accuracy of our method with other techniques, we observe significant dependency of the results on the data set. We discuss the potential reasons for this dependency.
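
The word-cluster representation described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: as a stand-in for the information bottleneck method it greedily merges the two word clusters whose class distributions P(class | word) are closest in Jensen-Shannon divergence, weighting merges by the number of words rather than by word probability. The toy corpus and the names `class_distributions`, `cluster_words`, and `to_cluster_vector` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def class_distributions(docs, labels):
    """Estimate P(class | word) for every word from labeled token lists."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # word -> Counter(class -> occurrences)
    for tokens, label in zip(docs, labels):
        for word in tokens:
            counts[word][label] += 1
    return {
        word: [c[k] / sum(c.values()) for k in classes]
        for word, c in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_words(dists, n_clusters):
    """Greedy agglomerative clustering (simplified stand-in, not the IB method)."""
    clusters = [([w], p) for w, p in sorted(dists.items())]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the most similar class distributions.
        _, i, j = min(
            (js_divergence(clusters[i][1], clusters[j][1]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        (wi, pi), (wj, pj) = clusters[i], clusters[j]
        ni, nj = len(wi), len(wj)
        # Assumption: merge weights are word counts, not word probabilities.
        merged = (wi + wj, [(ni * a + nj * b) / (ni + nj) for a, b in zip(pi, pj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return {w: idx for idx, (words, _) in enumerate(clusters) for w in words}


def to_cluster_vector(tokens, word_to_cluster, n_clusters):
    """Represent a document by how often each word cluster occurs in it."""
    vec = [0] * n_clusters
    for word in tokens:
        if word in word_to_cluster:  # words unseen in training are ignored
            vec[word_to_cluster[word]] += 1
    return vec


# Hypothetical toy corpus: two categories, eight distinct words.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "bank"],
    ["bank", "market", "trade"],
]
labels = ["sport", "sport", "finance", "finance"]

word_to_cluster = cluster_words(class_distributions(docs, labels), n_clusters=2)
features = [to_cluster_vector(d, word_to_cluster, 2) for d in docs]
```

In this sketch each document becomes a short vector of cluster counts instead of a vector over the full vocabulary. Those vectors could then be passed to any linear SVM implementation (for example scikit-learn's `LinearSVC`) for the classification step.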

Cited by 73 articles