A SOM-based document clustering using phrases
- 5 June 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 5, pp. 2212-2216
- https://doi.org/10.1109/iconip.2002.1201886
Abstract
Most existing techniques for document clustering rely on a "bag of words" document representation: each word in the document is treated as a separate feature, and word order is ignored. We investigate the use of phrases rather than words as document features for document clustering. We present a phrase grammar extraction technique and use the extracted phrases as the features in a self-organizing map (SOM) based document clustering algorithm. We present clustering results on the REUTERS corpus and show an improvement in clustering performance under both the entropy and F-measure evaluation measures.
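The abstract names entropy and F-measure as the evaluation measures but does not define them. As an illustration only, the sketch below uses definitions that are common in the document clustering literature: size-weighted class entropy per cluster, and a class-weighted best-match F-measure. Whether the paper uses exactly these formulas is an assumption. The function names `cluster_entropy` and `clustering_f_measure` and the toy labels are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def cluster_entropy(labels, clusters):
    """Size-weighted average entropy of the class labels inside each cluster.

    Lower is better; 0.0 means every cluster contains a single class.
    (Assumed definition, not confirmed by the paper.)
    """
    n = len(labels)
    total = 0.0
    for cluster_id in set(clusters):
        members = [lab for lab, clu in zip(labels, clusters) if clu == cluster_id]
        counts = Counter(members)
        # Entropy (in bits) of the class distribution within this cluster.
        h = -sum(
            (m / len(members)) * math.log2(m / len(members))
            for m in counts.values()
        )
        total += (len(members) / n) * h
    return total


def clustering_f_measure(labels, clusters):
    """Class-weighted F-measure: for each class take the best F1 over all clusters.

    Higher is better; 1.0 means every class maps exactly onto one cluster.
    (Assumed definition, not confirmed by the paper.)
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    score = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_clu
            recall = n_ij / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_cls / n) * best
    return score


# Toy data for illustration only; these are not results from the paper.
true_labels = ["earn", "earn", "grain", "grain"]
perfect = [0, 0, 1, 1]   # each cluster holds one class
mixed = [0, 1, 0, 1]     # each cluster holds one document of each class

perfect_entropy = cluster_entropy(true_labels, perfect)
perfect_f = clustering_f_measure(true_labels, perfect)
mixed_entropy = cluster_entropy(true_labels, mixed)
mixed_f = clustering_f_measure(true_labels, mixed)
```

Under these definitions the two measures move in opposite directions: a better clustering lowers entropy and raises the F-measure, so an improvement on both measures means agreement between two differently constructed scores.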