Robustness of regularized linear classification methods in text categorization
- 28 July 2003
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 190-197
- https://doi.org/10.1145/860435.860471
Abstract
Real-world applications often require classifying documents under difficult conditions: a small number of features, mislabeled documents, and rare positive examples. This paper investigates the robustness of three regularized linear classification methods (SVM, ridge regression, and logistic regression) under these conditions. We compare the methods in terms of their loss functions and score distributions, and establish the connection between their optimization problems and generalization error bounds. Several sets of controlled experiments on the Reuters-21578 corpus are conducted to investigate the robustness of these methods. Our results show that ridge regression seems to be the most promising candidate for rare class problems.
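The abstract compares the three methods through their loss functions. As a rough illustration only, the sketch below writes out the standard textbook forms of those losses as functions of the margin m = y * f(x), with labels y in {-1, +1}. The function names and the exact scaling are assumptions made here and may differ from the formulations used in the paper.

```python
import math


def hinge_loss(margin: float) -> float:
    """SVM hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def squared_loss(margin: float) -> float:
    """Ridge regression loss with +/-1 labels.

    (y - f)^2 equals (1 - y*f)^2 because y^2 = 1.
    """
    return (1.0 - margin) ** 2


def logistic_loss(margin: float) -> float:
    """Logistic regression loss log(1 + exp(-margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite to avoid overflow in exp(-margin).
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Print the three losses side by side for a few margins.
    for m in (-1.0, 0.0, 1.0, 2.0):
        print(m, hinge_loss(m), squared_loss(m), logistic_loss(m))
```

One difference visible in these forms is that the squared loss also penalizes margins larger than 1, while the hinge loss is exactly zero there and the logistic loss only approaches zero.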
This publication has 5 references indexed in Scilit:
- Text Categorization Based on Regularized Linear Classification Methods. Information Retrieval Journal, 2001
- A re-examination of text categorization methods. Published by Association for Computing Machinery (ACM), 1999
- Inductive learning algorithms and representations for text categorization. Published by Association for Computing Machinery (ACM), 1998
- A comparison of classifiers and document representations for the routing problem. Published by Association for Computing Machinery (ACM), 1995
- An example-based mapping method for text categorization and retrieval. ACM Transactions on Information Systems, 1994