Robustness of regularized linear classification methods in text categorization

28 July 2003

proceedings article
Published by Association for Computing Machinery (ACM)

pp. 190-197
https://doi.org/10.1145/860435.860471

Abstract

Real-world applications often require classifying documents in situations with a small number of features, mislabeled documents, and rare positive examples. This paper investigates the robustness of three regularized linear classification methods (SVM, ridge regression, and logistic regression) under these conditions. We compare the methods in terms of their loss functions and score distributions, and establish the connection between their optimization problems and generalization error bounds. We conduct several sets of controlled experiments on the Reuters-21578 corpus to investigate the robustness of these methods. Our results show that ridge regression seems to be the most promising candidate for rare-class problems.
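As a hedged illustration of the loss-function comparison mentioned above, the sketch below writes out the standard per-example losses of the three methods as functions of the margin m = y(w·x). It assumes labels y in {-1, +1}, no bias term, and an L2 penalty lam·||w||^2. These are the common textbook forms, not necessarily the paper's exact formulation or scaling. The function names (`hinge_loss`, `squared_loss`, `logistic_loss`, `regularized_objective`) are hypothetical.

```python
import math

# Margin m = y * (w . x), labels y in {-1, +1}.
# Assumed textbook forms of the per-example losses; the paper's
# exact scaling or bias handling may differ.

def hinge_loss(m: float) -> float:
    """SVM: zero once the margin reaches 1, linear penalty below it."""
    return max(0.0, 1.0 - m)

def squared_loss(m: float) -> float:
    """Ridge regression: (y - w.x)^2 equals (1 - m)^2 when y is +1 or -1."""
    return (1.0 - m) ** 2

def logistic_loss(m: float) -> float:
    """Logistic regression: log(1 + exp(-m)), computed without overflow."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    # For negative m, exp(-m) can overflow; use
    # log(1 + e^{-m}) = -m + log(1 + e^{m}).
    return -m + math.log1p(math.exp(m))

def regularized_objective(loss, w, xs, ys, lam):
    """Average loss over the training set plus the L2 penalty lam * ||w||^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        score = sum(wi * xi for wi, xi in zip(w, x))
        total += loss(y * score)
    return total / len(xs) + lam * sum(wi * wi for wi in w)

if __name__ == "__main__":
    for m in (-2.0, 0.0, 1.0, 3.0):
        print(f"m={m:+.1f}  hinge={hinge_loss(m):.3f}  "
              f"squared={squared_loss(m):.3f}  logistic={logistic_loss(m):.3f}")
```

For example, at margin m = 3 the hinge loss is 0 and the logistic loss is close to 0, but the squared loss is 4. Ridge regression therefore still penalizes correctly classified examples whose scores overshoot the target value. This is one concrete way in which the three loss shapes differ.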
