Support vector machines for spam categorization

1 January 1999

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks

Vol. 10 (5), 1048-1054
https://doi.org/10.1109/72.788645

Abstract

We study the use of support vector machines (SVMs) in classifying e-mail as spam or nonspam by comparing them to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one in which the number of features was constrained to the 1000 best features and another in which the dimensionality was over 7000. SVMs performed best when using binary features. For both data sets, boosting trees and SVMs had acceptable test performance in terms of accuracy and speed. However, SVMs required significantly less training time.
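
To make the idea of binary features concrete, the following is a minimal sketch of a linear SVM trained on word-presence vectors. It is an illustration only, not the method of the paper: the toy messages, the vocabulary, the helper names (binary_features, train_linear_svm, classify), the omission of a bias term, and the plain subgradient trainer are all assumptions made here, and the paper's data sets, feature selection, and solver are not reproduced.

```python
# Illustrative sketch: toy data and a simple subgradient trainer.
# All names and values below are hypothetical, not taken from the paper.

def binary_features(text, vocabulary):
    """Return a 0/1 vector: 1.0 if the vocabulary word occurs at least once."""
    words = set(text.lower().split())
    return [1.0 if word in words else 0.0 for word in vocabulary]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Minimize (lam/2)*||w||^2 + hinge loss by per-example subgradient steps.

    No bias term is used, so the separating hyperplane passes through the origin.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Step along the regularizer's gradient (weight decay).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Step along the hinge loss subgradient when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w

vocabulary = ["free", "winner", "cash", "meeting", "report", "lunch"]

# Labels: +1 for spam, -1 for nonspam.
training_data = [
    ("free cash winner", 1),
    ("meeting report attached", -1),
    ("winner free prize", 1),
    ("lunch meeting today", -1),
    ("cash cash free", 1),
    ("report due lunch", -1),
]

X = [binary_features(text, vocabulary) for text, _ in training_data]
y = [label for _, label in training_data]
weights = train_linear_svm(X, y)

def classify(text):
    """Label a message by the sign of the linear score w . x."""
    x = binary_features(text, vocabulary)
    score = sum(wj * xj for wj, xj in zip(weights, x))
    return "spam" if score > 0 else "nonspam"

print(classify("free cash now"))
print(classify("lunch meeting report"))
```

Because the features are binary, a word repeated within a message (such as "cash" in the third training example) contributes the same value as a single occurrence. A production setting would more likely rely on an established SVM library than on a hand-written trainer like this one.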

This publication has 6 references indexed in Scilit:

Boosting and Rocchio applied to text filtering. Association for Computing Machinery (ACM), 1998.
Spam! Communications of the ACM, 1998.
Training algorithms for linear text classifiers. Association for Computing Machinery (ACM), 1996.
Game theory, on-line prediction and boosting. Association for Computing Machinery (ACM), 1996.
The Nature of Statistical Learning Theory. Springer Nature, 1995.
Fast Effective Rule Induction. Elsevier, 1995.

Cited by 926 articles