Support vector machines for spam categorization
- 1 January 1999
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks
- Vol. 10 (5), 1048-1054
- https://doi.org/10.1109/72.788645
Abstract
We study the use of support vector machines (SVM's) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM's performed best when using binary features. For both data sets, boosting trees and SVM's had acceptable test performance in terms of accuracy and speed. However, SVM's required significantly less training time.
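As a rough illustration of the "binary features" setting mentioned in the abstract, the sketch below maps each message to a 0/1 word-presence vector and fits a linear SVM on it. Everything in it is an assumption made for illustration: the toy messages, the function names (`binary_features`, `train_linear_svm`, `predict`), the hyperparameters, and the training procedure (plain subgradient descent on the regularized hinge loss, with the bias term omitted for brevity). It is not the training method, data, or feature selection used in the paper.

```python
import random

def binary_features(text, vocab):
    """Return a 0/1 vector: 1.0 if the vocabulary word occurs at least once."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Subgradient descent on the L2-regularized hinge loss (no bias term).

    Illustrative trainer only; labels in y must be +1 (spam) or -1 (nonspam).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (gradient of the L2 regularizer).
            w = [wj * (1.0 - eta * lam) for wj in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Classify as spam (+1) if the linear score is positive, else nonspam (-1)."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score > 0 else -1

# Hypothetical toy corpus, invented for this sketch.
messages = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("cheap money offer", 1),
    ("meeting agenda for monday", -1),
    ("lunch on monday", -1),
    ("project report attached", -1),
]

vocab = sorted({word for text, _ in messages for word in text.split()})
X = [binary_features(text, vocab) for text, _ in messages]
y = [label for _, label in messages]
w = train_linear_svm(X, y)

print(predict(w, binary_features("claim your free prize", vocab)))
print(predict(w, binary_features("monday project meeting", vocab)))
```

Because the features are binary, repeating a word in a message does not change its vector; only presence or absence matters. Words outside the vocabulary are ignored.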
This publication has 6 references indexed in Scilit:
- Boosting and Rocchio applied to text filtering. Published by Association for Computing Machinery (ACM), 1998
- Spam! Communications of the ACM, 1998
- Training algorithms for linear text classifiers. Published by Association for Computing Machinery (ACM), 1996
- Game theory, on-line prediction and boosting. Published by Association for Computing Machinery (ACM), 1996
- The Nature of Statistical Learning Theory. Published by Springer Nature, 1995
- Fast Effective Rule Induction. Published by Elsevier, 1995