Evaluating cost-sensitive Unsolicited Bulk Email categorization
- 11 March 2002
- Conference paper
- Published by Association for Computing Machinery (ACM)
- pp. 615-620
- https://doi.org/10.1145/508791.508911
Abstract
In recent years, Unsolicited Bulk Email (UBE) has become an increasingly important problem with a large economic impact. In this paper, we discuss cost-sensitive Text Categorization methods for UBE filtering. Specifically, we have evaluated several learning algorithms for the task (C4.5, Naive Bayes, PART, Support Vector Machines, and Rocchio), each made cost-sensitive through several methods (Threshold Optimization, Instance Weighting, and MetaCost). For evaluation we have used the Receiver Operating Characteristic Convex Hull (ROCCH) method. It is well suited to classification problems in which the target operating conditions (class distribution and misclassification costs) are not known, as is the case here. Our results show neither a dominant algorithm nor a dominant method for making algorithms cost-sensitive. However, they are the best reported on the test collection used and approach the accuracy of real-world hand-crafted classifiers.
- Funding: none
- SJR (2002): 0.213, Q3, ranked 231/333 in Software
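The ROCCH evaluation described in the abstract can be sketched as follows. This is a minimal Python illustration and not the paper's code. It makes several assumptions: spam is the positive class, each classifier is represented by a hypothetical (name, false positive rate, true positive rate) triple, and the rates and costs in the example are made up. The sketch computes the upper convex hull of the ROC points. It then picks the hull classifier with the lowest expected cost for a given spam prior and pair of misclassification costs.

```python
from typing import List, Tuple

# Hypothetical representation: (classifier name, false positive rate, true positive rate).
# Assumption: the positive class is spam, so a false positive is a legitimate
# message filtered as spam, and a false negative is spam that gets through.
RocPoint = Tuple[str, float, float]


def _cross(o: RocPoint, a: RocPoint, b: RocPoint) -> float:
    """Z-component of (a - o) x (b - o) in ROC space (x = FPR, y = TPR)."""
    return (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])


def roc_convex_hull(points: List[RocPoint]) -> List[RocPoint]:
    """Upper convex hull of the ROC points, ordered by increasing FPR.

    The trivial classifiers "all-legitimate" (0, 0) and "all-spam" (1, 1)
    are always added, because they are available at no cost.
    """
    candidates = points + [("all-legitimate", 0.0, 0.0), ("all-spam", 1.0, 1.0)]
    candidates.sort(key=lambda p: (p[1], p[2]))
    hull: List[RocPoint] = []
    for p in candidates:
        # Pop the last hull point while it lies on or below the segment
        # from hull[-2] to p (i.e. the turn is not strictly clockwise).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def expected_cost(p: RocPoint, p_spam: float, cost_fp: float, cost_fn: float) -> float:
    """Expected misclassification cost per message at one ROC point."""
    _, fpr, tpr = p
    return (1.0 - p_spam) * fpr * cost_fp + p_spam * (1.0 - tpr) * cost_fn


def best_classifier(hull: List[RocPoint], p_spam: float,
                    cost_fp: float, cost_fn: float) -> RocPoint:
    """Hull vertex with minimum expected cost for the given operating conditions.

    Expected cost is linear in (FPR, TPR), so its minimum over the convex
    hull is attained at one of the hull vertices.
    """
    return min(hull, key=lambda p: expected_cost(p, p_spam, cost_fp, cost_fn))


if __name__ == "__main__":
    # Made-up ROC points for four hypothetical classifiers A-D.
    points = [("A", 0.05, 0.60), ("B", 0.20, 0.90), ("C", 0.30, 0.85), ("D", 0.10, 0.55)]
    hull = roc_convex_hull(points)
    print([p[0] for p in hull])  # ['all-legitimate', 'A', 'B', 'all-spam']
    # Hypothetical costs: blocking a legitimate message is 9x worse than missing spam.
    print(best_classifier(hull, p_spam=0.5, cost_fp=9.0, cost_fn=1.0)[0])  # A
    print(best_classifier(hull, p_spam=0.5, cost_fp=1.0, cost_fn=1.0)[0])  # B
```

Only classifiers on the hull can be optimal for some combination of class distribution and costs. That is why the method suits settings where those conditions are unknown in advance. In the example, D and C lie below the hull and never need to be considered, whatever the costs turn out to be.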
References (8 indexed in Scilit):
- Machine learning in automated text categorization. ACM Computing Surveys, 2002.
- Robust Classification for Imprecise Environments. Machine Learning, 2001.
- An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. Association for Computing Machinery (ACM), 2000.
- Combining text and heuristics for cost-sensitive spam filtering. Association for Computational Linguistics (ACL), 2000.
- MetaCost. Association for Computing Machinery (ACM), 1999.
- Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 1999.
- An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval Journal, 1999.
- Spam! Communications of the ACM, 1998.