Abstract
In this paper, we introduce a novel feature selection method, INTERACT, to select relevant words of emails for spam email filtering, i.e., classifying an email as spam or legitimate. Four traditional feature selection methods from the text categorization domain, Information Gain, Gain Ratio, Chi Squared, and ReliefF, are also used for performance comparison. Three classifiers are discussed in this paper: Support Vector Machine (SVM), Naïve Bayes, and a novel classifier, Locally Weighted learning with Naïve Bayes (LWNB). Four popular datasets are employed as benchmark corpora in our experiments to examine the capabilities of these five feature selection methods and three classifiers. In our simulations, we find that LWNB improves on Naïve Bayes and obtains better prediction results by learning local models, and that its performance is sometimes better than that of the SVM. Our study also shows that INTERACT can yield better classifier performance than the other four traditional methods for spam email filtering.
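To make the word-selection step concrete, the following is a minimal sketch of Information Gain, one of the four traditional baselines named above (it is not INTERACT, and it is not taken from the paper). It assumes each email is represented as a set of words and scores a word by how much knowing its presence or absence reduces the entropy of the spam/legitimate label. The toy corpus, the labels "spam" and "ham", and the function names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = ((len(with_word) / n) * entropy(with_word)
                   + (len(without_word) / n) * entropy(without_word))
    return entropy(labels) - conditional


def rank_words(docs, labels):
    """Sort the vocabulary by decreasing information gain (ties by word)."""
    vocab = set().union(*docs)
    return sorted(vocab,
                  key=lambda w: (-information_gain(docs, labels, w), w))


# Hypothetical toy corpus: each email is a set of words.
docs = [
    {"free", "money"},
    {"free", "offer"},
    {"meeting", "agenda"},
    {"agenda", "money"},
]
labels = ["spam", "spam", "ham", "ham"]

ranking = rank_words(docs, labels)
top_k = ranking[:2]  # keep only the k highest-scoring words as features
```

In this toy corpus, "free" and "agenda" separate the two classes perfectly and so receive the highest score, while "money" appears once in each class and receives a score of zero. Note that a ranking of this kind scores each word in isolation; it does not account for interactions between words.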
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, C., Gong, Y., Bie, R., Gao, X. (2008). Searching for Interacting Features for Spam Filtering. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87732-5_55
DOI: https://doi.org/10.1007/978-3-540-87732-5_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87731-8
Online ISBN: 978-3-540-87732-5
eBook Packages: Computer Science (R0)