Searching for Interacting Features for Spam Filtering

Chen, Chuanliang; Gong, Yunchao; Bie, Rongfang; Gao, Xiaozhi

doi:10.1007/978-3-540-87732-5_55

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5263)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

In this paper, we introduce a novel feature selection method, INTERACT, to select relevant words from emails for spam filtering, i.e., classifying an email as spam or legitimate. Four traditional feature selection methods from the text categorization domain (Information Gain, Gain Ratio, Chi Squared, and ReliefF) are used for performance comparison. Three classifiers are discussed: Support Vector Machine (SVM), Naïve Bayes, and a novel classifier, Locally Weighted learning with Naïve Bayes (LWNB). Four popular datasets are employed as benchmark corpora in our experiments to examine the capabilities of the five feature selection methods and the three classifiers. In our simulations, we find that LWNB improves on Naïve Bayes and achieves better prediction results by learning local models, and that its performance is sometimes better than that of the SVM. Our study also shows that INTERACT can yield better classifier performance than the four traditional methods for spam email filtering.
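The pipeline the abstract describes (rank words with a feature selection method, then train a classifier on the selected words) can be illustrated with a minimal Python sketch. It uses Information Gain, one of the four baseline methods named above, together with a Bernoulli Naïve Bayes classifier with add-one smoothing. It is not an implementation of INTERACT or LWNB, and it does not reproduce the paper's experiments. The toy corpus and all function names are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only: (email text, label).
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap cash offer now", "spam"),
    ("win cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report for review", "ham"),
    ("monday project meeting now", "ham"),
]


def tokenize(corpus):
    """Turn (text, label) pairs into (set of words, label) pairs."""
    return [(set(text.split()), label) for text, label in corpus]


def entropy(counts):
    """Shannon entropy in bits of a label distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, word):
    """Information Gain of the binary feature 'word occurs in the email'."""
    gain = entropy(Counter(label for _, label in docs).values())
    for present in (True, False):
        subset = [label for tokens, label in docs if (word in tokens) == present]
        if subset:
            gain -= len(subset) / len(docs) * entropy(Counter(subset).values())
    return gain


def select_features(docs, k):
    """Keep the k words with the highest Information Gain."""
    vocab = sorted({w for tokens, _ in docs for w in tokens})
    return sorted(vocab, key=lambda w: -information_gain(docs, w))[:k]


def train_nb(docs, features):
    """Count, per class, how many emails contain each selected word."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in docs:
        for w in features:
            if w in tokens:
                word_counts[label][w] += 1
    return class_counts, word_counts


def predict_nb(model, features, tokens):
    """Bernoulli Naive Bayes prediction with add-one (Laplace) smoothing."""
    class_counts, word_counts = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n in class_counts.items():
        score = math.log(n / total)
        for w in features:
            p = (word_counts[c][w] + 1) / (n + 2)
            score += math.log(p if w in tokens else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best


DOCS = tokenize(TRAIN)
FEATURES = select_features(DOCS, k=4)
MODEL = train_nb(DOCS, FEATURES)
print(FEATURES)
print(predict_nb(MODEL, FEATURES, {"cheap", "cash", "prize"}))
```

Information Gain scores each word independently of the others. According to the title and abstract, INTERACT instead searches for interacting features, which a per-word ranking like this one cannot capture.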

Author information

Authors and Affiliations

Department of Computer Science, Beijing Normal University, Beijing, 100875, China
Chuanliang Chen & Rongfang Bie
Software Institute, Nanjing University, Nanjing, China
Yunchao Gong
Department of Electrical Engineering, Helsinki University of Technology, Otakaari 5 A, 02150, Espoo, Finland
Xiaozhi Gao

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Fuchun Sun
Institute TAMS (Technical Aspects of Multimodal Systems), Department of Informatics, University of Hamburg, Vogt-Koelln-Straße 30, 22527, Hamburg, Germany
Jianwei Zhang
Intel China Research Center, 8/F, Peking University, Department of Machine Intelligence, 100871, Beijing, China
Ying Tan
Department of Mathematics, Southeast University, 210096, Nanjing, China
Jinde Cao
Departamento de Control Automático, CINVESTAV-IPN, A.P. 14-740, Av.IPN 2508, 07360, México D.F., México
Wen Yu

About this paper

Cite this paper

Chen, C., Gong, Y., Bie, R., Gao, X. (2008). Searching for Interacting Features for Spam Filtering. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87732-5_55

DOI: https://doi.org/10.1007/978-3-540-87732-5_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87731-8
Online ISBN: 978-3-540-87732-5
eBook Packages: Computer Science, Computer Science (R0)
