Identification of Malicious URLs: A Purely Lexical Approach

Rodrigues, Julio; Barros, Charles de; Dias, Diego; Guimarães, Marcelo de Paiva; Tuler, Elisa; Rocha, Leonardo

doi:10.1007/978-3-031-64608-9_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14814)

Included in the following conference series: International Conference on Computational Science and Its Applications

Abstract

Internet users are increasingly exposed to security threats delivered through malicious Uniform Resource Locators (URLs). These threats are often orchestrated by sophisticated cybercriminals, so understanding how they operate is essential for devising robust defense mechanisms. This paper presents an effective approach for identifying different categories of malicious URLs using machine learning algorithms. Notably, our methodology does not need to access the URLs to extract information; it relies solely on attributes of their lexical composition. The experiments use curated datasets from repositories such as Kaggle and PhishTank, and the results are competitive with the existing literature, which predominantly relies on network-based or content-based features.
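To make the idea of purely lexical attributes concrete, the following is a minimal sketch of how such features might be computed from the URL string alone, without ever requesting the URL. The feature set, the function names (`lexical_features`, `shannon_entropy`, `is_ipv4`), and the dictionary keys are illustrative assumptions; the abstract does not list the features the authors actually use.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (0.0 for an empty string)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_ipv4(host: str) -> bool:
    """True if the host looks like a dotted-quad IPv4 address."""
    parts = host.split(".")
    return len(parts) == 4 and all(p.isdigit() and int(p) <= 255 for p in parts)


def lexical_features(url: str) -> dict:
    """Hypothetical lexical feature set; an assumption, not the paper's list.

    urlsplit only parses the string, so no network request is made.
    It can raise ValueError on malformed bracketed hosts; callers
    should handle that case.
    """
    # Add a "//" prefix to scheme-less URLs so the host is parsed as a host.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_query_params": len(parts.query.split("&")) if parts.query else 0,
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(is_ipv4(host)),
        "uses_https": int(parts.scheme == "https"),
        "entropy": shannon_entropy(url),
    }
```

Each URL becomes a fixed-length numeric vector, so the values of this dictionary could be passed to any standard classifier, for example one from scikit-learn. Which classifiers and features perform best is a question for the paper itself, not something this sketch settles.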

Notes

1. More information at: https://www.python.org/.
2. More information at: https://scikit-learn.org/stable/.
3. Available at: https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset.
4. Available at: https://phishtank.org/phish_archive.php.

Acknowledgements

This study received partial financial support from AWS, CNPq, CAPES, FINEP, and Fapemig.

Author information

Authors and Affiliations

Universidade Federal de São João Del Rei (UFSJ), São João Del Rei, Brazil
Julio Rodrigues, Charles de Barros, Elisa Tuler & Leonardo Rocha
Universidade Federal do Espírito Santo (UFES), Vitória, Brazil
Diego Dias
Universidade Federal de São Paulo (UNIFESP), São Paulo, Brazil
Marcelo de Paiva Guimarães

Corresponding author

Correspondence to Diego Dias.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
School of Engineering, University of Basilicata, Potenza, Italy
Beniamino Murgante
Department of Civil and Environmental Engineering and Architecture, University of Cagliari, Cagliari, Italy
Chiara Garau
Faculty of Information Technology, Monash University, Clayton, VIC, Australia
David Taniar
Algoritmi Research Centre, University of Minho, Braga, Portugal
Ana Maria A. C. Rocha
Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy
Maria Noelia Faginas Lago

About this paper

Cite this paper

Rodrigues, J., Barros, C.d., Dias, D., Guimarães, M.d.P., Tuler, E., Rocha, L. (2024). Identification of Malicious URLs: A Purely Lexical Approach. In: Gervasi, O., Murgante, B., Garau, C., Taniar, D., C. Rocha, A.M.A., Faginas Lago, M.N. (eds) Computational Science and Its Applications – ICCSA 2024. ICCSA 2024. Lecture Notes in Computer Science, vol 14814. Springer, Cham. https://doi.org/10.1007/978-3-031-64608-9_26

DOI: https://doi.org/10.1007/978-3-031-64608-9_26
Published: 02 July 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64607-2
Online ISBN: 978-3-031-64608-9
eBook Packages: Computer Science, Computer Science (R0)
