Abstract
The domain name system (DNS) protocol has been in use for over three decades. It plays a vital role in the functioning of the Internet by translating domain names into IP addresses. However, DNS is an early network protocol with numerous security flaws, which makes it a frequent target for attackers. Several improvements have been introduced over time to address these concerns. The most recent is DNS over HTTPS (DoH), which aims to improve user privacy and security by protecting DNS requests and responses from eavesdropping and manipulation. Nevertheless, DoH raises security and privacy issues of its own: because the traffic is encrypted, network administrators cannot inspect DNS packets for malicious activity, which increases the risk of undetected security breaches. Identifying and characterizing malicious behavior in DoH network traffic helps mitigate these threats. To this end, this research proposes two statistical pattern recognition models, one based on logistic regression and one based on linear regression, that profile malicious DoH network traffic behavior by recognizing patterns in the data. Each model consists of two primary stages: data preprocessing, which involves data preparation and the selection of an optimal feature set, and pattern recognition, in which the most suitable pattern is selected and used for data classification. We also present the resulting malicious DoH profile using the correlation coefficients between the features. To assess the effectiveness of the proposed approaches, we use the CIRA-CIC-DoHBrw-2020 dataset and compare against state-of-the-art machine learning and deep learning models. Experimental results indicate that the logistic regression-based model outperformed the linear regression-based model.
Moreover, although the linear and logistic regression-based models were less effective than some machine learning and deep learning models, they used a smaller set of features than earlier studies. Furthermore, the proposed models offer several advantages over previous models, including low computational complexity, simple implementation, robustness to noise, and reduced data requirements. This study is the first to use basic statistical models (linear and logistic regression) to profile malicious behavior in DoH network traffic.
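The two-stage structure described above (feature selection followed by regression-based classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic rather than drawn from CIRA-CIC-DoHBrw-2020, ranking features by absolute Pearson correlation with the label is an assumed stand-in for the paper's feature selection step, and every function name (`select_features`, `fit`, `accuracy`, `run_demo`) is hypothetical. Both models are fitted by plain batch gradient descent and classify by thresholding their output at 0.5.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_features(X, y, k):
    """Stage 1 (assumed method): keep the k columns most correlated with the label."""
    scores = [abs(pearson(col, y)) for col in zip(*X)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(X, y, logistic, lr=0.1, epochs=500):
    """Stage 2: batch gradient descent for logistic (log-loss) or linear (squared-error) regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            # Both losses share the same gradient form: (prediction - label) * input.
            err = (sigmoid(z) if logistic else z) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b, logistic):
    """Classify with a 0.5 threshold on the model output and return the hit rate."""
    hits = 0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        score = sigmoid(z) if logistic else z
        hits += int((score >= 0.5) == (yi == 1))
    return hits / len(y)


def run_demo(seed=0):
    """Generate synthetic flows, select features, then fit and score both models."""
    rng = random.Random(seed)
    rows = []
    for _ in range(400):
        label = rng.randint(0, 1)  # 1 = malicious, 0 = benign (synthetic labels)
        shift = 2.0 * label
        # Columns 0 and 1 carry signal; columns 2 and 3 are pure noise.
        rows.append(([rng.gauss(shift, 1), rng.gauss(-shift, 1),
                      rng.gauss(0, 1), rng.gauss(0, 1)], label))
    train, test = rows[:300], rows[300:]
    y_train = [lab for _, lab in train]
    y_test = [lab for _, lab in test]
    keep = select_features([x for x, _ in train], y_train, k=2)
    X_train = [[x[j] for j in keep] for x, _ in train]
    X_test = [[x[j] for j in keep] for x, _ in test]
    results = {"selected": keep}
    for name, logistic in (("logistic", True), ("linear", False)):
        w, b = fit(X_train, y_train, logistic)
        results[name] = accuracy(X_test, y_test, w, b, logistic)
    return results


print(run_demo())
```

On this toy data the two models behave similarly, so the sketch says nothing about the relative performance reported in the study; it only shows how a reduced feature set feeds a simple regression classifier.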
Data Availability
Data sharing is not applicable to this article as no datasets were generated.
Acknowledgements
The authors acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) grant (#RGPIN-2020-04701) awarded to Arash Habibi Lashkari.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Niktabe, S., Lashkari, A.H. & Sharma, D.P. Detection, characterization, and profiling DoH Malicious traffic using statistical pattern recognition. Int. J. Inf. Secur. 23, 1293–1316 (2024). https://doi.org/10.1007/s10207-023-00790-z
DOI: https://doi.org/10.1007/s10207-023-00790-z