Self-supervised Log Parsing

Nedelkoski, Sasho; Bogatinovski, Jasmin; Acker, Alexander; Cardoso, Jorge; Kao, Odej

doi:10.1007/978-3-030-67667-4_8

Sasho Nedelkoski¹,
Jasmin Bogatinovski¹,
Alexander Acker¹,
Jorge Cardoso² &
Odej Kao¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12460)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing semi-structured records with free-form text log messages into structured templates is the first and crucial step that enables further analysis. Existing approaches rely on log-specific heuristics or manual rule extraction. Because these are often specialized to particular log types, they limit both parsing performance and generalization. We propose a novel parsing technique called NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations of the logs in the form of vector embeddings. This allows the MLM to be used as pre-training and coupled with a downstream anomaly detection task. We evaluate the parsing performance of NuLog on 10 real-world log datasets and compare the results with 12 parsing techniques. The results show that NuLog outperforms existing methods in parsing accuracy, with an average of 99%, and achieves the lowest edit distance to the ground truth templates. Additionally, two case studies demonstrate the applicability of the approach to log-based anomaly detection in both supervised and unsupervised scenarios. The results show that NuLog can be successfully used to support troubleshooting tasks. The implementation is available at https://github.com/nulog/nulog.
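
To make the idea of parsing as masked-token prediction concrete, the following is a minimal sketch and not the NuLog implementation. It assumes a decision rule that the abstract does not spell out: a token that can be recovered from its context is treated as a constant part of the template, and any other token becomes a wildcard. The neural MLM is replaced by a toy frequency counter over neighbouring tokens. The names `context_of`, `fit_context_counts`, `parse`, the `<*>` wildcard, the 0.5 threshold, and the example log lines are all illustrative assumptions.

```python
from collections import Counter, defaultdict

WILDCARD = "<*>"  # assumed placeholder for variable parts of a template


def context_of(tokens, i):
    """Return the (left, right) neighbours of position i; None at the edges."""
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    return left, right


def fit_context_counts(messages):
    """Toy stand-in for MLM training: count which token fills each context."""
    counts = defaultdict(Counter)
    for message in messages:
        tokens = message.split()
        for i, token in enumerate(tokens):
            counts[context_of(tokens, i)][token] += 1
    return counts


def parse(message, counts, threshold=0.5):
    """Mask each token in turn; keep it only if the context predicts it well."""
    tokens = message.split()
    template = []
    for i, token in enumerate(tokens):
        seen = counts.get(context_of(tokens, i), Counter())
        total = sum(seen.values())
        share = seen[token] / total if total else 0.0
        # Assumed rule: a predictable token is constant, otherwise a parameter.
        template.append(token if share > threshold else WILDCARD)
    return " ".join(template)


# Hypothetical log messages, invented for illustration only.
logs = [
    "Deleting block blk_1 file /a",
    "Deleting block blk_2 file /b",
    "Deleting block blk_3 file /c",
]
counts = fit_context_counts(logs)
templates = {parse(m, counts) for m in logs}
print(templates)
```

In this toy setting all three messages collapse to one template, because the block identifier and the file path are the only positions whose context does not determine the token. Unlike a trained language model, the counter cannot generalize to contexts it has never seen, so it only illustrates the masking-and-prediction loop.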

S. Nedelkoski and J. Bogatinovski contributed equally.

Author information

Authors and Affiliations

Distributed Systems, TU Berlin, Berlin, Germany
Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker & Odej Kao
Department of Informatics Engineering/CISUC, University of Coimbra, Coimbra, Portugal
Jorge Cardoso

Corresponding authors

Correspondence to Sasho Nedelkoski or Jasmin Bogatinovski.

Editor information

Editors and Affiliations

Microsoft Research, Redmond, WA, USA
Yuxiao Dong
Jožef Stefan Institute, Ljubljana, Slovenia
Dunja Mladenić
Amazon Alexa Knowledge, Cambridge, UK
Craig Saunders

About this paper

Cite this paper

Nedelkoski, S., Bogatinovski, J., Acker, A., Cardoso, J., Kao, O. (2021). Self-supervised Log Parsing. In: Dong, Y., Mladenić, D., Saunders, C. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12460. Springer, Cham. https://doi.org/10.1007/978-3-030-67667-4_8

DOI: https://doi.org/10.1007/978-3-030-67667-4_8
Published: 25 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67666-7
Online ISBN: 978-3-030-67667-4
eBook Packages: Computer Science (R0)
