iBet uBet web content aggregator. Adding the entire web to your favor.
iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://doi.org/10.1007/s10586-021-03463-5
Using advanced data structures to enable responsive security monitoring | Cluster Computing Skip to main content
Log in

Using advanced data structures to enable responsive security monitoring

  • Published:
Cluster Computing Aims and scope Submit manuscript

Abstract

Write-optimized data structures (WODS), offer the potential to keep up with cyberstream event rates and give sub-second query response for key items like IP addresses. These data structures organize logs as the events are observed. To work in a real-world environment and not fill up the disk, WODS must efficiently expire older events. As the basis for our research into organizing security monitoring data, we implemented a tool, called Diventi, to index IP addresses in connection logs using RocksDB (a write-optimized LSM tree). We extended Diventi to automatically expire data as part of the data structures’ normal operations. We guarantee that Diventi always tracks the N most recent events and tracks no more than \(N+k\) events for a parameter \(k < N\), while ensuring the index is opportunistically pruned. To test Diventi at scale in a controlled environment, we used anonymized traces of IP communications collected at SuperComputing 2019. We synthetically extended the 2.4 billion connection events to 100 billion events. We tested Diventi vs. Elasticsearch, a common log indexing tool. In our test environment, Elasticsearch saw an ingestion rate of at best 37,000 events/s while Diventi sustained ingestion rates greater than 171,000 events/s. Our query response times were as much as 100 times faster, typically answering queries in under 80 ms. Furthermore, we saw no noticeable degradation in Diventi from expiration. We have deployed Diventi for many months where it has performed well and supported new security analysis capabilities.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Data availability

The anonymized traces from SuperComputing and source code for our IP address indexing tool and our test result data will be made publicly available with this publication.

Code availability

Our code and associated tools will be made publicly available with this publication.

Notes

  1. Traditionally, each level in an LSM tree is implemented as a B-tree. However, modern systems maintain sorted runs instead and store metadata information in main-memory for each level (fence pointers that store information for every disk page of every run).

  2. In the analysis section, we show in detail how the value of k derives from the values of several other parameters, but in practice k should be \( < N/3\).

  3. \(e_{\text {hist}}\) This is slightly oversimplifying: actually \(I_{\text {compact}}\) and \(I_{\text {update}}\) both depend on ingestion rate, whereas \(e_{\text {hist}}\) actually depends on the rate of event creations, as determined by their timestamps. In a practical system, these two rates will be the same, as events should be ingested at the same rate as they are created. However, in the case of our test setup, we are ingesting a pre-generated backlog of data as fast as possible, so the ingestion rate is significantly higher than the event-creation rate.

References

  1. Bender, M.A., Farach-Colton, M., Fineman, J.T., Fogel, Y.R., Kuszmaul, B.C., Nelson, J.: Cache-oblivious streaming B-trees. In: Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 81–92. ACM (2007)

  2. Bender, M.A., Farach-Colton, M., Jannen, W., Johnson, R., Kuszmaul, B.C., Porter, D.E., Yuan, J., Zhan, Y.: An introduction to B\(^{\epsilon}\)-trees and write-optimization. Login USENIX Mag. 40(5), 22–28 (2015)

    Google Scholar 

  3. Berry, J.W., Porter, A.M.: Stateful streaming in distributed memory supercomputers. In: Slides from an Invited Talk at the Chesapeake Large-Scale Analytics Conference (CLSAC), October 2016. Slides archived at OSTI (2016). https://www.osti.gov/servlets/purl/1406959. Accessed 3 April 2018

  4. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

    Article  Google Scholar 

  5. Brodal, G.S., Fagerberg, R.: Lower bounds for external memory dictionaries. In: Proceedings of the Fourteenth Annual ACM–SIAM Symposium on Discrete Algorithms, pp. 546–554. Society for Industrial and Applied Mathematics (2003)

  6. Brodal, G.S., Demaine, E.D., Fineman, J.T., Iacono, J., Langerman, S., Munro, J.I.: Cache-oblivious dynamic dictionaries with update/query tradeoffs. In: Proceedings of the Twenty-First Annual ACM–SIAM Symposium on Discrete Algorithms, pp. 1448–1456. SIAM (2010)

  7. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst. 26(2), 4 (2008)

    Article  Google Scholar 

  8. Claise, B., Bryant, S.: Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information. Technical Report, RFC 5101, January (2008)

  9. Dean, J., Ghemawat, S.: LevelDB: a fast and lightweight key/value database library by Google (2021). http://code.google.com/p/leveldb/

  10. DHS Cybersecurity and Infrastructure Security Agency (CISA): Einstein (2021). https://www.cisa.gov/einstein

  11. Facebook: RocksDB (2021). https://github.com/facebook/rocksdb

  12. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)

    Article  Google Scholar 

  13. Matsunobu, Y., Dong, S., Lee, H.: MyRocks: LSM-tree database storage engine serving Facebook’s social graph. Proc. VLDB Endow. 13(12), 3217–3230 (2020)

    Article  Google Scholar 

  14. NetFlow, Cisco Systems Inc: Netflow Services Solutions Guide. Cisco System, Inc. (2007). http://www.cisco.com/en/US/docs/ios/solutions_docs/netflow/nfwhite.html

  15. O’Neil, P., Cheng, E., Gawlick, D., O’Neil, E.: The log-structured merge-tree (LSM-tree). Acta Inform. 33(4), 351–385 (1996)

    Article  Google Scholar 

  16. Pandey, P., Singh, S., Bender, M.A., Berry, J.W., Farach-Colton, M., Johnson, R., Kroeger, T.M., Phillips, C.A.: Timely reporting of heavy hitters using external memory. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1431–1446 (2020)

  17. Paxson, V.: Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23), 2435–2463 (1999)

    Article  Google Scholar 

  18. Sarkar, S., Papon, T.I., Staratzis, D., Athanassoulis, M.: Lethe: a tunable delete-aware LSM engine. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 893–908 (2020)

Download references

Acknowledgements

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under Contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Funding

This research was funded by Sandia National Labs LDRD 218336 and 222383. We also gratefully acknowledge support from NSF Grants CCF-2118832, CCF-2106827, CCF-1725543, CSR-1763680, CCF-1716252, CNS-1938709.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the concepts, design, analysis and experimentation planning. JV and DRD lead the implementation and testing and were supervised by EDT and TMK. The initial conception and inspiration for this work was a team effort as was the writing and revisions.

Corresponding author

Correspondence to Thomas M. Kroeger.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Vorobyeva, J., Delayo, D.R., Bender, M.A. et al. Using advanced data structures to enable responsive security monitoring. Cluster Comput 25, 2893–2914 (2022). https://doi.org/10.1007/s10586-021-03463-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10586-021-03463-5

Keywords

Navigation