Using a random forest to predict quantized reuse distance in an SSD write buffer

  • Regular Paper
  • Published in: Computing

Abstract

Efficient management of the write buffer in solid-state drives (SSDs) can be achieved by predicting future I/O request patterns using machine learning techniques. However, the computational demands posed by sophisticated approaches like deep learning remain significant, despite the increasing computational power of SSDs. This paper presents a novel approach to write buffer management that addresses these challenges. Our method employs a lightweight yet accurate random forest classifier to predict the forward reuse distances (FRDs) of I/O requests, indicating the likelihood of recurring identical I/O requests. Our key insight is that, rather than aiming for exact FRD predictions for future individual requests, we focus on identifying whether the predicted FRD exceeds the buffer size. With this insight, our method implements efficient buffer management operations, including bypassing the buffer storage when necessary. To achieve this, we introduce a banding method that quantizes FRDs according to the buffer size. This enables predictions at the band level, forming the foundation for a lightweight machine learning model. Subsequently, we assign high caching priority to write requests that are anticipated to have a short FRD band. Through extensive evaluations utilizing a simulator, we demonstrate that our method achieves results comparable to those of the optimal algorithm in terms of hit rate in most scenarios. Moreover, our approach outperforms state-of-the-art algorithms, which depend on past I/O reference patterns, by up to 27%.
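To make the banding idea concrete, the sketch below (not the authors' code; all names are hypothetical) computes oracle forward reuse distances (FRDs) for a write trace, quantizes them into bands sized in multiples of the buffer capacity, and applies the admission rule the abstract describes: cache a write only when its predicted FRD falls within the buffer size, and bypass the buffer otherwise. In the paper a random forest classifier predicts the band; here an oracle stands in for the predictor.

```python
def forward_reuse_distances(trace):
    """Oracle FRDs for a write trace (a list of logical block addresses).

    For each position, returns the number of intervening requests before
    the same LBA is written again, or None if it never recurs.
    """
    next_use = {}
    frd = [None] * len(trace)
    # Scan backwards so next_use[lba] always holds the next occurrence.
    for i in range(len(trace) - 1, -1, -1):
        lba = trace[i]
        if lba in next_use:
            frd[i] = next_use[lba] - i - 1
        next_use[lba] = i
    return frd

def frd_band(frd, buffer_size, num_bands=4):
    """Quantize an FRD into a band indexed by multiples of the buffer size.

    Band 0 means the block recurs within one buffer's worth of requests;
    the top band also absorbs blocks that never recur (FRD = None).
    """
    if frd is None:
        return num_bands - 1
    return min(frd // buffer_size, num_bands - 1)

def should_cache(predicted_band):
    """Admission rule: keep the write in the buffer only if its predicted
    FRD is shorter than the buffer size (band 0); otherwise bypass."""
    return predicted_band == 0

# Tiny example trace: LBA 10 recurs quickly, LBA 20 later, LBA 30/40/50 never.
trace = [10, 20, 10, 30, 40, 20, 50, 10]
frds = forward_reuse_distances(trace)
bands = [frd_band(d, buffer_size=2) for d in frds]
```

Because the classifier only has to distinguish a handful of bands rather than regress an exact distance, a lightweight model such as a random forest suffices, which is the key to keeping the scheme cheap enough to run inside an SSD controller.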

[Figs. 1–12 and Algorithm 1 appear in the full article.]


Acknowledgements

The work reported in this paper was conducted during the sabbatical year of Kwangwoon University in 2022. This research was also supported by the KIAT (Korea Institute for Advancement of Technology) grant funded by the Korea Government (MOTIE: Ministry of Trade, Industry and Energy) (P0017124, HRD Program for Industrial Innovation).

Author information

Contributions

H. C. performed all experiments and prepared the figures. T. K. proposed the main idea and wrote the main manuscript. I. K. proofread the manuscript and redrew all figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Taeseok Kim.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cha, H., Kim, I.K. & Kim, T. Using a random forest to predict quantized reuse distance in an SSD write buffer. Computing 106, 3967–3986 (2024). https://doi.org/10.1007/s00607-024-01343-5

