Abstract
Efficient management of the write buffer in solid-state drives (SSDs) can be achieved by predicting future I/O request patterns using machine learning techniques. However, the computational demands posed by sophisticated approaches such as deep learning remain significant, despite the increasing computational power of SSDs. This paper presents a novel approach to write buffer management that addresses these challenges. Our method employs a lightweight yet accurate random forest classifier to predict the forward reuse distances (FRDs) of I/O requests, which indicate how soon the same request will recur. Our key insight is that, rather than aiming for exact FRD predictions for individual future requests, it suffices to identify whether the predicted FRD exceeds the buffer size. With this insight, our method implements efficient buffer management operations, including bypassing buffer storage when necessary. To achieve this, we introduce a banding method that quantizes FRDs according to the buffer size. This enables predictions at the band level, forming the foundation for a lightweight machine learning model. We then assign high caching priority to write requests that are anticipated to have a short FRD band. Through extensive simulator-based evaluations, we demonstrate that our method achieves hit rates comparable to those of the optimal algorithm in most scenarios. Moreover, our approach outperforms state-of-the-art algorithms that depend on past I/O reference patterns by up to 27%.
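The abstract's banding idea can be illustrated with a minimal sketch. The band boundaries, the number of bands, and the function names below are illustrative assumptions, not the paper's actual implementation: FRDs within the buffer size are quantized into a few bands, and any FRD at or beyond the buffer size falls into a final "bypass" band.

```python
def frd_band(frd, buffer_size, num_bands=4):
    """Quantize a forward reuse distance (FRD) into a band index.

    Bands 0..num_bands-2 evenly partition FRDs within the buffer;
    the last band marks an FRD that exceeds the buffer size, making
    the request a candidate for buffer bypass.
    """
    if frd >= buffer_size:
        return num_bands - 1
    # evenly split [0, buffer_size) across the in-buffer bands
    width = buffer_size / (num_bands - 1)
    return int(frd // width)


def should_cache(predicted_band, num_bands=4):
    """Cache only requests whose predicted FRD band fits in the buffer."""
    return predicted_band < num_bands - 1
```

A classifier then only needs to predict the band index (a small-cardinality label) rather than an exact FRD value, which is what keeps the model lightweight. For example, with a 1024-entry buffer, a request predicted to recur after 2048 accesses lands in the bypass band and is written through rather than cached.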
Acknowledgements
The work reported in this paper was conducted during the sabbatical year of Kwangwoon University in 2022. This research was also supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE: Ministry of Trade, Industry and Energy) (P0017124, HRD Program for Industrial Innovation).
Contributions
H. C. performed all experiments and prepared the figures. T. K. proposed the main idea and wrote the main manuscript. I. K. proofread the manuscript and redrew all figures. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cha, H., Kim, I.K. & Kim, T. Using a random forest to predict quantized reuse distance in an SSD write buffer. Computing 106, 3967–3986 (2024). https://doi.org/10.1007/s00607-024-01343-5
Keywords
- Solid-state drives (SSD)
- Random forest classifier
- Predicting forward reuse distance
- Write buffer management