Abstract
Efficient management of the write buffer in solid-state drives (SSDs) can be achieved by predicting future I/O request patterns using machine learning techniques. However, the computational demands posed by sophisticated approaches such as deep learning remain significant, despite the increasing computational power of SSDs. This paper presents a novel approach to write buffer management that addresses these challenges. Our method employs a lightweight yet accurate random forest classifier to predict the forward reuse distances (FRDs) of I/O requests, which indicate how soon the same request will recur. Our key insight is that, rather than aiming for exact FRD predictions for individual future requests, it suffices to identify whether the predicted FRD exceeds the buffer size. With this insight, our method implements efficient buffer management operations, including bypassing buffer storage when necessary. To achieve this, we introduce a banding method that quantizes FRDs according to the buffer size. This enables predictions at the band level, forming the foundation for a lightweight machine learning model. We then assign high caching priority to write requests that are anticipated to have a short FRD band. Through extensive simulator-based evaluations, we demonstrate that our method achieves hit rates comparable to those of the optimal algorithm in most scenarios. Moreover, our approach outperforms state-of-the-art algorithms that depend on past I/O reference patterns by up to 27%.
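The abstract's banding idea can be illustrated with a minimal sketch. The band boundaries, the number of bands, and the function names below are illustrative assumptions, not the paper's actual implementation: FRDs within the buffer size are quantized into a few bands, and any FRD at or beyond the buffer size falls into a final "bypass" band.

```python
def frd_band(frd, buffer_size, num_bands=4):
    """Quantize a forward reuse distance (FRD) into a band index.

    Bands 0..num_bands-2 evenly partition FRDs within the buffer;
    the last band marks an FRD that exceeds the buffer size, making
    the request a candidate for buffer bypass.
    """
    if frd >= buffer_size:
        return num_bands - 1
    # evenly split [0, buffer_size) across the in-buffer bands
    width = buffer_size / (num_bands - 1)
    return int(frd // width)


def should_cache(predicted_band, num_bands=4):
    """Cache only requests whose predicted FRD band fits in the buffer."""
    return predicted_band < num_bands - 1
```

A classifier then only needs to predict the band index (a small-cardinality label) rather than an exact FRD value, which is what keeps the model lightweight. For example, with a 1024-entry buffer, a request predicted to recur after 2048 accesses lands in the bypass band and is written through rather than cached.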
Acknowledgements
The work reported in this paper was conducted during the sabbatical year of Kwangwoon University in 2022. This research was also supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE: Ministry of Trade, Industry and Energy) (P0017124, HRD Program for Industrial Innovation).
Contributions
H. C. performed all experiments and prepared the figures. T. K. proposed the main idea and wrote the main manuscript. I. K. proofread the manuscript and redrew all figures. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cha, H., Kim, I.K. & Kim, T. Using a random forest to predict quantized reuse distance in an SSD write buffer. Computing 106, 3967–3986 (2024). https://doi.org/10.1007/s00607-024-01343-5
Keywords
- Solid-state drives (SSD)
- Random forest classifier
- Predicting forward reuse distance
- Write buffer management