
DeletePop: A DLT Execution Time Predictor Based on Comprehensive Modeling

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14493)


Abstract

The modeling and simulation of Deep Learning Training (DLT) are challenging problems. Because of the intricate parallel patterns involved, existing models and simulations do not account for enough of the factors that influence training, which makes their predictions of DLT time inaccurate. To address these challenges, we propose DeletePop, a Deep Learning Training execution time predictor based on comprehensive modeling at the operator level. It systematically abstracts the DLT process by dividing it into three parts: computation, memory access, and communication. DeletePop predicts the Job Execution Time (JET) from an operator dataset collected on a homogeneous network. Finally, we integrate DeletePop into the Job Scheduling Simulator (JSS) DLTSim to support more efficient scheduling. Although the implementation of DeletePop is based on the TensorFlow framework, the theoretical model can be adapted to any other framework that uses static graphs. DeletePop achieves up to 90% accuracy on homogeneous networks, and we also describe a theoretical approach to supporting heterogeneous networks.
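The abstract does not detail the cost model itself, so the Python sketch below only illustrates the general idea it describes: each operator contributes computation, memory-access, and communication costs, which are accumulated into a predicted Job Execution Time. The names (OpRecord, predict_jet, comm_overlap) and the simple additive/overlap assumptions are hypothetical illustrations, not taken from the paper.

```python
# Minimal sketch (not DeletePop's actual model): operator-level JET prediction
# by decomposing each operator's cost into computation, memory access, and
# communication, then accumulating over the operators of a static graph.
from dataclasses import dataclass


@dataclass
class OpRecord:
    """One entry of a (hypothetical) profiled operator dataset."""
    name: str
    compute_ms: float  # kernel execution time
    memory_ms: float   # memory-access time not hidden by computation
    comm_ms: float     # communication time attributed to this operator


def predict_jet(ops: list[OpRecord], comm_overlap: float = 0.5) -> float:
    """Predict the Job Execution Time (ms) of one training iteration.

    Assumes operators on a device run sequentially and that a fraction
    `comm_overlap` of communication is overlapped with computation.
    """
    compute = sum(op.compute_ms + op.memory_ms for op in ops)
    comm = sum(op.comm_ms for op in ops)
    return compute + (1.0 - comm_overlap) * comm


if __name__ == "__main__":
    profile = [
        OpRecord("conv2d_1", compute_ms=1.8, memory_ms=0.3, comm_ms=0.0),
        OpRecord("dense_1", compute_ms=0.9, memory_ms=0.2, comm_ms=1.1),
    ]
    print(f"Predicted iteration time: {predict_jet(profile):.2f} ms")
```

In the paper, per-operator costs would come from the operator dataset profiled on a homogeneous network and be replayed over the framework's static graph; the sketch collapses that into a flat list purely for brevity.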



Acknowledgment

This work was sponsored in part by NKRDP (2021YFB0300800), in part by NSFC (62102396), the Beijing Nova Program (Z211100002121143, 20220484217), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021099), and the Pilot for Major Scientific Research Facility of Jiangsu Province of China (No. BM2021800).

Author information

Corresponding author

Correspondence to En Shao.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

He, Y., Zhou, Y., Shao, E., Tan, G., Sun, N. (2024). DeletePop: A DLT Execution Time Predictor Based on Comprehensive Modeling. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14493. Springer, Singapore. https://doi.org/10.1007/978-981-97-0862-8_9

  • DOI: https://doi.org/10.1007/978-981-97-0862-8_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0861-1

  • Online ISBN: 978-981-97-0862-8

  • eBook Packages: Computer Science, Computer Science (R0)
