Abstract
Large-scale distributed computing systems have become crucial for storing, processing, and analyzing massive datasets. Apache Spark provides a general and efficient programming model for large-scale data processing built on the Resilient Distributed Dataset (RDD). However, stragglers are one of the major issues in Spark clusters: a straggler is a task that takes abnormally long to finish execution, and it degrades performance. This paper proposes a machine-learning-based straggler identification model for distributed environments. The model uses several Spark parameters, extracted from the execution of large-scale jobs of various types, to help identify stragglers. In addition, it applies machine learning approaches to Spark logs to learn various kinds of job execution features. The model is evaluated on various real-world benchmark datasets using default Apache Spark under diverse CPU, I/O, and mixed workloads. The experiments show that Logistic Regression outperforms the other competitive models compared, achieving an average accuracy of 90% for straggler identification.
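The idea of learning a straggler classifier from per-task execution features can be illustrated with a small sketch. This is not the authors' pipeline: the labeling rule (a task is a straggler if it runs longer than 1.5 times the median task duration), the two feature names, and all numbers are assumptions chosen for illustration, and the logistic regression is a plain gradient-descent version written with the Python standard library only.

```python
import math
from statistics import median


def label_stragglers(durations, factor=1.5):
    """Label a task 1 if it runs longer than factor x the median duration.

    The 1.5x-median rule is an assumption made for this sketch.
    """
    cutoff = factor * median(durations)
    return [1 if d > cutoff else 0 for d in durations]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            err = p - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (straggler) if the predicted probability is at least 0.5."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical task durations in seconds for one stage.
durations = [10, 11, 9, 12, 30, 10, 28, 11]

# Hypothetical per-task features: (GC time fraction, normalized shuffle read).
features = [
    (0.05, 0.20), (0.06, 0.30), (0.04, 0.10), (0.07, 0.25),
    (0.60, 0.90), (0.05, 0.20), (0.55, 0.80), (0.06, 0.15),
]

labels = label_stragglers(durations)
weights, bias = train_logistic_regression(features, labels)
predictions = [predict(weights, bias, x) for x in features]

print("labels:     ", labels)
print("predictions:", predictions)
```

In practice the features would come from Spark logs and the model would be evaluated on held-out jobs; the toy data here is deliberately separable so the sketch only shows the mechanics of labeling, training, and prediction.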
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Said, S.A., Habashy, S.M., Salem, S.A., Saad, E.LS.M. (2023). A Straggler Identification Model for Large-Scale Distributed Computing Systems Using Machine Learning. In: Hassanien, A.E., Snášel, V., Tang, M., Sung, TW., Chang, KC. (eds) Proceedings of the 8th International Conference on Advanced Intelligent Systems and Informatics 2022. AISI 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 152. Springer, Cham. https://doi.org/10.1007/978-3-031-20601-6_10
DOI: https://doi.org/10.1007/978-3-031-20601-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20600-9
Online ISBN: 978-3-031-20601-6
eBook Packages: Intelligent Technologies and Robotics (R0)