Abstract
The wide choice of Distributed Computing Infrastructures (DCIs) available allows users to select and combine their preferred architectures among Clusters, Grids, Clouds, Desktop Grids and more. In these hybrid DCIs, elasticity is emerging as a key property. In elastic infrastructures, the resources available to execute an application vary continuously, either because of application requirements or because of constraints on the infrastructure, such as node volatility.
In the latter case, there is no guarantee that the computing resources will remain available during the entire execution of an application. In this paper, we show that Bag-of-Tasks (BoT) executions on these “Best-Effort” infrastructures suffer from a drop in the task completion rate at the end of the execution.
The SpeQuloS service presented in this paper improves the Quality of Service (QoS) of BoT applications executed on hybrid and elastic infrastructures. SpeQuloS monitors the execution of the BoT and dynamically supplies fast and reliable Cloud resources when the critical part of the BoT is executed. SpeQuloS offers several features to users of hybrid DCIs, such as completion time estimation and execution speedup. Performance evaluation shows that BoT executions can be accelerated by a factor of 2 while offloading less than 2.5% of the workload to the Cloud.
We report on several scenarios where SpeQuloS is deployed on hybrid infrastructures featuring a large variety of infrastructure combinations. In the context of the European Desktop Grid Initiative (EDGI), SpeQuloS is operated to improve the QoS of Desktop Grids using resources from private Clouds. We present a use case where SpeQuloS uses both regular and spot EC2 instances to decrease the cost of computation while preserving a similar QoS level. Finally, in the last scenario, SpeQuloS makes it possible to optimize the utilization of Grid5000 resources.
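The monitoring-and-provisioning idea summarized above can be illustrated with a minimal sketch. It is not the SpeQuloS implementation: the `BotProgress` structure, the function names, and the 0.9 trigger ratio are all hypothetical choices made for this example, since the abstract only says that Cloud resources are supplied during the critical (final) part of the BoT.

```python
from dataclasses import dataclass


@dataclass
class BotProgress:
    """Snapshot of a Bag-of-Tasks execution (hypothetical structure)."""
    total_tasks: int
    completed_tasks: int

    @property
    def completion_ratio(self) -> float:
        # An empty bag is treated as already finished.
        if self.total_tasks == 0:
            return 1.0
        return self.completed_tasks / self.total_tasks


def should_start_cloud(progress: BotProgress, trigger_ratio: float = 0.9) -> bool:
    """Return True while the BoT is in its assumed critical part (the tail).

    trigger_ratio is an assumption for illustration, not a value
    taken from the abstract.
    """
    return trigger_ratio <= progress.completion_ratio < 1.0


def cloud_share(tasks_on_cloud: int, total_tasks: int) -> float:
    """Fraction of the workload that was offloaded to the Cloud."""
    return tasks_on_cloud / total_tasks


if __name__ == "__main__":
    # Poll a few snapshots of a 1000-task bag.
    for done in (500, 899, 900, 990, 1000):
        snapshot = BotProgress(total_tasks=1000, completed_tasks=done)
        print(done, should_start_cloud(snapshot))
    # 25 tasks out of 1000 on the Cloud is a 2.5% share.
    print(f"{cloud_share(25, 1000):.1%}")
```

Under these assumptions, Cloud workers would be requested only once most of the bag has completed, which is how a large speedup of the tail can coexist with a small offloaded share of the total workload.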
Acknowledgements
The authors would like to thank Peter Kacsuk, Jozsef Kovacs, Michela Taufer, Trilce Estrada and Kate Keahey for their insightful comments and suggestions throughout our research and development of SpeQuloS.
Some of the experiments presented in this paper were carried out using the Grid5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies.
This work was funded by the EDGI project, supported by the European Commission FP7 Capacities Programme under grant agreement RI-261556.
Cite this article
Delamare, S., Fedak, G., Kondo, D. et al. SpeQuloS: a QoS service for hybrid and elastic computing infrastructures. Cluster Comput 17, 79–100 (2014). https://doi.org/10.1007/s10586-013-0283-6
DOI: https://doi.org/10.1007/s10586-013-0283-6