Abstract
Nearly all existing HPC systems are operated by resource management systems based on the queuing approach. With the increasing acceptance of grid middleware like Globus, new requirements arise for the underlying local resource management systems. Features like advance reservation or quality of service are needed to implement high-level functions like co-allocation. However, these features are difficult to realize with a resource management system based on the queuing concept, since it considers only the present resource usage.
In this paper we present an approach which closes this gap. By assigning a start time to each resource request, a complete schedule is planned, which makes advance reservations straightforward. Based on this planning approach, we describe functions like diffuse requests, automatic duration extension, and service level agreements. We think they are useful for increasing the usability, acceptance, and performance of HPC machines. In the second part of this paper we present a planning-based resource management system which already covers some of these features.
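The core idea of planning, assigning a start time to every request as it arrives, can be illustrated with a minimal sketch. It is not the system presented in the paper: the names `Request`, `free_nodes`, and `plan_request`, the integer time units, and the single pool of identical nodes are all assumptions made for illustration. A reservation is modelled simply as a request with a not-before time in the future.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    nodes: int               # number of nodes requested
    duration: int            # estimated run time, in abstract time units
    earliest_start: int = 0  # a value > 0 models a reservation-style request

def free_nodes(plan, total_nodes, t_start, t_end):
    """Minimum number of free nodes over the interval [t_start, t_end)."""
    # Usage can only rise at the start of a planned job, so it is enough to
    # check t_start and every planned start time inside the interval.
    points = {t_start} | {s for s, _, _ in plan if t_start < s < t_end}
    def used(t):
        return sum(n for s, e, n in plan if s <= t < e)
    return total_nodes - max(used(t) for t in points)

def plan_request(plan, total_nodes, req):
    """Assign the earliest feasible start time and record it in the plan."""
    if req.nodes > total_nodes:
        raise ValueError("request exceeds machine size")
    # Availability only increases when a planned job ends, so the earliest
    # feasible start is the not-before time or the end of a planned job.
    candidates = sorted({req.earliest_start}
                        | {e for _, e, _ in plan if e > req.earliest_start})
    for t in candidates:
        if free_nodes(plan, total_nodes, t, t + req.duration) >= req.nodes:
            plan.append((t, t + req.duration, req.nodes))
            return t

if __name__ == "__main__":
    plan = []  # list of (start, end, nodes) entries
    for req in [Request("A", 2, 10),
                Request("R", 4, 5, earliest_start=20),  # future reservation
                Request("B", 3, 15),
                Request("C", 2, 8)]:
        print(req.name, "starts at", plan_request(plan, 4, req))
```

Because every request receives a start time when it is submitted, a request for a future time slot is handled like any other entry in the plan, and later requests are placed around it. A queuing system, which looks only at the present resource usage, has no such plan to consult.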
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hovestadt, M., Kao, O., Keller, A., Streit, A. (2003). Scheduling in HPC Resource Management Systems: Queuing vs. Planning. In: Feitelson, D., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2003. Lecture Notes in Computer Science, vol 2862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10968987_1
DOI: https://doi.org/10.1007/10968987_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20405-3
Online ISBN: 978-3-540-39727-4
eBook Packages: Springer Book Archive