Abstract
It is becoming increasingly evident that operating system interference, in the form of daemon activity and interrupts, contributes significantly to the performance degradation of parallel applications on large clusters. An earlier theoretical study evaluated the impact of system noise on application performance for different noise distributions [1]. Our work complements that theoretical analysis with an empirical study of noise in production clusters. We designed a parallel benchmark and ran it on large clusters at the San Diego Supercomputer Center to collect noise-related data. This data was fed to a simulator that predicts the performance of collective operations using the model of [1]. We report a comparison of the predicted and observed performance. Additionally, the tools developed in the process have been instrumental in identifying anomalous nodes that, if undetected, could degrade application performance.
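To make the idea of feeding measured noise into a simulator concrete, the following is a minimal sketch and not the simulator used in the paper, nor the model of [1]. It assumes a simplified setting: each process performs a fixed amount of work per phase, suffers a delay resampled from a list of measured noise values, and every phase ends at a barrier that releases only when the slowest process arrives. The function name `simulate_phases`, its parameters, and the sample values are all hypothetical.

```python
import random


def simulate_phases(noise_samples, num_procs, work, num_phases, seed=0):
    """Total time of num_phases compute phases, each ended by a barrier.

    Assumption (not from the paper): per-phase noise on each process is an
    independent draw from the empirical samples, and the barrier itself is free.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_phases):
        # The barrier completes when the slowest process finishes its phase.
        total += max(work + rng.choice(noise_samples) for _ in range(num_procs))
    return total


# Hypothetical noise samples in seconds: mostly quiet, with rare long delays.
samples = [0.0] * 98 + [0.001, 0.005]
work, phases = 0.001, 200
for procs in (1, 32, 1024):
    elapsed = simulate_phases(samples, procs, work, phases)
    # Slowdown relative to a noise-free run of the same phases.
    print(procs, round(elapsed / (work * phases), 2))
```

Under these assumptions, the printed slowdown tends to grow with the process count, because a larger group is more likely to contain at least one process hit by a rare long delay in any given phase.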
References
Agarwal, S., Garg, R., Vishnoi, N.K.: The impact of noise on the scaling of collectives: A theoretical approach. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds.) HiPC 2005. LNCS, vol. 3769, pp. 280–289. Springer, Heidelberg (2005)
Jones, T., Brenner, L., Fier, J.: Impacts of Operating Systems on the Scalability of Parallel Applications, Lawrence Livermore National Laboratory, Tech. Rep. UCRL-MI-202629 (March 2003)
Giosa, R., Petrini, F., Davis, K., Lebaillif-Delamare, F.: Analysis of System Overhead on Parallel Computers. In: IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) (2004)
Petrini, F., Kerbyson, D.J., Pakin, S.: The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8192 Processors of ASCI Q. In: ACM Supercomputing (2003)
Tsafrir, D., Etsion, Y., Feitelson, D.G., Kirkpatrick, S.: System Noise, OS Clock Ticks, and Fine-grained Parallel Applications. In: ICS (2005)
Moreira, J., Franke, H., Chan, W., Fong, L., Jette, M., Yoo, A.: A Gang-Scheduling System for ASCI Blue-Pacific. In: International Conference on High Performance Computing and Networking (1999)
Hori, A., Tezuka, H., Ishikawa, Y.: Highly Efficient Gang Scheduling Implementations. In: ACM/IEEE Conference on Supercomputing (1998)
Frachtenberg, E., Petrini, F., Fernandez, J., Pakin, S., Coll, S.: STORM: Lightning-Fast Resource Management. In: ACM/IEEE Conference on Supercomputing (2002)
DataStar Compute Resource at SDSC, [Online] Available: http://www.sdsc.edu/user_services/datastar/
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garg, R., De, P. (2006). Impact of Noise on Scaling of Collectives: An Empirical Evaluation. In: Robert, Y., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2006. HiPC 2006. Lecture Notes in Computer Science, vol 4297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11945918_45
DOI: https://doi.org/10.1007/11945918_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68039-0
Online ISBN: 978-3-540-68040-6
eBook Packages: Computer Science, Computer Science (R0)