Link to original content: https://doi.org/10.1145/2503210.2503247
There goes the neighborhood | Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

There goes the neighborhood: performance degradation due to nearby jobs

Published: 17 November 2013

Abstract

Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and determining the allocation requests for proposals. Our experiments show that on a Cray XE system, the execution time of a communication-heavy parallel application ranges from 28% faster to 41% slower than the average observed performance. Blue Gene systems, on the other hand, demonstrate no noticeable run-to-run variability. In this paper, we focus on Cray machines and investigate potential causes for performance variability such as OS jitter, shape of the allocated partition, and interference from other jobs sharing the same network links. Reducing such variability could improve overall throughput at a computer center and save energy costs.
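
To make the "28% faster to 41% slower than the average" style of measurement concrete, the following is a minimal sketch of how run-to-run spread relative to the mean could be computed. The function name `relative_spread` and the sample timings are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
from statistics import mean


def relative_spread(times):
    """Return how far the fastest and slowest runs deviate from the mean.

    Both values are percentages of the mean execution time:
    (percent faster than average, percent slower than average).
    """
    avg = mean(times)
    fastest_pct = (avg - min(times)) / avg * 100.0
    slowest_pct = (max(times) - avg) / avg * 100.0
    return fastest_pct, slowest_pct


# Hypothetical execution times in seconds (illustrative, NOT from the paper).
runs = [80.0, 100.0, 120.0]
faster, slower = relative_spread(runs)
print(f"{faster:.0f}% faster to {slower:.0f}% slower than the average")
```

Normalizing by the mean rather than by the best run matches how the abstract phrases its result, and it keeps the two percentages comparable across applications with very different absolute run times.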


Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN: 9781450323789
DOI: 10.1145/2503210
General Chair: William Gropp
Program Chair: Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. communication performance
  2. interference
  3. resource management
  4. system noise
  5. torus networks

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
