Abstract
General-Purpose Graphics Processing Units (GPGPUs) are becoming a common component of modern supercomputing systems. Many MPI applications are being modified to take advantage of the superior compute potential offered by GPUs. To facilitate this process, many MPI libraries are being extended to support MPI communication from GPU device memory. However, there is a lack of a standardized benchmark suite that helps users evaluate common communication models on GPU clusters and fairly compare different MPI libraries. In this paper, we extend the widely used OSU Micro-Benchmarks (OMB) suite with benchmarks that evaluate the performance of point-to-point, multi-pair, and collective MPI communication for different GPU cluster configurations. We illustrate the benefits of the proposed benchmarks for the MVAPICH2 and Open MPI libraries.
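To make the point-to-point measurement concrete, the following is a minimal sketch of a device-to-device ping-pong latency test in the style of `osu_latency`, using a CUDA-aware MPI library that accepts GPU device pointers directly. The buffer size, iteration count, and one-GPU-per-rank mapping are illustrative assumptions, not the actual OMB-GPU implementation.

```c
/* Hypothetical sketch: GPU-to-GPU ping-pong latency with CUDA-aware MPI.
 * Requires an MPI library built with GPU support (e.g. MVAPICH2 with
 * --enable-cuda) and two ranks, each with access to a GPU. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    const int iters = 1000;          /* illustrative iteration count */
    const int size  = 1 << 20;       /* illustrative 1 MiB message */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(rank);             /* assumption: one GPU per rank */
    cudaMalloc((void **)&buf, size); /* communicate from device memory */

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0)
        printf("avg one-way latency: %.2f us\n",
               elapsed * 1e6 / (2.0 * iters));

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -np 2 ./gpu_latency`. Passing the `cudaMalloc`'d pointer straight to `MPI_Send`/`MPI_Recv` is exactly the capability such benchmarks exercise: a non-CUDA-aware MPI would require explicit `cudaMemcpy` staging through host buffers around each call.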
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Bureddy, D., Wang, H., Venkatesh, A., Potluri, S., Panda, D.K. (2012). OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1
eBook Packages: Computer Science (R0)