Abstract
General-Purpose Graphics Processing Units (GPGPUs) are becoming a common component of modern supercomputing systems. Many MPI applications are being modified to take advantage of the superior compute potential offered by GPUs. To facilitate this process, many MPI libraries are being extended to support MPI communication from GPU device memory. However, there is a lack of a standardized benchmark suite that helps users evaluate common communication models on GPU clusters and fairly compare different MPI libraries. In this paper, we extend the widely used OSU Micro-Benchmarks (OMB) suite with benchmarks that evaluate the performance of point-to-point, multi-pair, and collective MPI communication for different GPU cluster configurations. We illustrate the benefits of the proposed benchmarks for the MVAPICH2 and Open MPI libraries.
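To make the point-to-point measurement concrete, the following is a minimal sketch of a device-to-device ping-pong latency test in the style of `osu_latency`, using a CUDA-aware MPI library that accepts GPU device pointers directly. The buffer size, iteration count, and one-GPU-per-rank mapping are illustrative assumptions, not the actual OMB-GPU implementation.

```c
/* Hypothetical sketch: GPU-to-GPU ping-pong latency with CUDA-aware MPI.
 * Requires an MPI library built with GPU support (e.g. MVAPICH2 with
 * --enable-cuda) and two ranks, each with access to a GPU. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    const int iters = 1000;          /* illustrative iteration count */
    const int size  = 1 << 20;       /* illustrative 1 MiB message */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(rank);             /* assumption: one GPU per rank */
    cudaMalloc((void **)&buf, size); /* communicate from device memory */

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0)
        printf("avg one-way latency: %.2f us\n",
               elapsed * 1e6 / (2.0 * iters));

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -np 2 ./gpu_latency`. Passing the `cudaMalloc`'d pointer straight to `MPI_Send`/`MPI_Recv` is exactly the capability such benchmarks exercise: a non-CUDA-aware MPI would require explicit `cudaMemcpy` staging through host buffers around each call.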
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Bureddy, D., Wang, H., Venkatesh, A., Potluri, S., Panda, D.K. (2012). OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1
eBook Packages: Computer Science (R0)