Adaptive MPI Multirail Tuning for Non-uniform Input/Output Access

Moreaud, Stéphanie; Goglin, Brice; Namyst, Raymond

doi:10.1007/978-3-642-15646-5_25

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6305)

Included in the following conference series:

European MPI Users' Group Meeting

Abstract

Multicore processors have not only reintroduced Non-Uniform Memory Access (NUMA) architectures in today's parallel computers; they are also responsible for non-uniform access times to Input/Output devices, known as Non-Uniform Input/Output Access (NUIOA). In clusters of multicore machines equipped with several network interfaces, the performance of communication between processes thus depends on which cores these processes are scheduled on and on their distance to the Network Interface Cards involved. We propose a technique that lets multirail communication between processes carefully distribute data among the network interfaces so as to counterbalance NUIOA effects. We demonstrate the relevance of our approach by evaluating its implementation within Open MPI on a Myri-10G + InfiniBand cluster.
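The abstract does not give the splitting rule itself, so the following is only a minimal sketch of the general idea, assuming that each message is divided among the rails in proportion to the bandwidth each interface delivers from the sending process's NUMA node. The table `RAIL_BANDWIDTH`, its figures, and the function `split_message` are hypothetical illustrations; they are not measurements from the paper and not part of Open MPI.

```python
from typing import Dict

# Hypothetical bandwidths (MB/s) seen from each NUMA node.
# These numbers are illustrative assumptions, not results from the paper.
RAIL_BANDWIDTH: Dict[int, Dict[str, int]] = {
    0: {"myri10g": 1200, "infiniband": 900},   # node 0 is assumed closer to the Myri-10G NIC
    1: {"myri10g": 900, "infiniband": 1200},   # node 1 is assumed closer to the InfiniBand NIC
}


def split_message(size: int, bandwidths: Dict[str, int]) -> Dict[str, int]:
    """Split `size` bytes across rails in proportion to their bandwidth."""
    total = sum(bandwidths.values())
    if size < 0 or total <= 0:
        raise ValueError("size must be non-negative and total bandwidth positive")
    # Integer share for each rail, rounded down.
    chunks = {nic: size * bw // total for nic, bw in bandwidths.items()}
    # Give the bytes lost to rounding to the fastest rail.
    fastest = max(bandwidths, key=bandwidths.get)
    chunks[fastest] += size - sum(chunks.values())
    return chunks


if __name__ == "__main__":
    for node, rails in RAIL_BANDWIDTH.items():
        print(node, split_message(1 << 20, rails))
```

Under these assumptions, a process bound to node 0 sends the larger share through the interface that is faster from node 0, and a process on node 1 does the opposite, which is the kind of locality-dependent ratio the abstract alludes to.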

Author information

Authors and Affiliations

LaBRI, Université de Bordeaux, INRIA, 351, cours de la Libération, F-33405, Talence cedex, France
Stéphanie Moreaud, Brice Goglin & Raymond Namyst

Editor information

Editors and Affiliations

High Performance Computing Center Stuttgart (HLRS), Universität Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Rainer Keller
Parallel Software Technologies Laboratory, Department of Computer Science, University of Houston
Edgar Gabriel
High Performance Computing Center Stuttgart, University of Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Michael Resch
Department of Electrical Engineering and Computer Science, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Moreaud, S., Goglin, B., Namyst, R. (2010). Adaptive MPI Multirail Tuning for Non-uniform Input/Output Access. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_25

DOI: https://doi.org/10.1007/978-3-642-15646-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15645-8
Online ISBN: 978-3-642-15646-5
eBook Packages: Computer Science (R0)
