Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources

Nesi, Lucas Leandro; Schnorr, Lucas Mello; Navaux, Philippe Olivier Alexandre

doi:10.1007/978-3-030-15996-2_3


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11333)

Included in the following conference series:

International Conference on Vector and Parallel Processing


Abstract

The development of parallel solutions for contemporary heterogeneous supercomputers is complex and challenging, especially regarding coding, performance analysis, and behavioral characterization. The task-based programming model is a possible alternative that can reduce this burden on the programmer. In this model, the application is divided into tasks whose dependencies form a directed acyclic graph (DAG), and the DAG is submitted to a runtime scheduler that maps tasks to resources. In this paper, we present the design, development, and performance analysis of a task-based heterogeneous (CPU and GPU) application for a Computational Fluid Dynamics (CFD) problem that simulates the flow of an incompressible Newtonian fluid with constant viscosity. We implement our solution on top of the StarPU runtime and use the StarVZ toolkit to conduct a comprehensive performance analysis. Results indicate that our solution achieves a 6.5× speedup over the serial version on the target machine using 7 CPU workers, and a 60× speedup using 5 CPU and 2 GPU workers.
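The task-based model summarized above can be illustrated with a small sketch. The Python code below is not StarPU's API nor the scheduler used in the paper; it is a hypothetical greedy list scheduler, assuming each task carries a fixed cost estimate per worker kind (CPU or GPU) and is placed on whichever worker would finish it earliest once all of its DAG predecessors have completed. All task names, costs, and worker names are invented for illustration.

```python
from collections import deque


def schedule(tasks, deps, workers):
    """Greedy earliest-finish-time list scheduling of a task DAG (sketch).

    tasks:   {task: {worker_kind: estimated duration}}  (hypothetical costs)
    deps:    {task: [predecessor tasks]}
    workers: {worker: worker_kind}
    Returns {task: (worker, start, end)}.
    """
    # Number of unfinished predecessors per task, plus successor lists.
    indeg = {t: len(deps.get(t, [])) for t in tasks}
    succ = {t: [] for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            succ[p].append(t)

    ready = deque(sorted(t for t in tasks if indeg[t] == 0))
    free_at = {w: 0.0 for w in workers}  # time at which each worker becomes idle
    placed = {}
    while ready:
        t = ready.popleft()
        # A task may start only once all of its predecessors have finished.
        earliest = max((placed[p][2] for p in deps.get(t, [])), default=0.0)
        best = None
        for w, kind in workers.items():
            if kind not in tasks[t]:
                continue  # no implementation of this task for this worker kind
            start = max(earliest, free_at[w])
            end = start + tasks[t][kind]
            if best is None or end < best[2]:
                best = (w, start, end)
        if best is None:
            raise ValueError(f"no worker can run task {t!r}")
        placed[t] = best
        free_at[best[0]] = best[2]
        # Release successors whose dependencies are now all satisfied.
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    if len(placed) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return placed


# Hypothetical DAG: two independent blocks with CPU and GPU implementations,
# followed by a CPU-only task that depends on both.
tasks = {
    "blk0": {"cpu": 4.0, "gpu": 1.0},
    "blk1": {"cpu": 4.0, "gpu": 1.0},
    "reduce": {"cpu": 2.0},
}
deps = {"reduce": ["blk0", "blk1"]}
workers = {"cpu0": "cpu", "gpu0": "gpu"}

for task, (worker, start, end) in schedule(tasks, deps, workers).items():
    print(f"{task}: {worker} [{start}, {end}]")
```

In this toy run both blocks land on the GPU worker: even when queued behind `blk0`, the GPU finishes `blk1` at time 2.0, earlier than the idle CPU would (time 4.0). The CPU-only task then starts at 2.0 on `cpu0`. Real runtimes such as StarPU base such placement decisions on performance models rather than on fixed, made-up costs like these.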




Download references

Acknowledgements

This study was financed by the National Council for Scientific and Technological Development (CNPq). We thank the following projects for supporting this investigation: FAPERGS GreenCloud (16/488-9), FAPERGS MultiGPU (16/354-8), CNPq 447311/2014-0, CAPES/Brafitec EcoSud 182/15, and CAPES/Cofecub 899/18. The companion material is hosted by CERN’s Zenodo, to which we are also grateful.

Author information

Authors and Affiliations

Institute of Informatics/PPGC/UFRGS, Porto Alegre, Brazil
Lucas Leandro Nesi, Lucas Mello Schnorr & Philippe Olivier Alexandre Navaux


Corresponding author

Correspondence to Lucas Mello Schnorr.

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, São Paulo, Brazil
Hermes Senger
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Osni Marques
Universidade Estadual Paulista Júlio de Mesquita Filho, Presidente Prudente, São Paulo, Brazil
Rogerio Garcia
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Tatiana Pinheiro de Brito
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Rogério Iope
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Silvio Stanzani
Universidad Nacional de San Luis, San Luis, Argentina
Veronica Gil-Costa


About this paper

Cite this paper

Nesi, L.L., Schnorr, L.M., Navaux, P.O.A. (2019). Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science, vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_3


DOI: https://doi.org/10.1007/978-3-030-15996-2_3
Published: 26 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15995-5
Online ISBN: 978-3-030-15996-2
eBook Packages: Computer Science, Computer Science (R0)
