Abstract
Developing parallel solutions for contemporary heterogeneous supercomputers is complex and challenging, especially with respect to coding, performance analysis, and behavioral characterization. The task-based programming model is a possible alternative that reduces the burden on the programmer. This model divides the application into tasks whose dependencies form a directed acyclic graph (DAG) and submits the DAG to a runtime scheduler that maps tasks to resources. In this paper, we present the design, development, and performance analysis of a task-based heterogeneous (CPU and GPU) application for a Computational Fluid Dynamics (CFD) problem that simulates the flow of an incompressible Newtonian fluid with constant viscosity. We implement our solution on top of the StarPU runtime and use the StarVZ toolkit to conduct a comprehensive performance analysis. Results indicate that, on the target machine, our solution achieves a 6.5\(\times \) speedup over the serial version using 7 CPU workers and a 60\(\times \) speedup using 5 CPU and 2 GPU workers.
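To make the task-based model concrete, the following is a minimal Python sketch of a DAG being handed to a scheduler that maps tasks to workers. It is an illustration only and not the paper's StarPU implementation: the `schedule` function, the task names, the costs, and the worker names are all hypothetical, and the greedy earliest-start policy is an assumption chosen for brevity rather than any scheduler StarPU provides.

```python
from collections import deque


def schedule(tasks, deps, workers):
    """Greedy list scheduling of a task DAG (hypothetical helper).

    tasks:   dict mapping task name -> cost in arbitrary time units
    deps:    dict mapping task name -> list of tasks it depends on
    workers: list of worker names
    Returns (placement, makespan); placement maps each task to
    a (worker, start, end) tuple.
    """
    # Count unfinished dependencies and record the successors of each task.
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    successors = {t: [] for t in tasks}
    for t, parents in deps.items():
        for p in parents:
            successors[p].append(t)

    ready = deque(t for t in tasks if remaining[t] == 0)
    free_at = {w: 0 for w in workers}
    placement = {}
    while ready:
        t = ready.popleft()
        # A task cannot start before all of its dependencies have finished.
        earliest = max((placement[p][2] for p in deps.get(t, [])), default=0)
        # Map the task to the worker that can start it soonest.
        w = min(workers, key=lambda x: max(free_at[x], earliest))
        start = max(free_at[w], earliest)
        end = start + tasks[t]
        placement[t] = (w, start, end)
        free_at[w] = end
        # Release successors whose dependencies are now all satisfied.
        for s in successors[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    if len(placement) != len(tasks):
        raise ValueError("dependency cycle detected: the graph is not a DAG")
    makespan = max((end for _, _, end in placement.values()), default=0)
    return placement, makespan


# Hypothetical diamond-shaped DAG: b and c depend on a, d depends on b and c.
tasks = {"a": 1, "b": 2, "c": 2, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}

_, serial_time = schedule(tasks, deps, ["cpu0"])
_, parallel_time = schedule(tasks, deps, ["cpu0", "cpu1"])
print("speedup:", serial_time / parallel_time)
```

In this sketch the programmer only declares tasks and their dependencies; the order of execution and the placement on workers are decided by the scheduler, which is the division of labor the task-based model relies on. Speedup is computed the same way as in the reported results, as serial time divided by parallel time.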
Acknowledgements
This study was financed by the National Council for Scientific and Technological Development (CNPq). We thank the following projects for supporting this investigation: FAPERGS GreenCloud (16/488-9), FAPERGS MultiGPU (16/354-8), CNPq 447311/2014-0, CAPES/Brafitec EcoSud 182/15, and CAPES/Cofecub 899/18. The companion material is hosted by CERN’s Zenodo, for which we are also grateful.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Nesi, L.L., Schnorr, L.M., Navaux, P.O.A. (2019). Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science, vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_3
DOI: https://doi.org/10.1007/978-3-030-15996-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15995-5
Online ISBN: 978-3-030-15996-2
eBook Packages: Computer Science, Computer Science (R0)