Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language

Tran Tan, Antoine; Falcou, Joel; Etiemble, Daniel; Kaiser, Hartmut

doi:10.1007/s10766-015-0354-9

Published: 20 March 2015

International Journal of Parallel Programming, Volume 44, pages 449–465 (2016)

Antoine Tran Tan¹,
Joel Falcou¹,
Daniel Etiemble¹ &
Hartmut Kaiser²

Abstract

Providing high-level tools for parallel programming while sustaining a high level of performance is a challenge that techniques such as Domain Specific Embedded Languages (DSELs) try to address. In previous work, we investigated the design of such a DSEL, NT$^2$, which provides a Matlab-like syntax for parallel numerical computations inside a C++ library. In this paper, we show how NT$^2$ has been redesigned for shared memory systems in an extensible and portable way. The new NT$^2$ design relies on a tiered Parallel Skeleton system built using asynchronous task management and automatic compile-time taskification of user-level code. We describe how this system can operate with various shared memory runtimes and evaluate the design using two benchmarks implementing linear algebra algorithms.
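
To make the idea of a skeleton built on asynchronous tasks more concrete, the following is a minimal sketch in standard C++11. It is not NT$^2$ code and does not use the NT$^2$ API: the function name `async_transform` and its `tasks` parameter are hypothetical, and `std::async` stands in for whichever shared memory runtime is actually used. The sketch cuts an element-wise computation into contiguous chunks, launches each chunk as a task, and waits on the resulting futures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper (an assumption for illustration, not part of NT2):
// applies f to every element of `in`, splitting the index range into at most
// `tasks` contiguous chunks and running each chunk as an asynchronous task.
template <typename T, typename F>
std::vector<T> async_transform(const std::vector<T>& in, F f, std::size_t tasks)
{
    assert(tasks > 0);
    const std::size_t n = in.size();
    const std::size_t chunk = (n + tasks - 1) / tasks;  // ceiling division
    std::vector<T> out(n);

    std::vector<std::future<void>> pending;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        // Each task writes to a disjoint slice of `out`, so no locking is needed.
        pending.push_back(std::async(std::launch::async,
            [&in, &out, f, begin, end] {
                for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
            }));
    }

    // Join: get() blocks until the task is done and rethrows any exception.
    for (auto& p : pending) p.get();
    return out;
}
```

A call such as `async_transform(x, [](double v) { return 2.0 * v + 1.0; }, 4)` would evaluate the expression over `x` using up to four tasks. In a DSEL, the chunking and task creation would be derived from the user's array expression at compile time rather than written by hand as here.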

Notes

As defined by Czarnecki et al. [20].
The Parallel Linear Algebra Software for Multicore Architectures (PLASMA) [2, 13] is a software framework that reimplements a major part of the LAPACK subroutines to take advantage of multicore architectures. PLASMA implements tile algorithms and uses both a tile data layout and dynamic task scheduling to achieve good performance.
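
As an illustration of what a tile data layout means, the sketch below copies a square row-major matrix into a layout where each tile is stored contiguously. It is a simplified assumption-laden example, not PLASMA code: the function name `to_tile_layout` is hypothetical, tiles are stored row-major and in row-major tile order, and the matrix size is assumed to be a multiple of the tile size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a PLASMA routine): copies an n x n row-major
// matrix `a` into tile layout with nb x nb tiles. Tiles are stored one after
// another in row-major tile order, and each tile is row-major internally.
// Assumption: n is a multiple of nb.
std::vector<double> to_tile_layout(const std::vector<double>& a,
                                   std::size_t n, std::size_t nb)
{
    assert(nb > 0 && n % nb == 0 && a.size() == n * n);
    const std::size_t tiles_per_dim = n / nb;
    std::vector<double> t(n * n);

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            const std::size_t tile   = (i / nb) * tiles_per_dim + (j / nb);
            const std::size_t within = (i % nb) * nb + (j % nb);
            t[tile * nb * nb + within] = a[i * n + j];
        }
    }
    return t;
}
```

With this layout, the tile at tile coordinates `(p, q)` starts at offset `(p * (n / nb) + q) * nb * nb`, so a task that works on one tile touches a single contiguous block of memory.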

References

Abrahams, D., Gurtovoy, A.: C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond. Pearson Education, Boston (2004)
Agullo, E., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Langou, J., Ltaief, H., Luszczek, P., YarKhan, A.: PLASMA users' guide. Technical report, Electrical Engineering and Computer Science Department, University of Tennessee. http://icl.cs.utk.edu/projectsfiles/plasma/pdf/usersguide.pdf (2009)
Aldinucci, M., Danelutto, M., Dazzi, P.: Muskel: an expandable skeleton environment. Scal. Comput. Pract. Exp. 8(4), 325–341 (2001)
Aldinucci, M., Danelutto, M., Dünnweber, J.: Optimization techniques for implementing parallel skeletons in grid environments. In: Gorlatch, S. (ed.) Proceedings of CMPP: International Workshop on Constructive Methods for Parallel Programming, pp. 35–47. Universität Münster, Stirling (2004)
Aldinucci, M., Danelutto, M., Kilpatrick, P., Torquati, M.: Fastflow: high-level and efficient streaming on multi-core. In: Pllana, S., Xhafa, F. (eds.) Programming Multi-core and Many-core Computing Systems, chap 13. Parallel and Distributed Computing. Wiley (2014)
An, P., Jula, A., Rus, S., Saunders, S., Smith, T., Tanase, G., Thomas, N., Amato, N., Rauchwerger, L.: STAPL: an adaptive, generic parallel C++ library. In: Dietz, H.G. (ed.) Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 2624, pp. 193–208. Springer, Berlin, Heidelberg (2003)
Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The design of OpenMP tasks. IEEE Trans. Parallel Distrib. Syst. 20(3), 404–418 (2009)
Baker Jr, H.C., Hewitt, C.: The incremental garbage collection of processes. ACM SIGART Bull. 12, 55–59 (1977)
Benoit, A., Cole, M., Gilmore, S., Hillston, J.: Flexible skeletal programming with Eskel. In: Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Lisbon, Portugal. Euro-Par’05, pp. 761–770. Springer-Verlag, Berlin, Heidelberg (2005)
Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637–654 (1973)
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Dongarra, J.: From serial loops to parallel execution on distributed systems. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012 Parallel Processing. Lecture Notes in Computer Science, vol. 7484, pp. 246–257. Springer, Berlin, Heidelberg (2012)
Buss, A., Harshvardhan, Papadopoulos, I., Pearce, O., Smith, T., Tanase, G., Thomas, N., Xu, X., Bianco, M., Amato, N.M., Rauchwerger, L.: STAPL: standard template adaptive parallel library. In: Proceedings of the 3rd Annual Haifa Experimental Systems Conference, SYSTOR ’10, pp. 14:1–14:10. ACM, New York (2010)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009)
Chamberlain, B.L., Callahan, D., Zima, H.P.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)
Ching, W.-M., Zheng, D.: Automatic parallelization of array-oriented programs for a multi-core machine. Int. J. Parallel Progr. 40(5), 514–531 (2012)
Mysen, C., Gustafsson, N., Austern, M., Yasskin, J.: N3785: executors and schedulers, revision 3. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3785.pdf (2013)
Ciechanowicz, P., Kuchen, H.: Enhancing Muesli’s data parallel skeletons for multi-core computer architectures. In: High Performance Computing and Communications (HPCC), 12th IEEE International Conference on, pp. 108–113. IEEE (2010)
Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)
Cole, M.I.: Algorithmic Skeletons: Structured Management of Parallel Computation. Pitman, London (1989)
Czarnecki, K., Eisenecker, U.W., Glück, R., Vandevoorde, D., Veldhuizen, T.L.: Generative programming and active libraries. In: Generic Programming, pp. 25–39 (1998)
Dawes, B., Abrahams, D., Rivera, R.: Boost C++ Libraries. http://www.boost.org (2009)
Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Domain-specific optimization strategy for skeleton programs. In: Kermarrec, A.-M., Bougé, L., Priol, T. (eds.) Euro-Par 2007 Parallel Processing. Lecture Notes in Computer Science, vol. 4641, pp. 705–714. Springer, Berlin (2007)
Estérie, P., Gaunard, M., Falcou, J., Lapresté, J.-T., Rozoy, B.: Boost.SIMD: generic programming for portable SIMDization. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 431–432. ACM (2012)
Falcou, J., Gaunard, M., Lapresté, J.-T.: The Numerical Template Toolbox. http://www.github.com/MetaScale/nt2 (2013)
Falcou, J., Sérot, J., Pech, L., Lapresté, J.-T.: Meta-programming applied to automatic SMP parallelization of linear algebra code. In: Euro-Par 2008 Parallel Processing, pp. 729–738. Springer, Berlin (2008)
Friedman, D.P., Wise, D.S.: The impact of applicative programming on multiprocessing. Indiana University, Computer Science Department (1976)
Grelck, C., Scholz, S.-B.: SAC: a functional array language for efficient multi-threaded execution. Int. J. Parallel Progr. 34(4), 383–427 (2006)
Hudak, P.: Building domain-specific embedded languages. ACM Comput. Surv. 28(4es), 196 (1996)
Kaiser, H., Brodowicz, M., Sterling, T.: ParalleX: an advanced parallel execution model for scaling-impaired applications. In: Parallel Processing Workshops, 2009. ICPPW’09. International Conference on, pp. 394–401. IEEE (2009)
Kale, L.V., Krishnan, S.: CHARM++: A Portable Concurrent Object Oriented System Based on C++, 28(10). ACM (1993)
Kuchen, H.: A Skeleton Library. Springer, Berlin (2002)
Niebler, E.: Proto: a compiler construction toolkit for DSELs. In: Proceedings of ACM SIGPLAN Symposium on Library-Centric Software Design (2007)
Gustafsson, N., Laksberg, A., Sutter, H., Mithani, S.: N3857: Improvements to std::future<T> and related APIs. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3857.pdf (2014)
OpenMP Architecture Review Board. OpenMP application program interface version 4, (2013)
Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O’Reilly Media, California (2010)
Spinellis, D.: Notable design patterns for domain-specific languages. J. Syst. Softw. 56(1), 91–99 (2001)
The C++ Standards Committee. ISO/IEC 14882:2011, Standard for Programming Language C++. Technical report. http://www.open-std.org/jtc1/sc22/wg21 (2011)
The C++ Standards Committee. N3797: Working Draft, Standard for Programming Language C++. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf (2013)
Tratt, L.: Model transformations and tool integration. Softw. Syst. Model. 4(2), 112–122 (2005)
Vandevoorde, D., Josuttis, N.M.: C++ Templates. Addison-Wesley Longman Publishing Co, Boston (2002)
Veldhuizen, T.: Expression templates. C++ Report 7, 26–31 (1995)
Escriba, V.B.J.: N3865: More Improvements to std::future<T>. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3865.pdf (2014)
YarKhan, A., Kurzak, J., Dongarra, J.: QUARK users' guide. Technical report, Electrical Engineering and Computer Science, Innovative Computing Laboratory, University of Tennessee (April 2011)

Author information

Authors and Affiliations

LRI, INRIA, Université Paris-Sud XI, Orsay, France
Antoine Tran Tan, Joel Falcou & Daniel Etiemble
CCT, Louisiana State University, Baton Rouge, LA, USA
Hartmut Kaiser

Corresponding author

Correspondence to Antoine Tran Tan.

About this article

Cite this article

Tran Tan, A., Falcou, J., Etiemble, D. et al. Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language. Int J Parallel Prog 44, 449–465 (2016). https://doi.org/10.1007/s10766-015-0354-9

Received: 30 July 2014
Accepted: 07 March 2015
Published: 20 March 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10766-015-0354-9
