


Link to original content: https://unpaywall.org/10.1007/S10766-015-0354-9
Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language | International Journal of Parallel Programming
Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language

International Journal of Parallel Programming

Abstract

Providing high-level tools for parallel programming while sustaining a high level of performance is a challenge that techniques such as Domain Specific Embedded Languages (DSELs) try to address. In previous work, we investigated the design of such a DSEL, NT\(^2\), which provides a Matlab-like syntax for parallel numerical computations inside a C++ library. In this paper, we show how NT\(^2\) has been redesigned for shared memory systems in an extensible and portable way. The new NT\(^2\) design relies on a tiered Parallel Skeleton system built using asynchronous task management and automatic compile-time taskification of user-level code. We describe how this system can operate on top of various shared memory runtimes and evaluate the design using two benchmarks implementing linear algebra algorithms.
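To make the idea of a tiered skeleton built on asynchronous tasks more concrete, the following is a minimal sketch using only the C++11 standard library. It is not NT\(^2\) code and does not reproduce the compile-time taskification described in the abstract: the names `transform_tile`, `async_transform` and `scaled_sum` are hypothetical, and the tile size is chosen at run time by the caller. The outer tier cuts the index range into tiles and launches one task per tile; the inner tier is a plain sequential loop standing in for a nested skeleton.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Inner tier (hypothetical name): a sequential loop over one tile.
// A real library could place a SIMD or other nested skeleton here.
template <typename F>
void transform_tile(std::vector<double>& out, const std::vector<double>& in,
                    std::size_t begin, std::size_t end, F f)
{
    for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
}

// Outer tier (hypothetical name): split [0, in.size()) into tiles and
// launch one asynchronous task per tile. The caller gets futures back
// and decides when to wait, instead of hitting an implicit barrier.
template <typename F>
std::vector<std::future<void>> async_transform(std::vector<double>& out,
                                               const std::vector<double>& in,
                                               std::size_t tile, F f)
{
    assert(tile > 0 && out.size() == in.size());
    std::vector<std::future<void>> tasks;
    for (std::size_t b = 0; b < in.size(); b += tile) {
        const std::size_t e = std::min(b + tile, in.size());
        tasks.push_back(std::async(std::launch::async, [&out, &in, b, e, f] {
            transform_tile(out, in, b, e, f);  // tiles never overlap
        }));
    }
    return tasks;
}

// Example pipeline: y = 2*x + 1 computed tile by tile, then summed.
double scaled_sum(const std::vector<double>& x, std::size_t tile)
{
    std::vector<double> y(x.size());
    auto tasks = async_transform(y, x, tile,
                                 [](double v) { return 2.0 * v + 1.0; });
    for (auto& t : tasks) t.get();  // wait for every tile before reducing
    return std::accumulate(y.begin(), y.end(), 0.0);
}
```

Calling `scaled_sum` on the values 0 through 9 with a tile size of 3 launches four tasks and returns 100. In this sketch the only dependency is the final wait before the reduction; a runtime with future continuations could chain the reduction onto the tile tasks instead of blocking.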


Notes

  1. As defined by Czarnecki et al. [20].

  2. The Parallel Linear Algebra Software for Multicore Architectures (PLASMA) [2, 13] is a software framework that rewrites a major part of the LAPACK subroutines to take advantage of multicore architectures. PLASMA implements tile algorithms and uses both a tile data layout and dynamic task scheduling to achieve good performance.

References

  1. Abrahams, D., Gurtovoy, A.: C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond. Pearson Education, Boston (2004)

  2. Agullo, E., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Langou, J., Ltaief, H., Luszczek, P., YarKhan, A.: PLASMA users' guide. Technical report, Electrical Engineering and Computer Science Department, University of Tennessee. http://icl.cs.utk.edu/projectsfiles/plasma/pdf/usersguide.pdf (2009)

  3. Aldinucci, M., Danelutto, M., Dazzi, P.: Muskel: an expandable skeleton environment. Scal. Comput. Pract. Exp. 8(4), 325–341 (2001)

  4. Aldinucci, M., Danelutto, M., Dünnweber, J.: Optimization techniques for implementing parallel skeletons in grid environments. In: Gorlatch, S. (ed.) Proceedings of CMPP: International Workshop on Constructive Methods for Parallel Programming, pp. 35–47. Universität Münster, Stirling (2004)

  5. Aldinucci, M., Danelutto, M., Kilpatrick, P., Torquati, M.: FastFlow: high-level and efficient streaming on multi-core. In: Pllana, S., Xhafa, F. (eds.) Programming Multi-core and Many-core Computing Systems, chap. 13. Parallel and Distributed Computing. Wiley (2014)

  6. An, P., Jula, A., Rus, S., Saunders, S., Smith, T., Tanase, G., Thomas, N., Amato, N., Rauchwerger, L.: STAPL: an adaptive, generic parallel C++ library. In: Dietz, H.G. (ed.) Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 2624, pp. 193–208. Springer, Berlin, Heidelberg (2003)

  7. Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The design of OpenMP tasks. IEEE Trans. Parallel Distrib. Syst. 20(3), 404–418 (2009)

  8. Baker Jr, H.C., Hewitt, C.: The incremental garbage collection of processes. ACM SIGART Bull. 12, 55–59 (1977)

  9. Benoit, A., Cole, M., Gilmore, S., Hillston, J.: Flexible skeletal programming with eSkel. In: Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Lisbon, Portugal. Euro-Par’05, pp. 761–770. Springer-Verlag, Berlin, Heidelberg (2005)

  10. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637–654 (1973)

  11. Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Dongarra, J.: From serial loops to parallel execution on distributed systems. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012 Parallel Processing. Lecture Notes in Computer Science, vol. 7484, pp. 246–257. Springer, Berlin, Heidelberg (2012)

  12. Buss, A., Harshvardhan, Papadopoulos, I., Pearce, O., Smith, T., Tanase, G., Thomas, N., Xu, X., Bianco, M., Amato, N.M., Rauchwerger, L.: STAPL: standard template adaptive parallel library. In: Proceedings of the 3rd Annual Haifa Experimental Systems Conference, SYSTOR ’10, pp. 14:1–14:10. ACM, New York (2010)

  13. Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009)

  14. Chamberlain, B.L., Callahan, D., Zima, H.P.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)

  15. Ching, W.-M., Zheng, D.: Automatic parallelization of array-oriented programs for a multi-core machine. Int. J. Parallel Progr. 40(5), 514–531 (2012)

  16. Mysen, C., Gustafsson, N., Austern, M., Yasskin, J.: N3785: executors and schedulers, revision 3. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3785.pdf (2013)

  17. Ciechanowicz, P., Kuchen, H.: Enhancing Muesli’s data parallel skeletons for multi-core computer architectures. In: High Performance Computing and Communications (HPCC), 12th IEEE International Conference on, pp. 108–113. IEEE (2010)

  18. Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)

  19. Cole, M.I.: Algorithmic skeletons: structured management of parallel computation. Pitman, London (1989)

  20. Czarnecki, K., Eisenecker, U.W., Glück, R., Vandevoorde, D., Veldhuizen, T.L.: Generative programming and active libraries. In: Generic Programming, pp. 25–39 (1998)

  21. Dawes, B., Abrahams, D., Rivera, R.: Boost C++ Libraries. http://www.boost.org (2009)

  22. Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Domain-specific optimization strategy for skeleton programs. In: Kermarrec, A.-M., Bougé, L., Priol, T. (eds.) Euro-Par 2007 Parallel Processing. Lecture Notes in Computer Science, vol. 4641, pp. 705–714. Springer, Berlin (2007)

  23. Estérie, P., Gaunard, M., Falcou, J., Lapresté, J.-T., Rozoy, B.: Boost.SIMD: generic programming for portable SIMDization. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 431–432. ACM (2012)

  24. Falcou, J., Gaunard, M., Lapresté, J.-T.: The numerical template toolbox. http://www.github.com/MetaScale/nt2 (2013)

  25. Falcou, J., Sérot, J., Pech, L., Lapresté, J.-T.: Meta-programming applied to automatic SMP parallelization of linear algebra code. In: Euro-Par 2008 Parallel Processing, pp. 729–738. Springer, Berlin (2008)

  26. Friedman, D.P., Wise, D.S.: The impact of applicative programming on multiprocessing. Indiana University, Computer Science Department (1976)

  27. Grelck, C., Scholz, S.-B.: SAC: a functional array language for efficient multi-threaded execution. Int. J. Parallel Progr. 34(4), 383–427 (2006)

  28. Hudak, P.: Building domain-specific embedded languages. ACM Comput. Surv. 28(4es), 196 (1996)

  29. Kaiser, H., Brodowicz, M., Sterling, T.: ParalleX: an advanced parallel execution model for scaling-impaired applications. In: Parallel Processing Workshops, 2009. ICPPW’09. International Conference on, pp. 394–401. IEEE (2009)

  30. Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++, vol. 28(10). ACM (1993)

  31. Kuchen, H.: A Skeleton Library. Springer, Berlin (2002)

  32. Niebler, E.: Proto: a compiler construction toolkit for DSELs. In: Proceedings of ACM SIGPLAN Symposium on Library-Centric Software Design (2007)

  33. Gustafsson, N., Laksberg, A., Sutter, H., Mithani, S.: N3857: Improvements to std::future<T> and related APIs. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3857.pdf (2014)

  34. OpenMP Architecture Review Board. OpenMP application program interface version 4 (2013)

  35. Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O’Reilly Media, California (2010)

  36. Spinellis, D.: Notable design patterns for domain-specific languages. J. Syst. Softw. 56(1), 91–99 (2001)

  37. The C++ Standards Committee. ISO/IEC 14882:2011, Standard for Programming Language C++. Technical report. http://www.open-std.org/jtc1/sc22/wg21 (2011)

  38. The C++ Standards Committee. N3797: Working Draft, Standard for Programming Language C++. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf (2013)

  39. Tratt, L.: Model transformations and tool integration. Softw. Syst. Model. 4(2), 112–122 (2005)

  40. Vandevoorde, D., Josuttis, N.M.: C++ Templates. Addison-Wesley Longman Publishing Co, Boston (2002)

  41. Veldhuizen, T.: Expression templates. C++ Report 7, 26–31 (1995)

  42. Escriba, V.B.J.: N3865: More Improvements to std::future<T>. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3865.pdf (2014)

  43. YarKhan, A., Kurzak, J., Dongarra, J.: QUARK users' guide. Technical report, Electrical Engineering and Computer Science, Innovative Computing Laboratory, University of Tennessee (April 2011)

Author information

Corresponding author

Correspondence to Antoine Tran Tan.

About this article

Cite this article

Tran Tan, A., Falcou, J., Etiemble, D. et al. Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language. Int J Parallel Prog 44, 449–465 (2016). https://doi.org/10.1007/s10766-015-0354-9

  • DOI: https://doi.org/10.1007/s10766-015-0354-9
