Abstract
Parallelizing a sequential application requires extracting information about its loops and how their variables are accessed, and then augmenting the source code accordingly. In this paper we propose a framework that avoids this error-prone, time-consuming task. Our solution leverages compile-time information extracted from the source code to classify every variable used inside each loop according to how it is accessed. Our system, called BFCA+, then automatically instruments the source code with the OpenMP directives and clauses needed for parallel execution, using the standard shared and private clauses to express the variable classification. The framework can also instrument loops for speculative parallelization with the help of the ATLaS runtime system, which defines a new speculative clause to mark those variables that may lead to a dependence violation. As a result, the target loop is guaranteed to run correctly in parallel, preserving sequential semantics even in the presence of dependence violations. Our experimental evaluation shows that the framework not only saves development time, but also produces faster code than a manual parallelization.
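As an illustration, the following minimal sketch (not taken from the paper) shows the kind of annotated loop described in the abstract: read-only data is marked shared, per-iteration data private, and a variable that may cause a cross-iteration dependence is flagged with the speculative clause that the ATLaS extension adds to OpenMP. The example and the exact directive syntax are assumptions made here for illustration only; compiling it requires the ATLaS-extended OpenMP support rather than a standard OpenMP compiler.

#include <stdio.h>

#define N 1024

int main(void) {
    int a[N], last = -1, i;
    for (i = 0; i < N; i++) a[i] = i % 7;

    /* Hypothetical output of the instrumentation step: 'a' is shared,
     * the loop index 'i' is private, and 'last' may be written in one
     * iteration and read in another, so it is marked speculative.     */
    #pragma omp parallel for default(none) shared(a) private(i) speculative(last)
    for (i = 0; i < N; i++) {
        if (a[i] == 0)
            last = i;        /* potential dependence violation */
        a[i] = a[i] * 2;     /* independent update, safe in parallel */
    }

    printf("last index that held zero: %d\n", last);
    return 0;
}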
Notes
Only well-formed for loops, whose number of iterations is known when the loop starts, can be parallelized by the ATLaS framework; a brief illustrative sketch of this distinction follows these notes. See [7] for additional details.
The current version of BFCA+ transforms only a single loop of the application, in order to avoid transforming two nested loops, a situation the ATLaS runtime system does not allow. We expect to overcome this limitation in the near future.
Note that the manual transformation process includes determining which loop would be most profitable to parallelize and then performing an in-depth analysis of the data elements accessed inside that loop. This is an error-prone, time-consuming process that, for the benchmarks considered, took between 10 and 30 hours.
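The following minimal sketch illustrates the loop-shape requirement mentioned in the first note above (illustrative only, not taken from the paper): the first loop's trip count is fixed at entry, while the second loop's iteration count depends on values computed inside it and therefore cannot be determined up front.

#include <stdio.h>

int main(void) {
    int n = 8;

    /* Well-formed: the number of iterations (n) is known when the loop starts. */
    for (int i = 0; i < n; i++)
        printf("iteration %d\n", i);

    /* Not well-formed in this sense: the exit condition depends on values
     * computed inside the loop, so the trip count is unknown at entry.     */
    int x = 100;
    while (x > 1)
        x = (x % 2) ? 3 * x + 1 : x / 2;
    printf("converged to %d\n", x);

    return 0;
}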
References
Aldea S, Llanos DR, Gonzalez-Escribano A (2012) Using SPEC CPU2006 to evaluate the sequential and parallel code generated by commercial and open-source compilers. J Supercomput 59(1):486–498
Cintra M, Llanos DR (2003) Toward efficient and robust software speculative parallelization on multiprocessors. In: PPoPP’03 proceedings, pp 13–24
Dang FH, Yu H, Rauchwerger L (2002) The R-LRPD test: speculative parallelization of partially parallel loops. In: IPDPS’02 proceedings, pp 20–29
Aldea S, Llanos DR, Gonzalez-Escribano A (2012) Support for thread-level speculation into OpenMP. In: IWOMP’12 proceedings, pp 275–278
Aldea S, Llanos DR, Gonzalez-Escribano A (2014) The BonaFide C analyzer: automatic loop-level characterization and coverage measurement. J Supercomput 68(3):1378–1401
Aldea S, Estebanez A, Llanos DR, Gonzalez-Escribano A (2014) A new GCC plugin-based compiler pass to add support for thread-level speculation into OpenMP. In: EuroPar’14 proceedings, LNCS 8632, Springer, pp 234–245
Aldea S et al (2015) An OpenMP extension that supports thread-level speculation. IEEE Trans Parallel Distrib Syst (to appear)
Oancea CE, Mycroft A, Harris T (2009) A lightweight in-place implementation for software thread-level speculation. In: SPAA 2009 proceedings, pp 223–232. ACM, New York
Yiapanis P et al (2013) Optimizing software runtime systems for speculative parallelization. ACM Trans Arch Code Optim (TACO) 9(4):39
Adhianto L et al (2000) Tools for OpenMP application development: the POST project. Concurr Pract Exp 12:1177–1191
Ierotheou CS et al (2005) Generating OpenMP code using an interactive parallelization environment. Parallel Comput 31(10–12):999–1012
Jin H et al (2003) Automatic multilevel parallelization using OpenMP. Sci Program 11(2):177–190
Johnson S et al (2005) The ParaWise expert assistant—widening accessibility to efficient and scalable tool generated OpenMP code. In: Proceedings of the WOMPAT’04, pp 67–82
Bondhugula U et al (2008) A practical automatic polyhedral parallelizer and locality optimizer. In: PLDI'08 proceedings, pp 101–113
Trifunovic K et al (2010) Graphite two years after: first lessons learned from real-world polyhedral compilation. In: GROW’10 proceedings, pp 4–19
Grosser T et al (2011) Polly—polyhedral optimization in LLVM. In: IMPACT'11 workshop proceedings, Chamonix, France, pp 1–6
Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO'04 proceedings, pp 75–86
Amini M et al (2012) Par4All: from convex array regions to heterogeneous computing. In: IMPACT’12 HiPEAC workshop proceedings, Paris, France, pp 1–2
Guelton S (2011) Building source-to-source compilers for heterogeneous targets. PhD thesis, Université européenne de Bretagne, Rennes
Amini M et al (2011) PIPS is not (just) polyhedral software. In: IMPACT'11 workshop proceedings, Chamonix, France, pp 7–12
Liao C et al (2008) Automatic parallelization using OpenMP based on STL semantics. In: Languages and compilers for parallel computing (LCPC)
Dave C et al (2009) Cetus: a source-to-source compiler infrastructure for multicores. IEEE Comput 42(12):36–42
Taillard J, Guyomarch F, Dekeyser JL (2008) A graphical framework for high performance computing using an MDE approach. In: PDP’08 proceedings, pp 165–173
Nardi L et al (2012) YAO: a generator of parallel code for variational data assimilation applications. In: HPCC’12 proceedings, pp 224–232
Clarkson KL, Mehlhorn K, Seidel R (1993) Four results on randomized incremental constructions. Comput Geom Theory Appl 3(4):185–212
Devroye L, Mücke EP, Zhu B (1998) A note on point location in Delaunay triangulations of random points. Algorithmica 22:477–482
Welzl E (1991) Smallest enclosing disks (balls and ellipsoids). In: New results and new trends in computer science. LNCS, vol 555. Springer, New York, pp 359–370
Barnes JE (1997) TREE. Institute for Astronomy, University of Hawaii. ftp://hubble.ifa.hawaii.edu/pub/barnes/treecode/
Acknowledgments
This research has been partially supported by MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).
Cite this article
Aldea, S., Llanos, D.R. & Gonzalez-Escribano, A. BFCA+: automatic synthesis of parallel code with TLS capabilities. J Supercomput 73, 88–99 (2017). https://doi.org/10.1007/s11227-016-1623-0