Abstract
When implementing multimedia applications, solutions in dedicated hardware are chosen only when the required performance or energy-efficiency cannot be met with a software solution. The performance of a hardware design critically depends upon having high levels of parallelism and data locality. Often a long sequence of high-level transformations is needed to sufficiently increase the locality and parallelism. The effect of the transformations is known only after translating the high-level code into a specific design at the circuit level. When the constraints are not met, hardware designers need to redo the high-level loop transformations, and repeat all subsequent translation steps, which leads to long design times.
We propose a method to reduce design time through the synergistic combination of techniques (a) to quickly pinpoint the loop transformations that increase locality; (b) to refactor loops in a polyhedral model and check whether a sequence of refactorings is legal; (c) to generate efficient structural VHDL from the optimized refactored algorithm.
The implementation of these techniques in a tool suite results in a far shorter design time of hours instead of days or weeks. A 2D-inverse discrete wavelet transform was taken as a case study. The results outperform those of a commercial C-to-VHDL compiler, and compare favorably with existing published approaches.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading (1986)
Ashar, P., Devadas, S., Newton, A.R.: Sequential logic synthesis. Kluwer Academic Publishers, Dordrecht (1992)
Augé, I., Pétrot, F., Donnet, F., Gomez, P.: Platform-based design from parallel C specifications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24(12), 1811–1826 (2005)
Bastoul, C.: Code generation in the polyhedral model is easier than you think. In: PACT, pp. 7–16 (2004)
Bastoul, C., Cohen, A., Girbal, S., Sharma, S., Temam, O.: Putting polyhedral loop transformations to work. In: Rauchwerger, L. (ed.) LCPC 2003. LNCS, vol. 2958, pp. 209–225. Springer, Heidelberg (2004)
Bastoul, C., Feautrier, P.: Improving data locality by chunking. In: Hedin, G. (ed.) CC 2003 and ETAPS 2003. LNCS, vol. 2622, pp. 320–335. Springer, Heidelberg (2003)
Benini, L., Siegel, P., De Micheli, G.: Saving power by synthesizing gated clocks for sequential circuits. IEEE Design & Test of Computers 11(4), 32–41 (1994)
Berg, E., Hagersten, E.: Fast data-locality profiling of native execution. In: SIGMETRICS, Banff, Alberta, Canada, pp. 169–180 (2005), doi:10.1145/1064212.1064232
Beyls, K., D’Hollander, E.: Discovery of locality-improving refactorings by reuse path analysis. In: Gerndt, M., Kranzlmüller, D. (eds.) HPCC 2006. LNCS, vol. 4208, pp. 220–229. Springer, Heidelberg (2006)
Beyls, K., D’Hollander, E.H.: Generating cache hints for improved program efficiency. Journal of Systems Architecture 51(4), 223–250 (2005)
Beyls, K., D’Hollander, E.H.: Intermediately executed code is the key to find refactorings that improve temporal data locality. In: Proceedings of the 3rd conference on Computing Frontiers, pp. 373–382 (2006)
Bohm, W., Hammes, J., Draper, B., Chawathe, M., Ross, C., Rinker, R., Najjar, W.: Mapping a single assignment programming language to reconfigurable systems. Journal of Supercomputing 21(2), 117–130 (2002)
Bormans, J., Denolf, K., Wuytack, S., Nachtergaele, L., Bolsens, I.: Integrating system-level low power methodologies into a real-life design flow. In: PATMOS, pp. 19–28 (1999)
Cohen, A., Girbal, S., Parello, D., Sigler, M., Temam, O., Vasilache, N.: Facilitating the search for compositions of program transformations. In: ICS, June 2005, pp. 151–160 (2005)
DeHon, A.: The density advantage of configurable computing. IEEE Computer 33(4), 41–49 (2000)
Devos, H., Eeckhaut, H., Schrauwen, B., Christiaens, M., Stroobandt, D.: Ever considered SystemC? In: ProRISC Workshop, pp. 358–363 (2004)
Fursin, G.: Iterative Compilation and Performance Prediction for Numerical Applications. PhD thesis, University of Edinburgh (2004)
Girbal, S.: Optimisation d’applications - Composition de transformations de programme: modèle et outils. PhD thesis, l’Université de Paris XI Orsay (2005)
Guillou, A.C., Quinton, P., Risset, T.: Hardware synthesis for systems of recurrence equations with multi-dimensional schedule. International Journal of Embedded Systems (to appear, 2005)
Hannig, F., Dutta, H., Teich, J.: Mapping a class of dependence algorithms to coarse-grained reconfigurable arrays: Architectural parameters and methodology. International Journal of Embedded Systems 2(1/2), 114–127 (2006)
Huang, C.-T., Tseng, P.-C., Chen, L.-G.: Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform. IEEE Transactions on Signal Processing 53(4), 1575–1586 (2005)
Knijnenburg, P.M.W., Kisuki, T., O’Boyle, M.F.P.: Iterative compilation. In: Deprettere, F., Teich, J., Vassiliadis, S. (eds.) Embedded Processor Design Challenges. LNCS, vol. 2268, pp. 171–187. Springer, Heidelberg (2002)
Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 674–693 (1989)
Martonosi, M., Gupta, A., Anderson, T.E.: Tuning memory performance of sequential and parallel programs. IEEE Computer (April 1995)
McKinley, K.S., Carr, S., Tseng, C.-W.: Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems 18(4), 424–453 (1996), http://www.acm.org/pubs/toc/Abstracts/toplas/233564.html
Rau, B.R., Schlansker, M.S.: Embedded computer architecture and automation. IEEE Computer 34(4), 75–83 (2001)
Turjan, A., Kienhuis, B., Deprettere, E.: Translating affine nested-loop programs to process networks. In: CASES (2004), http://www.liacs.nl/~aturjan/publications.html
Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: PLDI. SIGPLAN Notices, vol. 26, pp. 30–44 (1991), http://suif.stanford.edu/papers/wolf91a.ps
Wolfe, M.: Experiences with data dependence abstractions. In: ICS, Cologne, West Germany, pp. 321–329 (1991), doi:10.1145/109025.109104
Yu, Y., D’Hollander, E.H.: Loop parallelization using the 3D iteration space visualizer. Journal of Visual Languages and Computing 12, 163–181 (2001)
Zervas, N.D., Anagnostopoulos, G.P., Spiliotopoulos, V., Andreopoulos, Y., Goutis, C.E.: Evaluation of design alternatives for the 2D-discrete wavelet transform. IEEE Transactions on Circuits and Systems for Video Technology 11(12), 1246–1262 (2001)
Zissulescu, C., Kienhuis, B., Deprettere, E.: Expression synthesis in process networks generated by LAURA. In: ASAP, July 2005, pp. 15–21 (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Devos, H., Beyls, K., Christiaens, M., Van Campenhout, J., D’Hollander, E.H., Stroobandt, D. (2007). Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_11
Download citation
DOI: https://doi.org/10.1007/978-3-540-71528-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71527-6
Online ISBN: 978-3-540-71528-3
eBook Packages: Computer ScienceComputer Science (R0)