Abstract
Interpreters are well studied in the field of compiler construction and program generation. They are typically used to execute programs written in different programming languages without a prior compilation step. However, they can also be used to model complex rule-based simulations: the interpreter applies all rules one after another, and these rules can be applied iteratively to a globally updated state until the final simulation result is reached. Many simulations for domain-specific problems already leverage the parallel processing capabilities of Graphics Processing Units (GPUs). They rely on rule implementations tuned to specific hardware to achieve maximum performance. However, every interpreter-based system requires a high-level algorithm that detects active rules and determines when they are evaluated. A common approach in this context is to use a different interpreter routine for every problem domain. Executing such functions efficiently mainly involves dealing with hardware peculiarities such as thread divergence, ALU computations, and memory operations. Furthermore, the interpreter is nowadays often executed on multiple states in parallel, which is particularly important for heuristic search or what-if analyses, for instance. In this paper, we present a novel and easy-to-implement method based on thread compaction to realize generic rule-based interpreters efficiently on GPUs. It is optimized for many states using a specially designed memory layout. Benchmarks on our evaluation scenarios show that performance can be increased significantly in comparison to existing, commonly used implementations.
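To make the general idea of thread compaction more concrete, the following is a minimal, hypothetical CUDA sketch of the technique: each lane checks whether its rule fired, a warp-wide ballot gathers the active lanes, and a prefix count packs their results into a dense output range. The types and functions (State, ruleIsActive, evaluateRule) are illustrative placeholders only and do not reproduce the interpreter or the memory layout presented in the paper.

    // Illustrative state and rule; the paper's actual rules and memory layout
    // are more involved -- these definitions are placeholders for the sketch.
    struct State { float value; };

    __device__ bool ruleIsActive(const State& s) { return s.value > 0.0f; }
    __device__ State evaluateRule(State s) { s.value *= 0.5f; return s; }

    // Warp-level thread compaction: lanes whose rule fired write their results
    // into a dense range, so the evaluation step runs on packed threads.
    __global__ void interpretStep(const State* states, State* compacted,
                                  int* outCount, int n)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= n) return;

        bool active = ruleIsActive(states[idx]);

        // Bitmask of the lanes in this warp that have an active rule.
        unsigned mask = __ballot_sync(__activemask(), active);
        if (!active) return;

        int lane = threadIdx.x & 31;
        // Number of active lanes below this one = slot within the warp's output.
        int slot = __popc(mask & ((1u << lane) - 1));

        // The lowest active lane reserves a contiguous block in the output...
        int base = 0;
        if (slot == 0)
            base = atomicAdd(outCount, __popc(mask));
        // ...and broadcasts the base index to the other active lanes.
        base = __shfl_sync(mask, base, __ffs(mask) - 1);

        compacted[base + slot] = evaluateRule(states[idx]);
    }

The kernel is launched with one thread per state; the paper's method additionally handles many interpreter states and groups of warps, which is beyond this snippet.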
Notes
In the scope of this paper, a single SIMT unit is referred to as a warp. For illustration purposes, every warp is visualized with eight lanes (threads), and groups are visualized with three warps.
This fits the typical bank-size configuration of 4 bytes on current GPUs.
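As a hedged illustration of this note (not the paper's actual layout): when each per-lane state word occupies exactly 4 bytes, consecutive lanes of a warp map to consecutive shared-memory banks, so accesses of the form shown below are free of bank conflicts.

    // Illustrative only: one 4-byte word per thread, launched with 256 threads
    // per block; lane i of a warp maps to bank i % 32, so reads and writes of
    // stateWords[tid] do not cause bank conflicts.
    __global__ void bankFriendlyUpdate(int* globalStates)
    {
        __shared__ int stateWords[256];

        int tid = threadIdx.x;
        int gid = blockIdx.x * blockDim.x + tid;

        stateWords[tid] = globalStates[gid];   // conflict-free load
        __syncthreads();

        stateWords[tid] += 1;                  // conflict-free update
        __syncthreads();

        globalStates[gid] = stateWords[tid];   // conflict-free store
    }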
Acknowledgements
The authors would like to thank Wladimir Panfilenko and Thomas Schmeyer for their suggestions and feedback regarding our method. Furthermore, we would like to thank Gian-Luca Kiefer for additional feedback on the paper. This work was funded by the German Federal Ministry of Education and Research within the project Hybr-iT: Hybrid and Intelligent Human–Robot Collaboration (Grant Number 01IS16026A).