Abstract
Data-intensive applications such as machine learning and big data analysis place heavy demands on memory subsystems. They involve computational kernels whose performance is limited not by algorithmic complexity but by the large amount of data they need to process. To counteract the growing gap between computing power and memory bandwidth, near-memory processing techniques have been proposed that can improve performance in such applications significantly. In this paper, we leverage a general-purpose processor extended with a reconfigurable framework to execute hardware-accelerated instructions. This framework features a high-bandwidth memory interface to the nearest memory controller, providing considerably higher bandwidth than the standard system bus. We introduce region-based data processing, which allows operations to be triggered merely by storing data and is especially suitable for large many-core designs. We show two different approaches to triggering near-memory operations in this architecture: one using interrupts for software-assisted processing and one interfacing directly with the hardware accelerator. Our evaluations show a performance gain of 72% on sum of absolute differences (SAD) kernels, with memory performance improved by 48%. Benchmarking AES encryption, we show a speedup of 70%.
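As a point of reference for the SAD (sum of absolute differences) kernels named in the abstract, a plain software version might look like the sketch below. It is not the authors' implementation: the function name `sad_u8` and the 8-bit operand width are assumptions chosen for illustration. It only shows why such a kernel tends to be bound by memory bandwidth, since each pair of loaded bytes feeds just a subtraction, an absolute value, and an addition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and 8-bit operand width are assumptions):
 * sum of absolute differences over two equally sized byte blocks.
 * Each iteration loads two bytes and performs only a subtract, an
 * absolute value, and an add, so the loop is typically limited by how
 * fast operands arrive from memory rather than by the arithmetic. */
uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int diff = (int)a[i] - (int)b[i];           /* range -255..255 */
        sum += (uint32_t)(diff < 0 ? -diff : diff); /* |a[i] - b[i]| */
    }
    return sum;
}
```

Calling `sad_u8` on the blocks {10, 20, 30, 40} and {12, 18, 30, 45} gives 2 + 2 + 0 + 5 = 9.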
Acknowledgement
This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Center Invasive Computing [SFB/TR 89].
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Lesniak, F., Kreß, F., Becker, J. (2021). Transparent Near-Memory Computing with a Reconfigurable Processor. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_15
DOI: https://doi.org/10.1007/978-3-030-79025-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79024-0
Online ISBN: 978-3-030-79025-7
eBook Packages: Computer Science (R0)