Abstract
The continuous demand for higher computational performance, combined with stagnating progress in general-purpose processors, has led to a surge of interest in highly specialized and efficient hardware. Together with the rising popularity of parameterizable hardware, largely driven by the RISC-V Instruction Set Architecture (ISA), this creates a new opportunity to optimize architectures for particular workloads. This work presents an application-specific optimization methodology for general-purpose processors, enabling the development of architectures that are faster and more efficient for their designated workloads. Driven by insights from the Cache-Aware Roofline Model (CARM), the methodology guides the configuration of the processor's memory and computational subsystems. We apply this methodology to two applications, demonstrating up to a \(2.67\times \) performance increase and a \(1.34\times \) improvement in energy efficiency.
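To illustrate the kind of insight a roofline-style model provides, the following minimal sketch computes the attainable performance at each memory level as the minimum of the peak compute throughput and the product of arithmetic intensity and bandwidth. It is not the authors' tooling: the function name `carm_attainable` and all numeric values (peak throughput, per-level bandwidths, arithmetic intensity) are hypothetical and chosen only for illustration.

```python
def carm_attainable(ai, peak_gflops, bandwidths_gbs):
    """Roofline bound per memory level: min(peak, AI * bandwidth).

    ai             -- arithmetic intensity in FLOP/byte (as seen by the core)
    peak_gflops    -- peak compute throughput in GFLOP/s
    bandwidths_gbs -- mapping of memory level name to bandwidth in GB/s
    """
    return {level: min(peak_gflops, ai * bw)
            for level, bw in bandwidths_gbs.items()}


# Hypothetical processor configuration (illustrative values only).
peak = 16.0                                   # GFLOP/s
bw = {"L1": 64.0, "L2": 32.0, "DRAM": 8.0}    # GB/s

# A kernel with low arithmetic intensity, for example a sparse workload.
roofs = carm_attainable(0.25, peak, bw)

for level, gflops in roofs.items():
    bound = "compute" if gflops >= peak else "memory"
    print(f"{level}: {gflops:.1f} GFLOP/s ({bound}-bound)")
```

Under these assumed numbers, the kernel reaches the compute roof only when served from L1, which suggests that enlarging or speeding up the memory subsystem would help more than adding compute units. A methodology of the kind described in the abstract would use this sort of reasoning to choose which subsystem to scale.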
Acknowledgements
This project has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 and Specific Grant Agreement No 101036168 (EPI SGA2), Grant Agreement No 956213 (SparCity) and Grant Agreement No 101092877 (SYCLOPS). It also received funding from FCT (Fundação para a Ciência e a Tecnologia, Portugal), through the UIDB/50021/2020 project.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rodrigues, A., Sousa, L., Ilic, A. (2024). A Performance Modelling-Driven Approach to Hardware Resource Scaling. In: Zeinalipour, D., et al. Euro-Par 2023: Parallel Processing Workshops. Euro-Par 2023. Lecture Notes in Computer Science, vol 14352. Springer, Cham. https://doi.org/10.1007/978-3-031-48803-0_15
DOI: https://doi.org/10.1007/978-3-031-48803-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48802-3
Online ISBN: 978-3-031-48803-0
eBook Packages: Computer Science (R0)