Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

Martínez, Héctor; Catalán, Sandra; Igual, Francisco D.; Herrero, José R.; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.


arXiv:2304.14480 (cs)

[Submitted on 27 Apr 2023]


Abstract: This paper advocates an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of GEMM to the shape of the matrix operands. Additionally, we support the flexible development of architecture-specific micro-kernels that further improve the utilization of the cache hierarchy.
Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multicore processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism and optimizing for cache usage.
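The idea of adapting GEMM's cache configuration parameters to the operand shape can be illustrated with a short sketch. This is a hypothetical illustration, not the authors' code: it follows the widely used BLIS-style loop structure around a micro-kernel, omits packing for clarity, and replaces the paper's analytical model with a simplified rule that sizes kc for L1, mc for L2, and nc for L3. The names `CacheInfo`, `Blocking`, `choose_blocking`, and `gemm_blocked`, the 4x4 micro-kernel, and the cache sizes are all assumptions. What it shows is the shape effect: when k is small, as in the trailing update of a right-looking blocked LU where k equals the algorithmic block size, clamping kc to k frees L2 space for a taller block of A.

```python
from dataclasses import dataclass

MR, NR = 4, 4      # assumed micro-kernel shape: an MR x NR block of C held in registers
DOUBLE_BYTES = 8   # FP64 operands


@dataclass(frozen=True)
class CacheInfo:   # hypothetical cache description (bytes); sizes are placeholders
    l1: int
    l2: int
    l3: int


@dataclass(frozen=True)
class Blocking:
    mc: int
    nc: int
    kc: int


def _fit(x: int, r: int) -> int:
    """Round x down to a multiple of r, but never below r."""
    return max(r, x // r * r)


def _ceil_to(x: int, r: int) -> int:
    """Round x up to a multiple of r."""
    return -(-x // r) * r


def choose_blocking(c: CacheInfo, m: int, n: int, k: int) -> Blocking:
    """Simplified shape-aware rule (an assumption, not the paper's analytical model)."""
    # kc: an MR x kc micro-panel of A plus a kc x NR micro-panel of B fill half of L1,
    # but kc never exceeds the inner dimension k of the actual operands.
    kc = max(1, min(c.l1 // 2 // (DOUBLE_BYTES * (MR + NR)), k))
    # mc and nc use the *clamped* kc, so a small k leaves room for a taller block of A
    # (half of L2) and a wider block of B (half of L3), capped by the operand sizes.
    mc = min(_fit(c.l2 // 2 // (DOUBLE_BYTES * kc), MR), _ceil_to(max(m, 1), MR))
    nc = min(_fit(c.l3 // 2 // (DOUBLE_BYTES * kc), NR), _ceil_to(max(n, 1), NR))
    return Blocking(mc, nc, kc)


def micro_kernel(mr, nr, kc, A, B, C, i0, j0, p0):
    """C[i0:i0+mr, j0:j0+nr] += A[i0:i0+mr, p0:p0+kc] @ B[p0:p0+kc, j0:j0+nr]."""
    acc = [[0.0] * nr for _ in range(mr)]   # stands in for the register block of C
    for p in range(p0, p0 + kc):
        for i in range(mr):
            a = A[i0 + i][p]
            for j in range(nr):
                acc[i][j] += a * B[p][j0 + j]
    for i in range(mr):
        for j in range(nr):
            C[i0 + i][j0 + j] += acc[i][j]


def gemm_blocked(m, n, k, A, B, C, blk: Blocking):
    """C += A @ B (row-major lists) using BLIS-style loops; packing is omitted."""
    for jc in range(0, n, blk.nc):                 # nc-wide column panels of B and C
        nc = min(blk.nc, n - jc)
        for pc in range(0, k, blk.kc):             # kc x nc block of B sized for L3
            kc = min(blk.kc, k - pc)
            for ic in range(0, m, blk.mc):         # mc x kc block of A sized for L2
                mc = min(blk.mc, m - ic)
                for jr in range(0, nc, NR):        # kc x NR micro-panel of B sized for L1
                    for ir in range(0, mc, MR):    # MR x kc micro-panel of A
                        micro_kernel(min(MR, mc - ir), min(NR, nc - jr), kc,
                                     A, B, C, ic + ir, jc + jr, pc)


caches = CacheInfo(l1=32 * 1024, l2=512 * 1024, l3=4 * 1024 * 1024)
print(choose_blocking(caches, 1000, 1000, 1000))  # Blocking(mc=128, nc=1000, kc=256)
print(choose_blocking(caches, 1000, 1000, 64))    # Blocking(mc=512, nc=1000, kc=64)
```

Under these placeholder cache sizes, the square case yields mc = 128, whereas the k = 64 case yields mc = 512, because the same L2 budget now holds a four times taller block of A. Whether this pays off is subject to the trade-off noted in the abstract: if threads are distributed over the mc-blocks of A, a larger mc may leave fewer blocks to share among them.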

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2304.14480 [cs.DC]
	(or arXiv:2304.14480v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2304.14480

Submission history

From: Sandra Catalan
[v1] Thu, 27 Apr 2023 19:44:30 UTC (458 KB)
