Abstract
We present new performance models and more compact data structures for cache blocking when applied to sparse matrix-vector multiply (SpM × V). We extend our prior models by relaxing the assumption that the vectors fit in cache and find that the new models are accurate enough to predict optimum block sizes. In addition, we determine criteria that predict when cache blocking improves performance. We conclude with architectural suggestions that would make memory systems execute SpM × V faster.
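To make the kernel concrete, the following is a minimal C sketch of cache-blocked SpM × V under the common assumption that the matrix has been pre-split into row-by-column cache blocks, each stored in its own CSR structure; the names (csr_block, spmv_cache_blocked) are illustrative only and are not the authors' data structure or the OSKI interface.

/*
 * Minimal sketch of cache-blocked SpMV (y += A*x), assuming A has been
 * pre-split into rectangular cache blocks, each stored in CSR form.
 * Names here (csr_block, spmv_cache_blocked) are hypothetical, not the
 * paper's actual data structure or the OSKI API.
 */
typedef struct {
    int nrows;        /* rows in this block                       */
    int row_start;    /* first row of the block in the full A     */
    int col_start;    /* first column of the block in the full A  */
    int *ptr;         /* CSR row pointers, length nrows+1         */
    int *ind;         /* column indices, local to the block       */
    double *val;      /* nonzero values                           */
} csr_block;

/* y += A*x, with A given as an array of cache blocks.  Each block touches
 * only a contiguous window of x and y, so those windows can stay resident
 * in cache while the block's nonzeros stream through memory once. */
void spmv_cache_blocked(const csr_block *blocks, int nblocks,
                        const double *x, double *y)
{
    for (int b = 0; b < nblocks; ++b) {
        const csr_block *blk = &blocks[b];
        const double *xb = x + blk->col_start;  /* x window reused by block  */
        double *yb = y + blk->row_start;        /* y window updated by block */
        for (int i = 0; i < blk->nrows; ++i) {
            double sum = yb[i];
            for (int k = blk->ptr[i]; k < blk->ptr[i + 1]; ++k)
                sum += blk->val[k] * xb[blk->ind[k]];
            yb[i] = sum;
        }
    }
}

The block dimensions are the tuning parameters: they should be chosen so that the x and y windows of one block fit in cache, which is precisely the choice the paper's performance models aim to predict.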
Cite this article
Nishtala, R., Vuduc, R.W., Demmel, J.W. et al. When cache blocking of sparse matrix vector multiply works and why. AAECC 18, 297–311 (2007). https://doi.org/10.1007/s00200-007-0038-9