iBet uBet web content aggregator. Adding the entire web to your favor.

Link to original content: https://api.crossref.org/works/10.1145/3399732

{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T16:50:17Z","timestamp":1725555017980},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["SPP1648"],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2020,9,30]]},"abstract":"\n The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today\u2019s multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation that eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-\n k<\/jats:italic>\n coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially. By means of a parameterized roofline model, we analyze the SymmSpMV performance in detail and discuss outliers. While we focus on SymmSpMV in this article, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.\n <\/jats:p>","DOI":"10.1145\/3399732","type":"journal-article","created":{"date-parts":[[2020,6,29]],"date-time":"2020-06-29T20:33:36Z","timestamp":1593462816000},"page":"1-37","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":69,"title":["A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication"],"prefix":"10.1145","volume":"7","author":[{"given":"Christie","family":"Alappat","sequence":"first","affiliation":[{"name":"Department of Computer Science, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg, Erlangen, Germany"}]},{"given":"Achim","family":"Basermann","sequence":"additional","affiliation":[{"name":"Simulation and Software Technology, German Aerospace Center, Linder Hoehe, Cologne, Germany"}]},{"given":"Alan R.","family":"Bishop","sequence":"additional","affiliation":[{"name":"Science, Technology and Engineering Directorate, Los Alamos National Laboratory, Los Alamos, USA"}]},{"given":"Holger","family":"Fehske","sequence":"additional","affiliation":[{"name":"Institute of Physics, University of Greifswald, Greifswald, Germany"}]},{"given":"Georg","family":"Hager","sequence":"additional","affiliation":[{"name":"Erlangen Regional Computing Center, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg"}]},{"given":"Olaf","family":"Schenk","sequence":"additional","affiliation":[{"name":"Institute of Computational Science, Universit\u00e0 della Svizzera italiana, Lugano, Switzerland"}]},{"given":"Jonas","family":"Thies","sequence":"additional","affiliation":[{"name":"Simulation and Software Technology, German Aerospace Center, Linder Hoehe, Cologne, Germany"}]},{"given":"Gerhard","family":"Wellein","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg, Erlangen, Germany"}]}],"member":"320","published-online":{"date-parts":[[2020,6,29]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Andreas Alvermann. 2019. ScaMaC: The Scalable Matrix Collection. Retrieved from https:\/\/bitbucket.org\/essex\/matrixcollection\/. Andreas Alvermann. 2019. ScaMaC: The Scalable Matrix Collection. Retrieved from https:\/\/bitbucket.org\/essex\/matrixcollection\/."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1137\/080732158"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2007.08.002"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 21st Symposium on Parallelism in Algorithms and Architectures (SPAA\u201909)","author":"Bulu\u00e7 Aydin","unstructured":"Aydin Bulu\u00e7 , Jeremy T. Fineman , Matteo Frigo , John R. Gilbert , and Charles E. Leiserson . 2009. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks . In Proceedings of the 21st Symposium on Parallelism in Algorithms and Architectures (SPAA\u201909) . ACM, New York, NY, 233--244. DOI:https:\/\/doi.org\/10.1145\/1583991.1584053 10.1145\/1583991.1584053 Aydin Bulu\u00e7, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, and Charles E. Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In Proceedings of the 21st Symposium on Parallelism in Algorithms and Architectures (SPAA\u201909). ACM, New York, NY, 233--244. DOI:https:\/\/doi.org\/10.1145\/1583991.1584053"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the IEEE International Parallel 8 Distributed Processing Symposium (IPDPS\u201911)","author":"Bulu\u00e7 Aydin","year":"2011","unstructured":"Aydin Bulu\u00e7 , Samuel Williams , Leonid Oliker , and James Demmel . 2011 . Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication . In Proceedings of the IEEE International Parallel 8 Distributed Processing Symposium (IPDPS\u201911) . IEEE Computer Society, Washington, DC, 721--733. DOI:https:\/\/doi.org\/10.1109\/IPDPS. 2011.73 10.1109\/IPDPS.2011.73 Aydin Bulu\u00e7, Samuel Williams, Leonid Oliker, and James Demmel. 2011. Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. In Proceedings of the IEEE International Parallel 8 Distributed Processing Symposium (IPDPS\u201911). IEEE Computer Society, Washington, DC, 721--733. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2011.73"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.780863"},{"key":"e_1_2_1_7_1","volume-title":"Several Strategies for Reducing the Bandwidth of Matrices","author":"Cuthill Elizabeth","unstructured":"Elizabeth Cuthill . 1972. Several Strategies for Reducing the Bandwidth of Matrices . Springer US , Boston, MA , 157--166. DOI:https:\/\/doi.org\/10.1007\/978-1-4615-8675-3_14 10.1007\/978-1-4615-8675-3_14 Elizabeth Cuthill. 1972. Several Strategies for Reducing the Bandwidth of Matrices. Springer US, Boston, MA, 157--166. DOI:https:\/\/doi.org\/10.1007\/978-1-4615-8675-3_14"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568523"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3134442"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(84)90380-6"},{"key":"e_1_2_1_12_1","unstructured":"Fujitsu. 2019. Retrieved from https:\/\/www.fujitsu.com\/global\/Images\/post-k_supercomputer_with_fujitsu%27s_original_cpu_a64fx_powered_by_arm_isa.pdf. Fujitsu. 2019. Retrieved from https:\/\/www.fujitsu.com\/global\/Images\/post-k_supercomputer_with_fujitsu%27s_original_cpu_a64fx_powered_by_arm_isa.pdf."},{"key":"e_1_2_1_13_1","volume-title":"C (Nov.","author":"Galgon Martin","year":"2015","unstructured":"Martin Galgon , Lukas Kr\u00e4mer , Jonas Thies , Achim Basermann , and Bruno Lang . 2015. On the parallel iterative solution of linear systems arising in the FEAST algorithm for computing inner eigenvalues. Parallel Comput. 49 , C (Nov. 2015 ), 153--163. DOI:https:\/\/doi.org\/10.1016\/j.parco.2015.06.005 10.1016\/j.parco.2015.06.005 Martin Galgon, Lukas Kr\u00e4mer, Jonas Thies, Achim Basermann, and Bruno Lang. 2015. On the parallel iterative solution of linear systems arising in the FEAST algorithm for computing inner eigenvalues. Parallel Comput. 49, C (Nov. 2015), 153--163. DOI:https:\/\/doi.org\/10.1016\/j.parco.2015.06.005"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/1096-9128(200010)12:12<1131::AID-CPE528>3.0.CO;2-2"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/646667.699892"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2513109.2513110"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.43"},{"key":"e_1_2_1_18_1","volume-title":"Smith","author":"Gropp Willam D.","year":"2000","unstructured":"Willam D. Gropp , Dinesh K. Kaushik , David E. Keyes , and Barry F . Smith . 2000 . Towards realistic performance bounds for implicit CFD codes. In Parallel Computational Fluid Dynamics 1999, D. Keyes, J. Periaux, A. Ecer, N. Satofuka, and P. Fox (Eds.). Elsevier , 241--248. DOI:https:\/\/doi.org\/10.1016\/B978-044482851-4.50030-X 10.1016\/B978-044482851-4.50030-X Willam D. Gropp, Dinesh K. Kaushik, David E. Keyes, and Barry F. Smith. 2000. Towards realistic performance bounds for implicit CFD codes. In Parallel Computational Fluid Dynamics 1999, D. Keyes, J. Periaux, A. Ecer, N. Satofuka, and P. Fox (Eds.). Elsevier, 241--248. DOI:https:\/\/doi.org\/10.1016\/B978-044482851-4.50030-X"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041296"},{"key":"e_1_2_1_20_1","unstructured":"Intel. 2019. Intel Math Kernel Library. Retrieved from https:\/\/software.intel.com\/en-us\/mkl. Intel. 2019. Intel Math Kernel Library. Retrieved from https:\/\/software.intel.com\/en-us\/mkl."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.51"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(94)90004-3"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_2_1_24_1","unstructured":"Kokkos. 2018. Kokkos C++ Performance Portability Programming EcoSystem: The Programming Model\u2014Parallel Execution and Memory Abstraction. Retrieved from http:\/\/trilinos.sandia.gov\/packages\/kokkos. Kokkos. 2018. Kokkos C++ Performance Portability Programming EcoSystem: The Programming Model\u2014Parallel Execution and Memory Abstraction. Retrieved from http:\/\/trilinos.sandia.gov\/packages\/kokkos."},{"key":"e_1_2_1_25_1","volume-title":"Chebyshev filter diagonalization on modern manycore processors and GPGPUs","author":"Kreutzer Moritz","unstructured":"Moritz Kreutzer , Dominik Ernst , Alan R. Bishop , Holger Fehske , Georg Hager , Kengo Nakajima , and Gerhard Wellein . 2018. Chebyshev filter diagonalization on modern manycore processors and GPGPUs . In High Performance Computing, Rio Yokota, Mich\u00e8le Weiland, David Keyes, and Carsten Trinitis (Eds.). Springer International Publishing , Cham , 329--349. DOI:https:\/\/doi.org\/10.1007\/978-3-319-92040-5_17 10.1007\/978-3-319-92040-5_17 Moritz Kreutzer, Dominik Ernst, Alan R. Bishop, Holger Fehske, Georg Hager, Kengo Nakajima, and Gerhard Wellein. 2018. Chebyshev filter diagonalization on modern manycore processors and GPGPUs. In High Performance Computing, Rio Yokota, Mich\u00e8le Weiland, David Keyes, and Carsten Trinitis (Eds.). Springer International Publishing, Cham, 329--349. DOI:https:\/\/doi.org\/10.1007\/978-3-319-92040-5_17"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1137\/130930352"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2010.02.003"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEC.1961.5219222"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126931"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751209"},{"key":"e_1_2_1_31_1","volume-title":"C (Nov.","author":"Liu Weifeng","year":"2015","unstructured":"Weifeng Liu and Brian Vinter . 2015. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors. Parallel Comput. 49 , C (Nov. 2015 ), 179--193. DOI:https:\/\/doi.org\/10.1016\/j.parco.2015.04.004 10.1016\/j.parco.2015.04.004 Weifeng Liu and Brian Vinter. 2015. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors. Parallel Comput. 49, C (Nov. 2015), 179--193. DOI:https:\/\/doi.org\/10.1016\/j.parco.2015.04.004"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465013"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2620142"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2014.03.008"},{"key":"e_1_2_1_35_1","first-page":"1","article-title":"Megaman: Scalable manifold learning in Python","volume":"17","author":"McQueen James","year":"2016","unstructured":"James McQueen , Marina Meil\u0103 , Jacob VanderPlas , and Zhongyue Zhang . 2016 . Megaman: Scalable manifold learning in Python . J. Mach. Learn. Res. 17 , 148 (2016), 1 -- 5 . Retrieved from http:\/\/jmlr.org\/papers\/v17\/16-109.html. James McQueen, Marina Meil\u0103, Jacob VanderPlas, and Zhongyue Zhang. 2016. Megaman: Scalable manifold learning in Python. J. Mach. Learn. Res. 17, 148 (2016), 1--5. Retrieved from http:\/\/jmlr.org\/papers\/v17\/16-109.html.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1137\/14097135X"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/71.605773","article-title":"Parallel incremental graph partitioning","volume":"8","author":"Ou Chao-Wei","year":"1997","unstructured":"Chao-Wei Ou and Sanjay Ranka . 1997 . Parallel incremental graph partitioning . IEEE Trans. Parallel Distrib. Syst. 8 , 8 (Aug. 1997), 884--896. DOI:https:\/\/doi.org\/10.1109\/71.605773 10.1109\/71.605773 Chao-Wei Ou and Sanjay Ranka. 1997. Parallel incremental graph partitioning. IEEE Trans. Parallel Distrib. Syst. 8, 8 (Aug. 1997), 884--896. DOI:https:\/\/doi.org\/10.1109\/71.605773","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07518-1_8"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.82"},{"key":"e_1_2_1_40_1","volume-title":"Iterative Methods for Sparse Linear Systems","author":"Saad Yousef","unstructured":"Yousef Saad . 2003. Iterative Methods for Sparse Linear Systems ( 2 nd ed.). Society for Industrial and Applied Mathematics . DOI:https:\/\/doi.org\/10.1137\/1.9780898718003 10.1137\/1.9780898718003 Yousef Saad. 2003. Iterative Methods for Sparse Linear Systems (2nd ed.). Society for Industrial and Applied Mathematics. DOI:https:\/\/doi.org\/10.1137\/1.9780898718003","edition":"2"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3218176.3218232"},{"key":"e_1_2_1_42_1","unstructured":"SpMP Development Team. 2015. Sparse Matrix Pre-processing Library. Retrieved from https:\/\/github.com\/IntelLabs\/SpMP. SpMP Development Team. 2015. Sparse Matrix Pre-processing Library. Retrieved from https:\/\/github.com\/IntelLabs\/SpMP."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.416.0711"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2010.38"},{"key":"e_1_2_1_45_1","volume-title":"A tutorial on spectral clustering. Stat. Comput. 17, 4 (01","author":"von Luxburg Ulrike","year":"2007","unstructured":"Ulrike von Luxburg . 2007. A tutorial on spectral clustering. Stat. Comput. 17, 4 (01 Dec. 2007 ), 395--416. DOI:https:\/\/doi.org\/10.1007\/s11222-007-9033-z 10.1007\/s11222-007-9033-z Ulrike von Luxburg. 2007. A tutorial on spectral clustering. Stat. Comput. 17, 4 (01 Dec. 2007), 395--416. DOI:https:\/\/doi.org\/10.1007\/s11222-007-9033-z"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/16\/1\/071"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.12.006"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms (IA3\u201915)","author":"Yzelman Albert-Jan N.","year":"2015","unstructured":"Albert-Jan N. Yzelman . 2015 . Generalised vectorisation for sparse matrix-vector multiplication . In Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms (IA3\u201915) . ACM, New York, NY, Article 6, 8 pages. DOI:https:\/\/doi.org\/10.1145\/2833179.2833185 10.1145\/2833179.2833185 Albert-Jan N. Yzelman. 2015. Generalised vectorisation for sparse matrix-vector multiplication. In Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms (IA3\u201915). ACM, New York, NY, Article 6, 8 pages. DOI:https:\/\/doi.org\/10.1145\/2833179.2833185"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1137\/080733243"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3399732","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T10:17:25Z","timestamp":1672568245000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3399732"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,29]]},"references-count":50,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9,30]]}},"alternative-id":["10.1145\/3399732"],"URL":"http:\/\/dx.doi.org\/10.1145\/3399732","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"value":"2329-4949","type":"print"},{"value":"2329-4957","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,29]]},"assertion":[{"value":"2019-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-06-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}