Link to original content: https://api.crossref.org/works/10.1145/3319393
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,10]],"date-time":"2024-07-10T02:58:09Z","timestamp":1720580289763},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2019,6,17]],"date-time":"2019-06-17T00:00:00Z","timestamp":1560729600000},"content-version":"vor","delay-in-days":290,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ARM Ltd"},{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"crossref","award":["EP\/K026399\/1 and EP\/M506485\/1"],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2018,8,31]]},"abstract":"Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited.\nThis article develops a novel compiler pass to automatically generate software prefetches for indirect memory accesses, a special class of irregular memory accesses often seen in high-performance workloads. We evaluate this across a wide set of systems, all of which gain benefit from the technique. We then evaluate the extent to which good prefetch instructions are architecture dependent and the class of programs that are particularly amenable. Across a set of memory-bound benchmarks, our automated pass achieves average speedups of 1.3\u00d7 for an Intel Haswell processor, 1.1\u00d7 for both an ARM Cortex-A57 and Qualcomm Kryo, 1.2\u00d7 for a Cortex-A72 and an Intel Kaby Lake, and 1.35\u00d7 for an Intel Xeon Phi Knight\u2019s Landing, each of which is an out-of-order core, and performance improvements of 2.1\u00d7 and 2.7\u00d7 for the in-order ARM Cortex-A53 and first generation Intel Xeon Phi.",
"DOI":"10.1145\/3319393","type":"journal-article","created":{"date-parts":[[2019,6,18]],"date-time":"2019-06-18T12:14:26Z","timestamp":1560860066000},"page":"1-34","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Software Prefetching for Indirect Memory Accesses"],"prefix":"10.1145","volume":"36","author":[{"given":"Sam","family":"Ainsworth","sequence":"first","affiliation":[{"name":"University of Cambridge, UK"}]},{"given":"Timothy M.","family":"Jones","sequence":"additional","affiliation":[{"name":"University of Cambridge, UK"}]}],"member":"320","published-online":{"date-parts":[[2019,6,17]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Thomas Mueller. 2012. What integer hash function are good that accepts an integer hash key? Stack Overflow. Retrieved from http:\/\/stackoverflow.com\/questions\/664014\/what-integer-hash-function-are-good-that-accepts-an-integer-hash-key#12996028. Thomas Mueller. 2012. What integer hash function are good that accepts an integer hash key? Stack Overflow. 
Retrieved from http:\/\/stackoverflow.com\/questions\/664014\/what-integer-hash-function-are-good-that-accepts-an-integer-hash-key#12996028."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201917)","author":"Ainsworth S."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173189"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201901)","author":"Annavaram Murali"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/125826.125925"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201901)","author":"Cahoon B."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the Proceedings of the 2002 Joint ACM-ISCOPE Conference on Java Grande (JGI\u201902)","author":"Cahoon Brendon"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/106972.106979"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272743.1272747"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/143365.143486"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605427"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Babak Falsafi and Thomas F. Wenisch. 2014. A primer on hardware prefetching. Synth. Lect. Comput. Arch. 9 1 (2014). Babak Falsafi and Thomas F. Wenisch. 2014. A primer on hardware prefetching. Synth. Lect. Comput. Arch. 9 1 (2014).","DOI":"10.2200\/S00581ED1V01Y201405CAC028"},{"key":"e_1_2_1_13_1","unstructured":"Andrei Frumusanu. 2016. The ARM Cortex A73\u2014Artemis Unveiled. Retrieved from http:\/\/www.anandtech.com\/show\/10347\/arm-cortex-a73-artemis-unveiled\/2. Andrei Frumusanu. 2016. The ARM Cortex A73\u2014Artemis Unveiled. 
Retrieved from http:\/\/www.anandtech.com\/show\/10347\/arm-cortex-a73-artemis-unveiled\/2."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2581122.2544161"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201914)","author":"Khan M."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2015.35"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/605432.605415"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Kim J."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540748"},{"key":"e_1_2_1_20_1","unstructured":"Rakesh Krishnaiyer. 2012. Compiler Prefetching for the Intel Xeon Phi coprocessor. Retrieved from https:\/\/software.intel.com\/sites\/default\/files\/managed\/54\/77\/5.3-prefetching-on-mic-update.pdf. Rakesh Krishnaiyer. 2012. Compiler Prefetching for the Intel Xeon Phi coprocessor. 
Retrieved from https:\/\/software.intel.com\/sites\/default\/files\/managed\/54\/77\/5.3-prefetching-on-mic-update.pdf."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the International Parallel and Distributed Processing Symposium (IPDPSW\u201913)","author":"Krishnaiyer R."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628118"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977673"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2133382.2133384"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201995)","author":"Lipasti Mikko H."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201996)","author":"Luk Chi-Keung"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626407002843"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1188455.1188677"},{"key":"e_1_2_1_29_1","unstructured":"V. Malhotra and C. Kozyrakis. 2006. Library-Based Prefetching for Pointer-Intensive Applications. Technical Report. Computer Systems Laboratory Stanford University. V. Malhotra and C. Kozyrakis. 2006. Library-Based Prefetching for Pointer-Intensive Applications. Technical Report. Computer Systems Laboratory Stanford University."},{"key":"e_1_2_1_30_1","unstructured":"John D. McCalpin. 2013. Native Computing and Optimization on the Intel Xeon Phi Coprocessor. Retrieved from https:\/\/portal.tacc.utexas.edu\/documents\/13601\/933270\/MIC_Native_2013-11-16.pdf. John D. McCalpin. 2013. Native Computing and Optimization on the Intel Xeon Phi Coprocessor. 
Retrieved from https:\/\/portal.tacc.utexas.edu\/documents\/13601\/933270\/MIC_Native_2013-11-16.pdf."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/377792.377856"},{"key":"e_1_2_1_32_1","unstructured":"Todd C. Mowry. 1994. Tolerating Latency Through Software-Controlled Data Prefetching. Ph.D. Dissertation. Stanford University Computer Systems Laboratory. Todd C. Mowry. 1994. Tolerating Latency Through Software-Controlled Data Prefetching. Ph.D. Dissertation. Stanford University Computer Systems Laboratory."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/143365.143488"},{"key":"e_1_2_1_34_1","unstructured":"Richard C. Murphy Kyle B. Wheeler Brian W. Barrett and James A. Ang. May 5 2010. Introducing the Graph 500. Cray User\u2019s Group (CUG) (May 5 2010). Richard C. Murphy Kyle B. Wheeler Brian W. Barrett and James A. Ang. May 5 2010. Introducing the Graph 500. Cray User\u2019s Group (CUG) (May 5 2010)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2611354.2611365"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201998)","author":"Roth Amir"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201915)","author":"Shevgoor M."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the IEEE International Conference on Data Engineering (ICDE\u201913)","author":"Teubner Jens"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the IEEE International Conference on Computer Design (ICCD\u201999)","author":"VanderWiel S. P."},{"key":"e_1_2_1_40_1","unstructured":"Vish Viswanathan. 2014. Disclosure of H\/W prefetcher control on some Intel processors. Retrieved from https:\/\/software.intel.com\/en-us\/articles\/disclosure-of-hw-prefetcher-control-on-some-intel-processors. Vish Viswanathan. 2014. 
Disclosure of H\/W prefetcher control on some Intel processors. Retrieved from https:\/\/software.intel.com\/en-us\/articles\/disclosure-of-hw-prefetcher-control-on-some-intel-processors."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45937-5_22"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830807"}],"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3319393","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T12:56:19Z","timestamp":1672577779000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3319393"}},"subtitle":["A Microarchitectural Perspective"],"short-title":[],"issued":{"date-parts":[[2018,8,31]]},"references-count":42,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,8,31]]}},"alternative-id":["10.1145\/3319393"],"URL":"http:\/\/dx.doi.org\/10.1145\/3319393","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"value":"0734-2071","type":"print"},{"value":"1557-7333","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,8,31]]},"assertion":[{"value":"2017-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-06-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}