iBet uBet web content aggregator. Adding the entire web to your favor.

Link to original content: https://api.crossref.org/works/10.1177/1094342016661865
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T04:04:15Z","timestamp":1729137855885,"version":"3.27.0"},"reference-count":26,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2016,8,1]],"date-time":"2016-08-01T00:00:00Z","timestamp":1470009600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2017,11]]},"abstract":"<jats:p>The use of accelerators in heterogeneous systems is an established approach in designing petascale applications. Today, Compute Unified Device Architecture (CUDA) offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both the CPU and the GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing hybrid MPI-CUDA programs for properties based on wait states, such as the critical path, a metric proven to identify application bottlenecks effectively. We developed a tool to construct a dependency graph based on an execution trace and the inherent dependencies of the programming models CUDA and Message Passing Interface (MPI). Thereafter, it detects wait states and attributes blame to responsible activities. Together with the property of being on the critical path, we can identify activities that are most viable for optimization. To evaluate the global impact of optimizations to critical activities, we predict the program execution using a graph-based performance projection. The developed approach has been demonstrated with suitable examples to be both scalable and correct. 
Furthermore, we establish a new categorization of CUDA inefficiency patterns ensuing from the dependencies between CUDA activities. <\/jats:p>","DOI":"10.1177\/1094342016661865","type":"journal-article","created":{"date-parts":[[2016,8,3]],"date-time":"2016-08-03T00:19:04Z","timestamp":1470183544000},"page":"485-498","update-policy":"http:\/\/dx.doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["Scalable critical-path analysis and optimization guidance for hybrid MPI-CUDA applications"],"prefix":"10.1177","volume":"31","author":[{"given":"Felix","family":"Schmitt","sequence":"first","affiliation":[{"name":"Center for Information Services and High Performance Computing (ZIH), Technische Universit\u00e4t Dresden, Germany"}]},{"given":"Robert","family":"Dietrich","sequence":"additional","affiliation":[{"name":"Center for Information Services and High Performance Computing (ZIH), Technische Universit\u00e4t Dresden, Germany"}]},{"given":"Guido","family":"Juckeland","sequence":"additional","affiliation":[{"name":"Center for Information Services and High Performance Computing (ZIH), Technische Universit\u00e4t Dresden, Germany"}]}],"member":"179","published-online":{"date-parts":[[2016,8,1]]},"reference":[{"key":"bibr1-1094342016661865","unstructured":"Adinetz A, Kraus J, Pleiter D (2013) NVIDIA application lab at J\u00fclich. InSiDE 11(1). Available at: http:\/\/inside.hlrs.de\/_old\/htm\/Edition_01_13\/article_26.html."},{"key":"bibr2-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2012.321"},{"volume-title":"Talk at Nvidia GPU technology conference GTC 2013","year":"2013","author":"Chabbi M","key":"bibr3-1094342016661865"},{"key":"bibr4-1094342016661865","unstructured":"CompuGreen L (2014) The Green500 List \u2013 November 2014. 
Available at: http:\/\/www.green500.org\/lists\/green201411."},{"volume-title":"Tools for exascale computing: challenges and strategies","year":"2011","author":"Daly J","key":"bibr5-1094342016661865"},{"key":"bibr6-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2010.30"},{"key":"bibr7-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1145\/1513895.1513901"},{"issue":"6","key":"bibr8-1094342016661865","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/cpe.1556","volume":"22","author":"Geimer M","year":"2010","journal-title":"Concurrency and Computation: Practice and Experience"},{"key":"bibr9-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-68564-7_9"},{"key":"bibr10-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1145\/359545.359563"},{"key":"bibr11-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917730"},{"key":"bibr12-1094342016661865","first-page":"341","volume-title":"Advances in Parallel Computing","volume":"19","author":"Mayanglambam S","year":"2010"},{"key":"bibr13-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1145\/238020.238023"},{"key":"bibr14-1094342016661865","unstructured":"Mendes CL (1993) Performance prediction by trace transformation. In: Fifth Brazilian symposium on computer architecture. Florianopolis, September 1993. Available at: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.55.488&rep=rep1&type=pdf"},{"key":"bibr15-1094342016661865","unstructured":"Message Passing Interface Forum (2009) MPI: A message-passing interface standard, version 2.2. 
Available at: https:\/\/www.mpi-forum.org\/docs\/mpi-2.2\/mpi22-report.pdf."},{"key":"bibr16-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1145\/1095430.1081729"},{"first-page":"85","volume-title":"Competence in High Performance Computing 2010","year":"2012","author":"an Mey D","key":"bibr17-1094342016661865"},{"key":"bibr18-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1109\/71.80132"},{"key":"bibr19-1094342016661865","unstructured":"NVIDIA Corporation (2013) Nsight visual studio edition 3.2 user guide. Available at: http:\/\/docs.nvidia.com\/nsight-visual-studio-edition\/3.2\/Nsight_Visual_Studio_Edition_User_Guide.htm (accessed 22 September 2015)."},{"key":"bibr20-1094342016661865","unstructured":"NVIDIA Corporation (2014) CUDA Toolkit Documentation \u2013 CUPTI. Available at: http:\/\/docs.nvidia.com\/cuda\/cupti\/index.html (accessed 22 September 2015)."},{"key":"bibr21-1094342016661865","unstructured":"NVIDIA Corporation (2015) Profiler user\u2019s guide. Available at: http:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/ (accessed 22 September 2015)."},{"key":"bibr22-1094342016661865","unstructured":"Performance Research Lab (2010) ParaProf user\u2019s manual. 
Available at: http:\/\/www.cs.uoregon.edu\/research\/tau\/docs\/paraprof\/ (accessed 22 September 2015)."},{"key":"bibr23-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.47"},{"key":"bibr24-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1145\/1837853.1693489"},{"key":"bibr25-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1016\/S1383-7621(03)00102-4"},{"key":"bibr26-1094342016661865","doi-asserted-by":"publisher","DOI":"10.1109\/DCS.1988.12538"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016661865","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342016661865","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016661865","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T03:24:43Z","timestamp":1729049083000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342016661865"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,8,1]]},"references-count":26,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2017,11]]}},"alternative-id":["10.1177\/1094342016661865"],"URL":"https:\/\/doi.org\/10.1177\/1094342016661865","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2016,8,1]]}}}