iBet uBet web content aggregator. Adding the entire web to your favor.
iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://api.crossref.org/works/10.1002/CPE.4374
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T23:38:40Z","timestamp":1723246720613},"reference-count":39,"publisher":"Wiley","issue":"9","license":[{"start":{"date-parts":[[2017,11,27]],"date-time":"2017-11-27T00:00:00Z","timestamp":1511740800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"name":"PDSE-CAPES","award":["3376\/2015-00"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2018,5,10]]},"abstract":"Summary<\/jats:title>New GPGPU technologies, such as CUDA Dynamic Parallelism (CDP), can help dealing with recursive patterns of computation, such as divide\u2010and\u2010conquer, used by backtracking algorithms. In this paper, we propose a GPU\u2010accelerated backtracking algorithm using CDP that extends a well\u2010known parallel backtracking model. The search starts on CPU, processing the search tree until a first cutoff depth. Based on this partial backtracking tree, the algorithm analyzes the memory requirements of subsequent kernel generations. The proposed algorithm performs no dynamic allocation of memory on GPU, unlike related works from the literature. The proposed algorithm has been extensively tested using the N\u2010Queens Puzzle problem and instances of the Asymmetric Traveling Salesman Problem (ATSP) as test\u2010cases. The proposed CDP algorithm may, under some conditions, outperform its non\u2010CDP counterpart by a factor up to 25. But, it may also be up to twice slower. The CDP\u2010based implementation has much better worst case execution times and makes algorithm's performance less dependent on the tuning of parameters. Compared to other CDP\u2010based strategies from the literature, the proposed algorithm is on average 8\u00d7 faster. The proposed algorithm is also hybridized with another CDP\u2010based strategy from the literature. The combination of strategies is in average 4.5\u00d7 faster than the related strategy. We also identify some difficulties, limitations, and bottlenecks concerning the CDP programming model which may be useful for helping potential users.<\/jats:p>","DOI":"10.1002\/cpe.4374","type":"journal-article","created":{"date-parts":[[2017,11,27]],"date-time":"2017-11-27T23:32:59Z","timestamp":1511825579000},"source":"Crossref","is-referenced-by-count":10,"title":["GPU\u2010accelerated backtracking using CUDA Dynamic Parallelism"],"prefix":"10.1002","volume":"30","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-6145-8352","authenticated-orcid":false,"given":"Tiago","family":"Carneiro\u00a0Pessoa","sequence":"first","affiliation":[{"name":"ParGO Research Group (Parallelism, Graphs, and Optimization), Mestrado Doutorado em Ci\u00eancia da Computa\u00e7\u00e3o Universidade Federal do Cear\u00e1 Fortaleza Brazil"}]},{"given":"Jan","family":"Gmys","sequence":"additional","affiliation":[{"name":"Mathematics and Operational Research Department (MARO) University of Mons Mons Belgium"},{"name":"INRIA Lille Nord Europe Universit\u00e9 Lille 1, CNRS\/CRIStAL Villeneuve\u2010d'Ascq France"}]},{"given":"Francisco Heron","family":"de Carvalho J\u00fanior","sequence":"additional","affiliation":[{"name":"ParGO Research Group (Parallelism, Graphs, and Optimization), Mestrado Doutorado em Ci\u00eancia da Computa\u00e7\u00e3o Universidade Federal do Cear\u00e1 Fortaleza Brazil"}]},{"given":"Nouredine","family":"Melab","sequence":"additional","affiliation":[{"name":"INRIA Lille Nord Europe Universit\u00e9 Lille 1, CNRS\/CRIStAL Villeneuve\u2010d'Ascq France"}]},{"given":"Daniel","family":"Tuyttens","sequence":"additional","affiliation":[{"name":"Mathematics and Operational Research Department (MARO) University of Mons Mons Belgium"}]}],"member":"311","published-online":{"date-parts":[[2017,11,27]]},"reference":[{"key":"e_1_2_8_2_1","doi-asserted-by":"crossref","unstructured":"BurtscherM NasreR PingaliK.A quantitative study of irregular programs on GPUs. In: IEEE International Symposium on Workload Characterization (IISWC);2012;San Diego CA USA.141\u2010151.","DOI":"10.1109\/IISWC.2012.6402918"},{"key":"e_1_2_8_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.313122"},{"key":"e_1_2_8_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/156668.156680"},{"key":"e_1_2_8_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2013.05.194"},{"key":"e_1_2_8_6_1","doi-asserted-by":"crossref","unstructured":"LiD WuH BecchiM.Nested parallelism on GPU: exploring parallelization templates for irregular loops and recursive computations. In: 44th International Conference on Parallel Processing (ICPP).Beijing China:IEEE;2015:979\u2010988.","DOI":"10.1109\/ICPP.2015.107"},{"key":"e_1_2_8_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/174130.174145"},{"key":"e_1_2_8_8_1","unstructured":"ZhangW.Branch\u2010and\u2010bound search algorithms and their computational complexity \u00a0 DTIC Document;1996."},{"key":"e_1_2_8_9_1","unstructured":"ZhangW KorfRE.Depth\u2010first vs. best\u2010first search: New results. In: Proceedings of the National Conference on Artificial Intelligence.Washington D.C. USA:John Wiley & Sons Ltd;1993:769\u2010769."},{"key":"e_1_2_8_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23397-5_42"},{"key":"e_1_2_8_11_1","doi-asserted-by":"crossref","unstructured":"CarneiroT MuritibaA NegreirosM de\u00a0CamposG.A new parallel schema for branch\u2010and\u2010bound algorithms using GPGPU. In: 23rd International Symposium on Computer Architecture and High Performance Computing (SBAC\u2010PAD);2011;Vit\u00f3ria ES Brazil.41\u201047.","DOI":"10.1109\/SBAC-PAD.2011.20"},{"key":"e_1_2_8_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.2345054"},{"key":"e_1_2_8_13_1","doi-asserted-by":"crossref","unstructured":"FeinbubeF RabeB vonLowisM PolzeA.Nqueens on CUDA: Optimization issues. In: Ninth International Symposium on Parallel and Distributed Computing (ISPDC). Innsbruck Austria: IEEE;2010:63\u201070.","DOI":"10.1109\/ISPDC.2010.22"},{"key":"e_1_2_8_14_1","unstructured":"NVIDIA.NVIDIA's next generation CUDA compute architecture: Kepler GK110. NVIDIA Corporation Whitepaper v1.0;2012."},{"key":"e_1_2_8_15_1","unstructured":"AdinetzA.CUDA dynamic parallelism: API and principles.https:\/\/devblogs.nvidia.com\/parallelforall\/cuda-dynamic-parallelism-api-principles\/.2014."},{"key":"e_1_2_8_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2555243.2555254"},{"key":"e_1_2_8_17_1","first-page":"449","volume-title":"Parallel Processing and Applied Mathematics","author":"Rocki K","year":"2009"},{"key":"e_1_2_8_18_1","doi-asserted-by":"publisher","DOI":"10.15803\/ijnc.6.2_212"},{"key":"e_1_2_8_19_1","volume-title":"CUDA Dynamic Parallelism Programming Guide","year":"2012"},{"key":"e_1_2_8_20_1","volume-title":"CUDA C Programming Guide (Version 8.0)","year":"2016"},{"key":"e_1_2_8_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750393"},{"key":"e_1_2_8_22_1","doi-asserted-by":"crossref","unstructured":"WangJ YalamanchiliS.Characterization and analysis of dynamic parallelism in unstructured GPU applications. In: 2014 IEEE International Symposium on Workload Characterization (IISWC).Raleigh NC USA:IEEE;2014:51\u201060.","DOI":"10.1109\/IISWC.2014.6983039"},{"key":"e_1_2_8_23_1","doi-asserted-by":"crossref","unstructured":"ZhangP HolkE MattyJ et al.Dynamic parallelism for simple and efficient GPU graph algorithms. In: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms.Austin TX USA:ACM;2015:11.","DOI":"10.1145\/2833179.2833189"},{"key":"e_1_2_8_24_1","doi-asserted-by":"publisher","DOI":"10.1287\/ijoc.7.4.365"},{"key":"e_1_2_8_25_1","volume-title":"In: NVIDIA's GPU Computing Developer Forum \/ XXXII Congresso da Sociedade Brasileira de Computa\u00e7\u00e3o (CSBC)","author":"Carneiro T","year":"2012"},{"key":"e_1_2_8_26_1","volume-title":"In: Iberian Latin American Congress on Computational Methods in Engineering (CILAMCE)","author":"Carneiro T","year":"2011"},{"issue":"2","key":"e_1_2_8_27_1","first-page":"21","article-title":"Many\u2010core approaches to combinatorial problems: case of the Langford problem","volume":"3","author":"Krajecki M","year":"2016","journal-title":"Supercomputing Frontiers and Innovations"},{"key":"e_1_2_8_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49583-5_24"},{"key":"e_1_2_8_29_1","volume-title":"In: XLIII Simp Brasileiro de Pesquisa Operacional","author":"Pessoa TC","year":"2010"},{"key":"e_1_2_8_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44808-X_3"},{"key":"e_1_2_8_31_1","volume-title":"In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation","author":"Cook W","year":"2012"},{"key":"e_1_2_8_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(89)90011-7"},{"key":"e_1_2_8_33_1","doi-asserted-by":"crossref","unstructured":"ZhangT ShuW WuM\u2010Y.Optimization of n\u2010queens solvers on graphics processors. In: International Workshop on Advanced Parallel Processing Technologies.Shanghai China:Springer;2011:142\u2010156.","DOI":"10.1007\/978-3-642-24151-2_11"},{"key":"e_1_2_8_34_1","first-page":"445","volume-title":"The Traveling Salesman Problem and its Variations","author":"Johnson D","year":"2004"},{"key":"e_1_2_8_35_1","unstructured":"FischerT St\u00fctzleT HoosH MerzP.An analysis of the hardness of TSP instances for two high performance algorithms. In: Proceedings of the Sixth Metaheuristics International Conference;2005;Vienna Austria.361\u2010367."},{"key":"e_1_2_8_36_1","unstructured":"SomersJ.The N\u2010Queens problem: A study in optimization.http:\/\/jsomers.com\/nqueen_demo\/nqueens.html. Accessed March 09 2016."},{"key":"e_1_2_8_37_1","doi-asserted-by":"crossref","unstructured":"WidmerS WodniokD WeberN GoeseleM.Fast dynamic memory allocator for massively parallel architectures. In: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units.Houston TX USA:ACM;2013:120\u2010126.","DOI":"10.1145\/2458523.2458535"},{"key":"e_1_2_8_38_1","volume-title":"Recent Developments in Bit\u2010Parallel Algorithms","author":"San\u00a0Segundo P","year":"2008"},{"key":"e_1_2_8_39_1","doi-asserted-by":"crossref","unstructured":"EnmyrenJ KesslerCW.Skepu: A multi\u2010backend skeleton programming library for multi\u2010gpu systems. Proceedings of the fourth international workshop on high\u2010level parallel programming and applications HLPP '10.New York NY USA:ACM;2010:5\u201014 https:\/\/doi.org\/10.1145\/1863482.1863487.","DOI":"10.1145\/1863482.1863487"},{"key":"e_1_2_8_40_1","doi-asserted-by":"publisher","DOI":"10.1287\/ijoc.1040.0107"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.4374","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.4374","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,11]],"date-time":"2023-09-11T16:43:42Z","timestamp":1694450622000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.4374"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,11,27]]},"references-count":39,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2018,5,10]]}},"alternative-id":["10.1002\/cpe.4374"],"URL":"https:\/\/doi.org\/10.1002\/cpe.4374","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"value":"1532-0626","type":"print"},{"value":"1532-0634","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,11,27]]}}}