iBet uBet web content aggregator. Adding the entire web to your favor.
iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://api.crossref.org/works/10.1002/CPE.2931
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,10]],"date-time":"2024-09-10T17:53:52Z","timestamp":1725990832753},"reference-count":25,"publisher":"Wiley","issue":"8","license":[{"start":{"date-parts":[[2012,10,4]],"date-time":"2012-10-04T00:00:00Z","timestamp":1349308800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2013,6,10]]},"abstract":"SUMMARY<\/jats:title>In this paper, we address the design and implementation of graphical processing unit (GPU)\u2010accelerated branch\u2010and\u2010bound algorithms (B&B) for solving flow\u2010shop scheduling optimization problems (FSP). Such applications are CPU\u2010time consuming and highly irregular. On the other hand, GPUs are massively multithreaded accelerators using the single instruction multiple data model at execution. A major issue that arises when executing on GPU, a B&B applied to FSP is thread or branch divergence. Such divergence is caused by the lower bound function of FSP that contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP dealing with thread divergence. Extensive experiments of the proposed approach have been carried out on well\u2010known FSP benchmarks using an Nvidia Tesla (C2050 GPU card (http:\/\/www.nvidia.com\/docs\/IO\/43395\/NV_DS_Tesla_C2050_C2070_jul10_lores.pdf<\/jats:ext-link>)). Compared with a CPU\u2010based execution, accelerations up to\u2009\u00d7\u200977.46 are achieved for large problem instances. Copyright \u00a9 2012 John Wiley & Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.2931","type":"journal-article","created":{"date-parts":[[2012,10,4]],"date-time":"2012-10-04T09:35:13Z","timestamp":1349343313000},"page":"1121-1136","source":"Crossref","is-referenced-by-count":38,"title":["Reducing thread divergence in a GPU\u2010accelerated branch\u2010and\u2010bound algorithm"],"prefix":"10.1002","volume":"25","author":[{"given":"I.","family":"Chakroun","sequence":"first","affiliation":[{"name":"Universit\u00e9 Lille 1 CNRS\/LIFL INRIA Lille Nord Europe Cit\u00e9 scientifique \u2010 59655 Villeneuve d'Ascq cedex France"}]},{"given":"M.","family":"Mezmaz","sequence":"additional","affiliation":[{"name":"Mathematics and Operational Research Department (MathRO) University of Mons Belgium"}]},{"given":"N.","family":"Melab","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Lille 1 CNRS\/LIFL INRIA Lille Nord Europe Cit\u00e9 scientifique \u2010 59655 Villeneuve d'Ascq cedex France"}]},{"given":"A.","family":"Bendjoudi","sequence":"additional","affiliation":[{"name":"CEntre de Recherche sur l'Information Scientifique et Technique (CERIST) 16030 Ben\u2010Aknoun Algiers Algeria"}]}],"member":"311","published-online":{"date-parts":[[2012,10,4]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.598276"},{"key":"e_1_2_10_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.48868"},{"key":"e_1_2_10_4_1","doi-asserted-by":"publisher","DOI":"10.1080\/10556780802086300"},{"key":"e_1_2_10_5_1","isbn-type":"print","volume-title":"Scientific Computing with Multicore and Accelerators","author":"Dongarra JJ","year":"2010","ISBN":"http:\/\/id.crossref.org\/isbn\/143982536X"},{"key":"e_1_2_10_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.206"},{"key":"e_1_2_10_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/0377-2217(93)90182-M"},{"key":"e_1_2_10_8_1","unstructured":"ChakrounI BendjoudiA MelabN.Reducing thread divergence in GPU\u2010based B&B applied to the Flow\u2010shop problem.9th International Conference on Parallel Processing and Applied Mathematics PPAM\u201911 Poland Torun September 11\u201014 2011. (To appear)."},{"issue":"10","key":"e_1_2_10_9_1","doi-asserted-by":"crossref","first-page":"1389","DOI":"10.1016\/j.jpdc.2008.05.011","article-title":"Program optimization carving for GPU computing","volume":"68","author":"Ryoo S","year":"2008","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"e_1_2_10_10_1","doi-asserted-by":"crossref","unstructured":"FungW ShamI YuanG AamodtT.Dynamic warp formation and scheduling for efficient gpu control flow.MICRO \u201907: Proceedings of the 40th Annual IEEE\/ACM International Symposium on Micro\u2010architecture Washington DC USA 2007;407\u2013420.","DOI":"10.1109\/MICRO.2007.30"},{"key":"e_1_2_10_11_1","doi-asserted-by":"crossref","unstructured":"MengJ TarjanD SkadronK.Dynamic warp subdivision for integrated branch and memory divergence tolerance.Proceedings of the 37th Annual International Symposium on Computer Architecture ISCA New York USA 2010;235\u2013246.","DOI":"10.1145\/1815961.1815992"},{"key":"e_1_2_10_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810085.1810104"},{"key":"e_1_2_10_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964179.1964184"},{"key":"e_1_2_10_14_1","doi-asserted-by":"publisher","DOI":"10.1287\/moor.1.2.117"},{"key":"e_1_2_10_15_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.26.1.53"},{"key":"e_1_2_10_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/nav.3800010110"},{"key":"e_1_2_10_17_1","unstructured":"MelabN.Contributions \u00e0 la r\u00e9solution de probl\u00e8mes d'optimisation combinatoire sur grilles de calcul. HDR thesis LIFL USTL Novembre2005."},{"key":"e_1_2_10_18_1","unstructured":"Nvidia CUDA C Programming Best Practices Guide. Available from:http:\/\/developer.download.Nvidia.com\/compute\/cuda\/2\\_3\/toolkit\/docs\/Nvidia\\_CUDA\\_BestPracticesGuide\\_2.3.pdf[accessed on 19 January 2011]."},{"key":"e_1_2_10_19_1","unstructured":"MelabN ChakrounI BendjoudiA.GPU\u2010accelerated bounding for branch\u2010and\u2010bound applied to a permutation problem using data access optimization. under submission in concurrency and computation: practice and experience \u2010 manuscript CPE\u201012\u20100085."},{"key":"e_1_2_10_20_1","doi-asserted-by":"crossref","unstructured":"MezmazM MelabN TalbiE\u2010G.A grid\u2010enabled branch and bound algorithm for solving challenging combinatorial optimization problems.Proceedings of 21th IEEE International Parallel and Distributed Processing Symposium (IPDPS) Long Beach California March 26th\u201030th 2007.","DOI":"10.1109\/IPDPS.2007.370217"},{"key":"e_1_2_10_21_1","doi-asserted-by":"crossref","unstructured":"ChakrounI MelabN.An adaptative multi\u2010GPU\u2010based branch\u2010and\u2010bound. A case study: the flow\u2010shop scheduling problem.The 14th IEEE International Conference on High Performance Computing and Communications (HPCC\u20102012) Liverpool England.","DOI":"10.1109\/HPCC.2012.59"},{"key":"e_1_2_10_22_1","unstructured":"COMPUTE VISUAL PROFILER User Guide. Available from:http:\/\/developer.Nvidia.com\/Nvidia\u2010gpu\u2010computing\u2010documentation#VisualProfiler[accessed on 3 October 2011]."},{"key":"e_1_2_10_23_1","doi-asserted-by":"publisher","DOI":"10.1364\/BOE.1.000658"},{"key":"e_1_2_10_24_1","unstructured":"Available from:http:\/\/www.Nvidia.com\/docs\/IO\/43395\/NV\\_DS\\_Tesla\\_C2050\\_C2070\\_jul10\\_lores.pdf[accessed on 12 May 2012]."},{"key":"e_1_2_10_25_1","unstructured":"Available from:http:\/\/ark.intel.com\/products\/47933\/Intel\u2010Core\u2010i7\u2010970\u2010Processor\u2010\\%2812M\u2010Cache\u20103\\_20\u2010GHz\u20104\\_80\u2010GTs\u2010Intel\u2010QPI\\%29[accessed on 12 May 2012]."},{"key":"e_1_2_10_26_1","unstructured":"Available from:http:\/\/download.intel.com\/support\/processors\/corei7\/sb\/core\\_i7\u2010900\\_d.pdf[accessed on 12 May 2012]."}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.2931","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.2931","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,11]],"date-time":"2023-09-11T23:22:28Z","timestamp":1694474548000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.2931"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,10,4]]},"references-count":25,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2013,6,10]]}},"alternative-id":["10.1002\/cpe.2931"],"URL":"https:\/\/doi.org\/10.1002\/cpe.2931","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"value":"1532-0626","type":"print"},{"value":"1532-0634","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,10,4]]}}}