Abstract
Coprocessors are increasingly becoming key building blocks of High Performance Computing platforms. These many-core energy-efficient devices boost the performance of traditional processors. On the other hand, Branch-and-Bound (B&B) algorithms are tree-based exact methods for solving to optimality combinatorial optimization problems (COPs). Solving large COPs results in the generation of a very large pool of subproblems and the evaluation of their associated lower bounds. Generating and evaluating those subproblems on coprocessors raises several issues including processor-coprocessor data transfer optimization, vectorization, thread divergence, and so on. In this paper, we investigate the offload-based parallel design and implementation of B&B algorithms for coprocessors addressing these issues. Two major many-core architectures are considered and compared: Nvidia GPU and Intel MIC. The proposed approaches have been experimented using the Flow-Shop scheduling problem and two hardware configurations equivalent in terms of energy consumption: Nvidia Tesla K40 and Intel Xeon Phi 5110P. The reported results show that the GPU-accelerated approach outperforms the MIC offload-based one even in its vectorized version. Moreover, vectorization improves the efficiency of the MIC offload-based approach with a factor of two.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
GPU and MIC stand for respectively Graphics Processing Unit and Many Integrated Cores.
- 2.
An optimization problem consists of minimizing or maximizing a cost function. Without loss of generality, in this paper the minimization case and the permutation Flow-Shop scheduling problem are considered.
- 3.
References
Barreto, L., Bauer, M.: Parallel branch and bound algorithm-a comparison between serial, OpenMP and MPI implementations. J. Phys. Conf. Ser. 256, 012018 (2010)
Bolze, R., et al.: Grid’5000: a large scale and highly reconfigurable experimental grid testbed. IJHPCA 20(4), 481–494 (2006)
Chakroun, I.: Parallel heterogeneous branch and bound algorithms for multi-core and multi-GPU environments. Ph.D. thesis from Université Lille 1 (2013)
Chakroun, I., Melab, N., Mezmaz, M., Tuyttens, D.: Combining multi-core and GPU computing for solving combinatorial optimization problems. J. Parallel Distrib. Comput. 73(12), 1563–1577 (2013). https://doi.org/10.1016/j.jpdc.2013.07.023
Chakroun, I., Mezmaz, M., Melab, N., Bendjoudi, A.: Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm. Concurr. Comput. Pract. Exp. 25(8), 1121–1136 (2013)
Chrysos, G.: Intel Xeon Phi X100 family coprocessor - the architecture. Intel Inc. (2012). https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner
Garey, M., Johnson, D., Sethi, R.: The complexity of flow-shop and job-shop scheduling. Math. Oper. Res. 1, 117–129 (1976)
Johnson, S.: Optimal two and three-stage production schedules with setup times included. Nav. Res. Logistis Q. 1, 61–68 (1954)
Lageweg, B., Lenstra, J., Kan, A.R.: A general bounding scheme for the permutation flow-shop problem. Oper. Res. 26(1), 53–67 (1978)
Lalami, M.: Contribution à la résolution de problèms d’optimisation combinatoire: méthodes séquentielles et parallèles. Université de Toulouse III - Paul Sabatier (2012)
Lawler, E.L., Wood, D.E.: Branch-and-bound methods: a survey. Oper. Res. 14(4), 699–719 (1966). https://doi.org/10.1287/opre.14.4.699
Melab, N., Chakroun, I., Bendjoudi, A.: Graphics processing unit-accelerated bounding for branch-and-bound applied to a permutation problem using data access optimization. Concurr. Comput. Pract. Exp. 26(16), 2667–2683 (2014)
Melab, N., Chakroun, I., Mezmaz, M., Tuyttens, D.: A GPU-accelerated branch-and-bound algorithm for the flow-shop scheduling problem. In: 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012, Beijing, China, 24–28 Sept 2012, pp. 10–17 (2012)
Melab, N., Leroy, R., Mezmaz, M., Tuyttens, D.: Parallel branch-and-bound using private IVM-based work stealing on xeon phi MIC coprocessor. In: 2015 International Conference on High Performance Computing and Simulation, HPCS 2015, Amsterdam, Netherlands, 20–24 July 2015, pp. 394–399 (2015)
Mezmaz, M., Melab, N., Talbi, E.G.: A grid-enabled branch and bound algorithm for solving challenging combinatorial optimization problems. In: Proceedings of 21th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, California (2007)
Nvidia: NVIDIA’s next generation CUDA compute architecture: Kepler GK110/210. Whitepaper (2014)
Saini, S., Jin, H., Jespersen, D., Cheung, S., Djomehri, M., Chang, J., Hood, R.: Early multi-node performance evaluation of a knights corner (KNC) based NASA supercomputer. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, IPDPS 2015, Hyderabad, India, 25–29 May 2015, pp. 57–67 (2015)
Top500 HPC international ranking. http://www.top500.org/
Vu, T.T.: Heterogeneity and locality-aware work stealing for large scale branch-and-bound irregular algorithms. Ph.D. thesis from Université Lille 1 (2014)
Acknowledgements
Experiments presented in this paper were carried out using the Intel Xeon Phi of Digitalis (http://digitalis.imag.fr) and Grid’5000 testbed (https://www.grid5000.fr). Therefore, we would like to thank the Digitalis staff and especially Pierre Neyron from LIG/CNRS Lab for making the MIC processor fully functional and available. Grid’5000 is supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Melab, N., Gmys, J., Mezmaz, M., Tuyttens, D. (2020). Many-Core Branch-and-Bound for GPU Accelerators and MIC Coprocessors. In: Bartz-Beielstein, T., Filipič, B., Korošec, P., Talbi, EG. (eds) High-Performance Simulation-Based Optimization. Studies in Computational Intelligence, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-030-18764-4_12
Download citation
DOI: https://doi.org/10.1007/978-3-030-18764-4_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18763-7
Online ISBN: 978-3-030-18764-4
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)