Abstract
In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the functionalities needed are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance from the MAGMA library. This close integration provides the fundamental, scalable, high-performance LA routines available in MAGMA as a backend to MagmaDNN. We present design issues for performance and scalability that are specific to ML using Deep Neural Networks (DNNs), as well as the MagmaDNN designs for overcoming them. In particular, MagmaDNN uses well-established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and show how their incorporation enables MagmaDNN to outperform other currently available frameworks.
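To make the MAGMA-backend idea concrete, the sketch below expresses the forward pass of a fully connected layer, Y = W * X, as a single MAGMA GEMM call on the GPU. This is a minimal, self-contained example using only the public MAGMA C API; it is not MagmaDNN code, and the layer dimensions, buffer names, and the omission of bias and activation are illustrative assumptions.

// fc_forward_sketch.cpp -- illustrative only, not the MagmaDNN API.
// Computes Y = W * X for a fully connected layer with MAGMA's GPU GEMM,
// the kind of scalable dense LA routine MagmaDNN builds on.
#include <vector>
#include "magma_v2.h"

int main() {
    magma_init();                              // initialize the MAGMA runtime
    magma_queue_t queue;
    magma_queue_create(0 /*device*/, &queue);  // execution queue on device 0

    // Hypothetical layer dimensions: W is out_f x in_f, X is in_f x batch,
    // and the output Y is out_f x batch.
    magma_int_t out_f = 128, in_f = 256, batch = 64;

    // Host buffers (filled with constants just to keep the sketch short).
    std::vector<float> W(out_f * in_f, 0.01f), X(in_f * batch, 1.0f), Y(out_f * batch);

    // Device buffers.
    float *dW, *dX, *dY;
    magma_smalloc(&dW, out_f * in_f);
    magma_smalloc(&dX, in_f * batch);
    magma_smalloc(&dY, out_f * batch);

    // Copy weights and inputs to the GPU (column-major, leading dim = rows).
    magma_ssetmatrix(out_f, in_f, W.data(), out_f, dW, out_f, queue);
    magma_ssetmatrix(in_f, batch, X.data(), in_f, dX, in_f, queue);

    // Y = 1.0 * W * X + 0.0 * Y  -- the core of the layer's forward pass.
    magma_sgemm(MagmaNoTrans, MagmaNoTrans,
                out_f, batch, in_f,
                1.0f, dW, out_f, dX, in_f,
                0.0f, dY, out_f, queue);

    // Bring the result back to the host (bias and activation omitted).
    magma_sgetmatrix(out_f, batch, dY, out_f, Y.data(), out_f, queue);

    magma_free(dW); magma_free(dX); magma_free(dY);
    magma_queue_destroy(queue);
    magma_finalize();
    return 0;
}

A framework structured this way inherits MAGMA's tuning and scalability on GPU-accelerated systems, since the heaviest kernels of each layer reduce to the same dense LA routines.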
Notes
- 1. https://bitbucket.org/icl/magmadnn/.
Acknowledgments
This work was conducted at the Joint Institute for Computational Sciences (JICS) and the Innovative Computing Laboratory (ICL). This work is sponsored by the National Science Foundation (NSF) through NSF REU Award #1659502, with additional support from the University of Tennessee, Knoxville (UTK), the National Institute for Computational Sciences (NICS), and NSF Awards #1740250 and #1709069. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF grant #ACI-1548562. Computational resources were available through XSEDE education allocation awards TG-ASC170031 and TG-ASC190013. In addition, computing work was performed on technical workstations donated by the BP High Performance Computing Team, as well as on GPUs donated by NVIDIA.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Nichols, D., Tomov, N.S., Betancourt, F., Tomov, S., Wong, K., Dongarra, J. (2019). MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds.) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol. 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_37
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9