Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment |
Choi, HyeonSeong (Korea Aerospace University); Kim, Youngrang (Korea Aerospace University); Lee, Jaehwan (Korea Aerospace University); Kim, Yoonhee (Sookmyung Women's University) |
1 | E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. Castain, D. Daniel, R. Graham, and T. Woodall, "Open MPI: Goals, concept, and design of a next generation MPI implementation," in Proc. of European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting, vol. 3241, pp. 97-104, 2004. |
2 | V. G. Siddaramanna and A. A. R. John, "Effect of Performance on Containerized Deep Learning Applications," Presented at WinTechCon-2018, organized by IEEE CAS Bangalore Chapter, IEEE Bangalore Section, and IEEE WiE Council, pp. 1-6, 2018. |
3 | A. Sergeev and M. Del Balso, "Horovod: fast and easy distributed deep learning in TensorFlow," arXiv preprint arXiv:1802.05799, 2018. |
4 | M. G. Xavier, M. V. Neves, F. D. Rossi, T. C. Ferreto, T. Lange, and C. A. F. De Rose, "Performance evaluation of container-based virtualization for high performance computing environments," in Proc. of the 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 233-240, 2013. |
5 | P. Saha, A. Beltre, P. Uminski, and M. Govindaraju, "Evaluation of docker containers for scientific workloads in the cloud," in Proc. of International Conference on Advanced Research Computing, pp. 1-8, 2018. |
6 | T. Kämäräinen, Y. Shan, M. Siekkinen, and A. Ylä-Jääski, "Virtual machines vs. containers in cloud gaming systems," in Proc. of International Workshop on Network and Systems Support for Games (NetGames), pp. 1-6, 2015. |
7 | P. Xu, S. Shi, and X. Chu, "Performance evaluation of deep learning tools in Docker containers," in Proc. of the 3rd International Conference on Big Data Computing and Communications (BIGCOM), pp. 395-403, 2017. |
8 | J. Zhang, X. Lu, and D. K. Panda, "Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds?," in Proc. of the 10th International Conference on Utility and Cloud Computing, pp. 151-160, 2017. |
9 | H. Mikami, H. Suganuma, P. U-chupala, Y. Tanaka, and Y. Kageyama, "Massively distributed SGD: ImageNet/ResNet-50 training in a flash," arXiv preprint arXiv:1811.05233, 2018. |
10 | J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009. |
11 | S. Jeaugey, "NCCL 2.0," GTC, 2017. |
12 | G. Heigold, E. McDermott, V. Vanhoucke, A. Senior, and M. Bacchiani, "Asynchronous stochastic optimization for sequence training of deep neural networks," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5587-5591, 2014. |
13 | Z. Li, M. Kihl, Q. Lu, and J. A. Andersson, "Performance Overhead Comparison between Hypervisor and Container Based Virtualization," in Proc. of IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), pp. 955-962, 2017. |
14 | C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015. |
15 | E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, "Regularized evolution for image classifier architecture search," in Proc. of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pp. 4780-4789, 2019. |
16 | Y. Huang, Y. Cheng, A. Bapna, O. Firat, D. Chen, M. Chen, H. Lee, J. Ngiam, Q. V. Le, Y. Wu, and Z. Chen, "GPipe: Efficient training of giant neural networks using pipeline parallelism," Advances in Neural Information Processing Systems, vol. 32, 2019. |
17 | P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, "Accurate, large minibatch SGD: Training ImageNet in 1 hour," arXiv preprint arXiv:1706.02677, 2017. |
18 | D. Bernstein, "Containers and Cloud: From LXC to Docker to Kubernetes," IEEE Cloud Computing, vol. 1, no. 3, pp. 81-84, Sep. 2014. |
19 | "Overview of Amazon Web Services," Amazon Whitepapers, 2020. |
20 | J. Dongarra, S. W. Otto, M. Snir, and D. Walker, "An introduction to the MPI standard," Communications of the ACM 18, 1995. |
21 | M. Zinkevich, M. Weimer, L. Li, and A. Smola, "Parallelized stochastic gradient descent," Advances in Neural Information Processing Systems, 2010. |
22 | M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," in Proc. of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16), pp. 265-283, 2016. |
23 | W. Gropp, E. Lusk, N. Doss, and A. Skjellum, "A high-performance, portable implementation of the MPI message passing interface standard," Parallel Computing, vol. 22, no. 6, pp. 789-828, 1996. |
24 | B. Barker, "Message passing interface (MPI)," in Proc. of Workshop: High Performance Computing on Stampede, vol. 262, 2015. |
25 | J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, and P. Tucker, "Large scale distributed deep networks," in Proc. of the 25th International Conference on Neural Information Processing Systems, pp. 1223-1231, 2012. |