Acknowledgement
Grant: Development of an HPC System for High-Speed Processing of Large-Scale Deep Learning
Supported by: Institute for Information & Communications Technology Promotion (IITP)
References
- E.P. Xing and Q. Ho, "A New Look at the System, Algorithm and Theory Foundations of Large-Scale Distributed Machine Learning," KDD 2015 Tutorial.
- L. Rokach, "Ensemble-Based Classifiers," Artif. Intell. Rev., vol. 33, no. 1-2, Feb. 2010, pp. 1-39. https://doi.org/10.1007/s10462-009-9124-7
- J. Ngiam et al., "Multimodal Deep Learning," Proc. Int. Conf. Mach. Learning, Bellevue, USA, 2011, pp. 1-9.
- S.J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, 2010, pp. 1345-1359. https://doi.org/10.1109/TKDE.2009.191
- S.Y. Ahn et al., "Trends in Distributed Deep Learning Processing Technology," Electronics and Telecommunications Trends, vol. 31, no. 3, 2016, pp. 131-141. https://doi.org/10.22648/ETRI.2016.J.310314
- Training with Multiple GPUs Using Model Parallelism. https://mxnet.incubator.apache.org/faq/model_parallel_lstm.html
- T. Chen et al., "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems," In Proc. LearningSys, Montreal, Canada, Oct. 10, 2015.
- A. Krizhevsky, "One Weird Trick for Parallelizing Convolutional Neural Networks," 2014, arXiv preprint arXiv:1404.5997.
- K. Zhang, "Data Parallel and Model Parallel Distributed Training with TensorFlow," http://kuozhangub.blogspot.kr/2017/08/data-parallel-and-model-parallel.html
- A. Oland and B. Raj, "Reducing Communication Overhead in Distributed Learning by an Order of Magnitude (Almost)," In IEEE Int. Conf. Acoustics, Speech Signal Process., Brisbane, Australia, 2015, pp. 2219-2223.
- T. Xiao et al., "Fast Parallel Training of Neural Language Models," Int. Joint Conf. Artif. Intell., Melbourne, Australia, Aug. 2017, pp. 4193-4199.
- P. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," June 2017, arXiv preprint arXiv:1706.02677.
- D. Amodei et al., "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin," ICML, New York, NY, USA, June 2016, pp. 173-182.
- E.P. Xing et al., "Petuum: A New Platform for Distributed Machine Learning on Big Data," IEEE Trans. Big Data, vol. 1, no. 2, 2015, pp. 49-67. https://doi.org/10.1109/TBDATA.2015.2472014
- S. Lee et al., "On Model Parallelization and Scheduling Strategies for Distributed Machine Learning," Int. Conf. Neural Inform. Process. Syst., vol. 2, 2014, pp. 2834-2842.
- J.K. Kim et al., "STRADS: a Distributed Framework for Scheduled Model Parallel Machine Learning," Proc. Eur. Conf. Comput. Syst., London, UK, Apr. 2016, pp. 1-16.
- W. Wang et al., "SINGA: Putting Deep Learning in the Hands of Multimedia Users," In ACM Multimedia, Brisbane, Australia, Oct. 2015, pp. 25-34.
- M. Abadi et al., "TensorFlow: A System for Large-Scale Machine Learning," Proc. USENIX Symp. Oper. Syst. Des. Implement., Savannah, GA, USA, 2016, pp. 265-283.
- Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," In Proc. Int. Conf. Multimedia, Orlando, FL, USA, Nov. 2014, pp. 675-678.
- S.Y. Ahn et al., "A Novel Shared Memory Framework for Distributed Deep Learning in High-Performance Computing Architecture," accepted at ICSE 2018.
- T.M. Breuel, "The Effects of Hyperparameters on SGD Training of Neural Networks," 2015, arXiv preprint arXiv:1508.02788.
- P. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," 2017, arXiv preprint arXiv:1706.02677.
- J. Dean et al., "Large Scale Distributed Deep Networks," NIPS'12, vol. 1, Dec. 2012, pp. 1223-1231.
- A. Gaunt et al., "AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks," 2018, arXiv preprint arXiv:1705.09786.
- D. Shrivastava et al., "A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark," 2017, arXiv preprint arXiv:1708.05840.