Deep Learning Model Parallelism

  • Published: 2018.08.01

Abstract

Deep learning (DL) models have been widely applied to AI applications such as image recognition and language translation with big data. Recently, DL models have become larger and more complicated, and multiple models are increasingly combined. To accelerate the training of large-scale deep learning models, a few distributed deep learning frameworks provide model parallelism, which partitions the model parameters across multiple machines for non-shared parallel access and updates. As a training acceleration method, however, model parallelism is not as commonly used as data parallelism owing to the difficulty of implementing it efficiently. This paper provides a comprehensive survey of the state of the art in model parallelism by comparing the implementation technologies of several deep learning frameworks that support model parallelism, and suggests future research directions for improving model parallelism technology.
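
To illustrate the idea of partitioning model parameters across devices, the following minimal sketch (not taken from the paper) uses TensorFlow's explicit device placement to split a two-layer network across two GPUs, so that each device stores and updates only its own partition of the parameters while intermediate activations are transferred between devices. The two-GPU setup, layer sizes, and variable names are illustrative assumptions only.

    # Minimal model-parallelism sketch (illustrative; assumes two visible GPUs).
    # Each weight matrix is pinned to a different device, so each device holds
    # and updates only its own partition of the model parameters.
    import tensorflow as tf

    with tf.device('/GPU:0'):                            # partition 1 of the model
        w1 = tf.Variable(tf.random.normal([784, 512]))
    with tf.device('/GPU:1'):                            # partition 2 of the model
        w2 = tf.Variable(tf.random.normal([512, 10]))

    @tf.function
    def forward(x):
        with tf.device('/GPU:0'):
            h = tf.nn.relu(tf.matmul(x, w1))             # computed on GPU 0
        with tf.device('/GPU:1'):
            return tf.matmul(h, w2)                      # h is copied to GPU 1
 
    logits = forward(tf.random.normal([32, 784]))        # one forward pass

In contrast, data parallelism would replicate both weight matrices on every device and keep the replicas synchronized, which is simpler to apply but does not reduce the per-device memory footprint of the model.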

Keywords

Project Information

Research project: Development of an HPC System for High-Speed Processing of Large-Scale Deep Learning

Funding agency: Institute for Information & Communications Technology Promotion (IITP)

References

  1. E.P. Xing and Q. Ho, "A New Look at the System, Algorithm and Theory Foundations of Large-Scale Distributed Machine Learning," KDD 2015 Tutorial.
  2. L. Rokach, "Ensemble-Based Classifiers," Artif. Intell. Rev., vol. 33, no. 1-2, Feb. 2010, pp. 1-39. https://doi.org/10.1007/s10462-009-9124-7
  3. J. Ngiam et al., "Multimodal Deep Learning," Proc. Int. Conf. Mach. Learning, Bellevue, USA, 2011, pp. 1-9.
  4. S.J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, 2010, pp. 1345-1359. https://doi.org/10.1109/TKDE.2009.191
  5. S.Y. Ahn et al., "Trends in Distributed Processing Technology for Deep Learning," Electronics and Telecommunications Trends, vol. 31, no. 3, 2016, pp. 131-141. https://doi.org/10.22648/ETRI.2016.J.310314
  6. Training with Multiple GPUs Using Model Parallelism. https://mxnet.incubator.apache.org/faq/model_parallel_lstm.html
  7. T. Chen et al., "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems," In Proc. LearningSys, Montreal, Canada, Oct. 10, 2015.
  8. A. Krizhevsky, "One Weird Trick for Parallelizing Convolutional Neural Networks," 2014, arXiv preprint arXiv:1404.5997.
  9. K. Zhang, "Data Parallel and Model Parallel Distributed Training with Tensorflow," http://kuozhangub.blogspot.kr/2017/08/data-parallel-and-model-parallel.html
  10. A. Oland and B. Raj, "Reducing Communication Overhead in Distributed Learning by an Order of Magnitude (Almost)," In IEEE Int. Conf. Acoustics, Speech Signal Process., Brisbane, Australia, 2015, pp. 2219-2223.
  11. T. Xiao et al., "Fast Parallel Training of Neural Language Models," Int. Joint Conf. Artif. Intell., Melbourne, Australia, Aug. 2017, pp. 4193-4199.
  12. P. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," June 2017, arXiv: 1706.02677.
  13. D. Amodei et al., "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin," ICML, New York, USA, June 2016, pp. 173-182.
  14. E.P. Xing et al., "Petuum: A New Platform for Distributed Machine Learning on Big Data," IEEE Trans. Big Data, vol. 1, no. 2, 2015, pp. 49-67. https://doi.org/10.1109/TBDATA.2015.2472014
  15. S. Lee et al., "On Model Parallelization and Scheduling Strategies for Distributed Machine Learning," Int. Conf. Neural Inform. Process. Syst., vol. 2, 2014, pp. 2834-2842.
  16. J.K. Kim et al., "STRADS: a Distributed Framework for Scheduled Model Parallel Machine Learning," Proc. Eur. Conf. Comput. Syst., London, UK, Apr. 2016, pp. 1-16.
  17. W. Wang et al., "SINGA: Putting Deep Learning in the Hands of Multimedia Users," In ACM Multimedia, Brisbane, Australia, Oct. 2015, pp. 25-34.
  18. M. Abadi et al., "TensorFlow: A System for Large-Scale Machine Learning," Proc. USENIX Symp. Oper. Syst. Des. Implement., Savannah, GA, USA, 2016, pp. 265-283.
  19. Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," In Proc. Int. Conf. Multimedia, Orlando, FL, USA, Nov. 2014, pp. 675-678.
  20. S.Y. Ahn et al., "A Novel Shared Memory Framework for Distributed Deep Learning in High-Performance Computing Architecture," accepted in ICSE 2018.
  21. T.M. Breuel, "The Effects of Hyperparameters on SGD Training of Neural Networks," 2015, arXiv preprint arXiv: 1508.02788.
  22. P. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," 2017, arXiv preprint arXiv:1706.02677.
  23. J. Dean et al., "Large Scale Distributed Deep Networks," NIPS'12, vol. 1, Dec. 2012, pp. 1223-1231.
  24. A. Gaunt et al., "AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks," 2018, arXiv preprint arXiv: 1705.09786.
  25. D. Shrivastava et al., "A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark," 2017, arXiv preprint arXiv:1708.05840.