Deep Learning Model Parallelism

  • Published: 2018.08.01

Abstract

Deep learning (DL) models have been widely applied to AI applications such as image recognition and language translation with big data. Recently, DL models have become larger and more complicated, and multiple models are increasingly combined. To accelerate the training of large-scale deep learning models, a few distributed deep learning frameworks provide model parallelism, which partitions the model parameters across multiple machines for non-shared parallel access and updates. As a training acceleration method, however, model parallelism is not as commonly used as data parallelism owing to the difficulty of implementing it efficiently. This paper provides a comprehensive survey of the state of the art in model parallelism by comparing the implementation technologies of several deep learning frameworks that support model parallelism, and suggests future research directions for improving model parallelism technology.
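
To illustrate the idea of partitioning model parameters across devices, the following minimal sketch (not taken from the paper) uses TensorFlow's explicit device placement to split a two-layer network across two GPUs, so that each device stores and updates only its own partition of the parameters while intermediate activations are transferred between devices. The two-GPU setup, layer sizes, and variable names are illustrative assumptions only.

    # Minimal model-parallelism sketch (illustrative; assumes two visible GPUs).
    # Each weight matrix is pinned to a different device, so each device holds
    # and updates only its own partition of the model parameters.
    import tensorflow as tf

    with tf.device('/GPU:0'):                            # partition 1 of the model
        w1 = tf.Variable(tf.random.normal([784, 512]))
    with tf.device('/GPU:1'):                            # partition 2 of the model
        w2 = tf.Variable(tf.random.normal([512, 10]))

    @tf.function
    def forward(x):
        with tf.device('/GPU:0'):
            h = tf.nn.relu(tf.matmul(x, w1))             # computed on GPU 0
        with tf.device('/GPU:1'):
            return tf.matmul(h, w2)                      # h is copied to GPU 1
 
    logits = forward(tf.random.normal([32, 784]))        # one forward pass

In contrast, data parallelism would replicate both weight matrices on every device and keep the replicas synchronized, which is simpler to apply but does not reduce the per-device memory footprint of the model.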

Keywords

Project Information

Research project: Development of an HPC System for High-Speed Processing of Large-Scale Deep Learning

Funding agency: Institute for Information & Communications Technology Promotion (IITP)

References

  1. E.P. Xing and Q. Ho, "A New Look at the System, Algorithm and Theory Foundations of Large-Scale Distributed Machine Learning," KDD 2015 Tutorial.
  2. L. Rokach, "Ensemble-Based Classifiers," Artif. Intell. Rev., vol. 33, no. 1-2, Feb. 2010, pp. 1-39. https://doi.org/10.1007/s10462-009-9124-7
  3. J. Ngiam et al., "Multimodal Deep Learning," Proc. Int. Conf. Mach. Learning, Bellevue, USA, 2011, pp. 1-9.
  4. S.J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, 2010, pp. 1345-1359. https://doi.org/10.1109/TKDE.2009.191
  5. S.Y. Ahn et al., "Trends in Distributed Processing Technology for Deep Learning," Electronics and Telecommunications Trends, vol. 31, no. 3, 2016, pp. 131-141. https://doi.org/10.22648/ETRI.2016.J.310314
  6. Training with Multiple GPUs Using Model Parallelism. https://mxnet.incubator.apache.org/faq/model_parallel_lstm.html
  7. T. Chen et al., "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems," In Proc. LearningSys, Montreal, Canada, Oct. 10, 2015.
  8. A. Krizhevsky, "One Weird Trick for Parallelizing Convolutional Neural Networks," 2014, arXiv preprint arXiv:1404.5997.
  9. K. Zhang, "Data Parallel and Model Parallel Distributed Training with Tensorflow," http://kuozhangub.blogspot.kr/2017/08/data-parallel-and-model-parallel.html
  10. A. Oland and B. Raj, "Reducing Communication Overhead in Distributed Learning by an Order of Magnitude (Almost)," In IEEE Int. Conf. Acoustics, Speech Signal Process., Brisbane, Australia, 2015, pp. 2219-2223.
  11. T. Xiao et al., "Fast Parallel Training of Neural Language Models," Int. Joint Conf. Artif. Intell., Melbourne, Australia, Aug. 2017, pp. 4193-4199.
  12. P. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," June 2017, arXiv: 1706.02677.
  13. D. Amodei et al., "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin," ICML, New York, USA, June 2016, pp. 173-182.
  14. E.P. Xing et al., "Petuum: A New Platform for Distributed Machine Learning on Big Data," IEEE Trans. Big Data, vol. 1, no. 2, 2015, pp. 49-67. https://doi.org/10.1109/TBDATA.2015.2472014
  15. S. Lee et al., "On Model Parallelization and Scheduling Strategies for Distributed Machine Learning," Int. Conf. Neural Inform. Process. Syst., vol. 2, 2014, pp. 2834-2842.
  16. J.K. Kim et al., "STRADS: a Distributed Framework for Scheduled Model Parallel Machine Learning," Proc. Eur. Conf. Comput. Syst., London, UK, Apr. 2016, pp. 1-16.
  17. W. Wang et al., "SINGA: Putting Deep Learning in the Hands of Multimedia Users," In ACM Multimedia, Brisbane, Australia, Oct. 2015, pp. 25-34.
  18. M. Abadi et al., "TensorFlow: A System for Large-Scale Machine Learning," Proc. USENIX Symp. Oper. Syst. Des. Implement., Savannah, GA, USA, 2016, pp. 265-283.
  19. Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," In Proc. Int. Conf. Multimedia, Orlando, FL, USA, Nov. 2014, pp. 675-678.
  20. S.Y. Ahn et al., "A Novel Shared Memory Framework for Distributed Deep Learning in High-Performance Computing Architecture," accepted in ICSE 2018.
  21. T.M. Breuel, "The Effects of Hyperparameters on SGD Training of Neural Networks," 2015, arXiv preprint arXiv: 1508.02788.
  22. P. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," 2017, arXiv preprint arXiv:1706.02677.
  23. J. Dean et al., "Large Scale Distributed Deep Networks," NIPS'12, vol. 1, Dec. 2012, pp. 1223-1231.
  24. A. Gaunt et al., "AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks," 2018, arXiv preprint arXiv: 1705.09786.
  25. D. Shrivastava et al., "A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark," 2017, arXiv preprint arXiv:1708.05840.