Extending Caffe for Machine Learning of Large Neural Networks Distributed on GPUs

  • 오종수 (School of Electronics Engineering, Kyungpook National University)
  • 이동호 (School of Electronics Engineering, Kyungpook National University)
  • Submitted: 2017.09.13
  • Reviewed: 2018.01.30
  • Published: 2018.04.30

Abstract

Caffe is neural network training software widely used in academic research. GPU memory capacity is one of the most important constraints when designing a neural network architecture; for example, many object detection networks are designed to use less than 12GB of memory so that they fit on a single GPU. In this paper, we extend Caffe so that a large neural network can be stored distributed across two or more GPUs, making more than 12GB of GPU memory available to it. To verify the extended software, we measured the training efficiency of a state-of-the-art object detection network as a function of batch size on a PC with three GPUs.
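The paper's implementation is not shown on this page, but the core idea of the extension, splitting one network's layers across devices so that each layer group's parameters and activations live on their own GPU, can be illustrated with a minimal CUDA sketch. Everything below (the two-device split, the relu kernel standing in for a layer, the buffer size, and the CHECK macro) is a hypothetical illustration, not the authors' code.

```cpp
// Minimal sketch (not the paper's actual code) of model parallelism across
// two GPUs: "layer group 0" lives on device 0, "layer group 1" on device 1,
// and the activation crosses the device boundary with a peer-to-peer copy.
#include <cuda_runtime.h>
#include <cstdio>

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s (line %d)\n", cudaGetErrorString(e), \
            __LINE__); return 1; } } while (0)

// Stand-in for one layer's forward computation.
__global__ void relu(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] > 0.f ? x[i] : 0.f;
}

int main() {
    const int n = 1 << 20;                  // hypothetical activation size
    const size_t bytes = n * sizeof(float);

    // Report whether direct device-to-device access is available.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 1, 0));

    // "Layer group 0": compute an activation on GPU 0.
    float* act0;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&act0, bytes));
    CHECK(cudaMemset(act0, 0, bytes));
    relu<<<(n + 255) / 256, 256>>>(act0, n);
    CHECK(cudaDeviceSynchronize());

    // Boundary transfer: move the activation to GPU 1. cudaMemcpyPeer
    // stages through host memory if peer-to-peer access is unavailable.
    float* act1;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&act1, bytes));
    CHECK(cudaMemcpyPeer(act1, 1, act0, 0, bytes));

    // "Layer group 1": continue the forward pass on GPU 1.
    relu<<<(n + 255) / 256, 256>>>(act1, n);
    CHECK(cudaDeviceSynchronize());

    printf("forward pass split across 2 GPUs (P2P available: %d)\n", canAccess);
    CHECK(cudaFree(act1));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(act0));
    return 0;
}
```

Splitting at a layer boundary keeps inter-GPU traffic small: only the activation (and, during training, its gradient) crosses devices, while each GPU holds its own share of the weights, which is what allows the combined model to exceed a single GPU's 12GB.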

References

  1. J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng, "Large scale distributed deep networks," Advances in Neural Information Processing Systems 25, pp.1223-1231, 2012.
  2. F. Niu, B. Recht, C. Re, and S. J. Wright, "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent," Advances in Neural Information Processing Systems 24, pp.693-701, 2011.
  3. J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, "Revisiting distributed synchronous SGD," The 4th International Conference on Learning Representations: Workshop Track, arXiv eprint arXiv:1604.00981, 2016.
  4. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," Proceedings of the 22nd ACM International Conference on Multimedia, pp.675-678, 2014.
  5. S. Hadjis, F. Abuzaid, C. Zhang, and C. Re, "Caffe con Troll: Shallow ideas to speed up deep learning," Proceedings of the 4th Workshop on Data analytics in the Cloud, pp.2:1-2:4, 2015.
  6. P. Goyal, P. Dollar, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, "Accurate, large minibatch SGD: Training ImageNet in 1 hour," arXiv eprint arXiv:1706.02677, 2017.
  7. Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, "A unified multi-scale deep convolutional neural network for fast object detection," Computer Vision - ECCV 2016: Part IV, Vol.9908 of Lecture Notes in Computer Science, pp.354-370, 2016.
  8. S. Ioffe, "Batch renormalization: Towards reducing minibatch dependence in batch-normalized models," arXiv eprint arXiv:1702.03275, 2017.
  9. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems 28, pp.91-99, 2015.
  10. A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, Vol.32, Issue 11, pp.1231-1237, 2013. https://doi.org/10.1177/0278364913491297
  11. S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, "cuDNN: Efficient primitives for deep learning," The NIPS 2014 Deep Learning and Representation Learning Workshop, arXiv eprint arXiv:1410.0759, 2014.
  12. N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang, "On large-batch training for deep learning: Generalization gap and sharp minima," The 5th International Conference on Learning Representations: Conference Track, arXiv eprint arXiv:1609.04836, 2017.
  13. J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object detection via region-based fully convolutional networks," Advances in Neural Information Processing Systems 29, pp.379-387, 2016.
  14. A. G. Wilson, Z. Hu, R. Salakhutdinov, and E. P. Xing, "Deep kernel learning," Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Vol.51 of Proceedings of Machine Learning Research, pp.370-378, 2016.