Efficient Fixed-Point Representation for ResNet-50 Convolutional Neural Network

  • Kang, Hyeong-Ju (School of Computer Science and Engineering, Korea University of Technology and Education)
  • Received: 2017.09.11
  • Accepted: 2017.10.22
  • Published: 2018.01.31

Abstract

Convolutional neural networks have recently shown high performance in many computer vision tasks. However, they require an enormous number of operations, which makes them difficult to adopt in embedded environments. To address this problem, many studies have examined ASIC or FPGA implementations, which require an efficient representation method. Fixed-point representation is well suited to ASIC or FPGA implementation but can degrade network performance. This paper proposes optimizing the representations of the convolutional layers and the batch normalization layers separately. With the proposed method, the bit width required for the convolutional layers of the ResNet-50 network is reduced from 16 bits to 10 bits. Because the convolutional layers account for most of the total computation, reducing their bit width enables an efficient implementation of convolutional neural networks.
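
To make the bit-width trade-off concrete, the following is a minimal sketch of generic signed fixed-point rounding with saturation. It is not the paper's optimization procedure: the function names, the sample values, and the split between integer and fractional bits (12 fractional bits at 16 bits, 6 at 10 bits) are assumptions chosen only for illustration.

```python
def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x to the nearest signed fixed-point value, saturating on overflow.

    total_bits counts the sign bit; frac_bits is the number of fractional bits.
    Both are illustrative parameters, not values taken from the paper.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))      # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1   # most positive integer code
    q = round(x * scale)                  # nearest integer code
    q = max(q_min, min(q_max, q))         # saturate instead of wrapping
    return q / scale


def max_abs_error(values, total_bits, frac_bits):
    """Largest absolute rounding error over a list of values."""
    return max(abs(v - quantize_fixed_point(v, total_bits, frac_bits))
               for v in values)


# Hypothetical weight values, used only to compare the two bit widths.
weights = [0.7312, -0.0541, 0.0039, -1.2507, 0.3333]

err16 = max_abs_error(weights, total_bits=16, frac_bits=12)
err10 = max_abs_error(weights, total_bits=10, frac_bits=6)
print(f"max error at 16 bits: {err16:.6f}")
print(f"max error at 10 bits: {err10:.6f}")
```

In this sketch both formats cover the same range, so dropping from 16 to 10 bits only coarsens the step size from 2^-12 to 2^-6. Whether a network tolerates that coarser step depends on how the format is chosen for each layer type, which is the question the paper addresses.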

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
