Toward Optimal FPGA Implementation of Deep Convolutional Neural Networks for Handwritten Hangul Character Recognition

  • Park, Hanwool (School of Computer Science and Electrical Engineering, Handong Global University)
  • Yoo, Yechan (School of Computer Science and Electrical Engineering, Handong Global University)
  • Park, Yoonjin (School of Computer Science and Electrical Engineering, Handong Global University)
  • Lee, Changdae (School of Computer Science and Electrical Engineering, Handong Global University)
  • Lee, Hakkyung (School of Computer Science and Electrical Engineering, Handong Global University)
  • Kim, Injung (School of Computer Science and Electrical Engineering, Handong Global University)
  • Yi, Kang (School of Computer Science and Electrical Engineering, Handong Global University)
  • Received: 2017.09.14
  • Reviewed: 2018.02.09
  • Published: 2018.03.30

Abstract

Deep convolutional neural networks (DCNNs) are an advanced technology for image recognition. Because of their extreme computing resource requirements, software-only DCNN implementations cannot meet real-time requirements, so the need for DCNN accelerator hardware is increasing. In this paper, we present a field-programmable gate array (FPGA)-based hardware accelerator design for a DCNN targeting handwritten Hangul character recognition. We also present design optimization techniques for exploring the FPGA design space in the SDAccel environment. The techniques we used include memory access optimization, computing unit parallelism, and data conversion. The design was optimized with Xilinx HLS and the SDAccel environment, targeting the Xilinx Kintex XCKU115 FPGA. With this FPGA accelerator, we achieved a recognition time of about 11.19 ms per character. Our design outperforms a CPU by 5.88 times and a GPGPU by 5 times in terms of energy efficiency (the number of samples processed per unit energy). We expect these results to offer an alternative to GPGPU solutions for real-time applications, especially in data centers or server farms where energy consumption is a critical problem.
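The energy-efficiency metric used in the abstract (samples per unit energy) can be illustrated with a minimal Python sketch. This is not the authors' measurement code. It assumes one sample is processed at a time, so throughput is the reciprocal of the per-sample latency, and the function names and the power figures in the usage lines are hypothetical placeholders, since the abstract does not report power consumption.

```python
def throughput_per_second(latency_ms: float) -> float:
    """Samples per second, assuming one sample is processed every latency_ms."""
    return 1000.0 / latency_ms


def samples_per_joule(latency_ms: float, power_watts: float) -> float:
    """Energy efficiency: (samples / second) / (joules / second)."""
    return throughput_per_second(latency_ms) / power_watts


def efficiency_ratio(latency_a_ms: float, power_a_watts: float,
                     latency_b_ms: float, power_b_watts: float) -> float:
    """How many times more samples per joule device A delivers than device B."""
    return (samples_per_joule(latency_a_ms, power_a_watts)
            / samples_per_joule(latency_b_ms, power_b_watts))


# 11.19 ms per character is the figure stated in the abstract.
print(round(throughput_per_second(11.19), 1))  # 89.4 characters per second

# Hypothetical placeholders (not from the paper): two devices with equal
# latency, where device B draws five times the power of device A.
print(efficiency_ratio(10.0, 20.0, 10.0, 100.0))  # 5.0
```

Under this definition, a device can be slower per sample and still come out ahead on energy efficiency, provided its power draw is proportionally lower.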

Project Information

Supervising institution of the research project: Handong Global University
