Design and Implementation of Accelerator Architecture for Binary Weight Network on FPGA with Limited Resources

Kim, Jong-Hyun; Yun, SangKyun

doi:10.7471/ikeee.2020.24.1.225

Journal of IKEEE (전기전자학회논문지)

Volume 24 Issue 1 / Pages 225-231 / 2020 / 1226-7244 (pISSN) / 2288-243X (eISSN)

Institute of Korean Electrical and Electronics Engineers (한국전기전자학회)



Kim, Jong-Hyun (Telechips Inc.) ;
Yun, SangKyun (Department of Computer and Telecomm. Engineering, Yonsei University)

김종현 ;
윤상균

Received : 2020.03.06
Accepted : 2020.03.23
Published : 2020.03.31

https://doi.org/10.7471/ikeee.2020.24.1.225

Abstract

In this paper, we propose a method for accelerating a binary weight network (BWN) on an FPGA with limited resources, targeting embedded systems. Because the number of available logic elements is limited, a single computing unit capable of handling convolutional (Conv) layers and fully connected (FC) layers of various sizes is designed and reused. When the input feature map cannot be processed in parallel in a single pass, the inputs are read several times to compute the output. Because the number of available BRAM modules is also limited, the number of data bits used in the BWN accelerator is minimized. The image classification time of the BWN accelerator is superior to that of an embedded CPU; it is also faster than a desktop PC and about 50% slower than a GPU system. Since the BWN accelerator runs at a slow clock of 50 MHz, it is advantageous in terms of performance relative to power.

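This study presents a method for accelerating BWN on an FPGA with limited resources, for application to embedded systems. Because the number of usable logic elements is limited, a single computing unit that can process Conv-layers and FC-layers of various sizes was designed and reused. When the input feature map data could not be processed in parallel at once, the data were read several times, and the intermediate results were computed and summed to obtain the final output. Because the number of usable BRAM modules is limited, a structure that minimizes the number of data bits inside the BWN accelerator was used. The image classification time of the implemented BWN accelerator was superior to that of a small system; compared with high-performance systems, it was faster than a desktop PC and about 50% slower than a GPU system with a high clock speed. Because the BWN accelerator uses a slow 50 MHz clock, it was confirmed to be advantageous in terms of performance relative to power.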
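The two ideas in the abstract that lend themselves to a small example are the binary-weight arithmetic and the multi-pass accumulation used when the inputs do not fit the compute unit in one pass. The Python sketch below is only an illustration under stated assumptions, not the authors' hardware design: the function names (`bwn_dot`, `bwn_dot_tiled`, `fc_layer`), the encoding of a weight bit (1 for +1, 0 for -1), and the `tile` parameter standing in for the number of parallel lanes are all hypothetical.

```python
from typing import List, Sequence


def bwn_dot(inputs: Sequence[int], weight_bits: Sequence[int]) -> int:
    """Dot product with binary weights.

    Assumed encoding: weight bit 1 means +1, weight bit 0 means -1,
    so every multiplication reduces to an addition or a subtraction.
    """
    if len(inputs) != len(weight_bits):
        raise ValueError("inputs and weight_bits must have the same length")
    acc = 0
    for x, bit in zip(inputs, weight_bits):
        acc += x if bit else -x
    return acc


def bwn_dot_tiled(inputs: Sequence[int], weight_bits: Sequence[int], tile: int) -> int:
    """Same result as bwn_dot, computed in passes of at most `tile` inputs.

    `tile` models a compute unit with a fixed number of parallel lanes
    (hypothetical parameter). Each pass yields an intermediate result,
    and the intermediate results are summed into the final output.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    if len(inputs) != len(weight_bits):
        raise ValueError("inputs and weight_bits must have the same length")
    total = 0
    for start in range(0, len(inputs), tile):
        end = start + tile
        total += bwn_dot(inputs[start:end], weight_bits[start:end])
    return total


def fc_layer(inputs: Sequence[int], weight_rows: Sequence[Sequence[int]], tile: int) -> List[int]:
    """Reuse the same tiled unit for every output neuron of an FC layer."""
    return [bwn_dot_tiled(inputs, row, tile) for row in weight_rows]


if __name__ == "__main__":
    x = [3, -1, 4, 1, -5, 9, 2]
    w = [1, 0, 1, 1, 0, 0, 1]
    print(bwn_dot(x, w))            # single pass
    print(bwn_dot_tiled(x, w, 3))   # three passes of up to 3 inputs each
```

Both calls return the same value because integer addition is associative, which is why one small unit can be reused across layers of different sizes at the cost of extra passes over the input.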


