http://dx.doi.org/10.7471/ikeee.2020.24.3.895

Implementation of low power BSPE Core for deep learning hardware accelerators  

Jo, Cheol-Won (Dept. of Computer Eng., Seokyeong University)
Lee, Kwang-Yeob (Dept. of Electronics and Computer Eng., Seokyeong University)
Nam, Ki-Hun (Dept. of Computer Eng., Seokyeong University)
Publication Information
Journal of IKEEE / v.24, no.3, 2020, pp. 895-900
Abstract
In this paper, the BSPE (Bit-Serial Processing Element) replaces the conventional multiplication algorithm, which consumes a large amount of power. Hardware resources are reduced by using a bit-serial multiplier, and variable integer data is used to reduce memory usage. In addition, the resource and power usage of the MOA (Multi-Operand Adder) used to accumulate partial sums are reduced by applying LOA (Lower-part OR Approximation). As a result, compared to the existing MBS (Multiplication by Barrel Shifter) approach, hardware resources were reduced by 44% and power consumption by 42%. We also propose a hardware architecture design for the BSPE Core.
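The two hardware ideas named in the abstract can be illustrated in software. Below is a minimal C sketch, not the paper's RTL: a bit-serial shift-and-add multiplier and an LOA-style adder whose lower bits are approximated with a bitwise OR while the upper bits are added exactly. The 16-bit operand width and the 4-bit lower-part split are illustrative assumptions, not values taken from the paper.

#include <stdint.h>
#include <stdio.h>

#define LOWER_BITS 4   /* assumed width of the OR-approximated lower part */

/* LOA (Lower-part OR Approximation): the lower bits are OR'ed instead of
 * added; the MSBs of the two lower parts generate the only carry into the
 * exact upper adder (following Mahdiani et al.'s LOA definition). */
static uint32_t loa_add(uint16_t a, uint16_t b)
{
    uint16_t mask    = (1u << LOWER_BITS) - 1u;
    uint16_t low_sum = (a & mask) | (b & mask);              /* approximate */
    uint16_t carry   = ((a >> (LOWER_BITS - 1)) & 1u) &
                       ((b >> (LOWER_BITS - 1)) & 1u);
    uint32_t high    = (uint32_t)(a >> LOWER_BITS) +
                       (uint32_t)(b >> LOWER_BITS) + carry;  /* exact */
    return (high << LOWER_BITS) | low_sum;
}

/* Bit-serial multiplication: one multiplier bit is examined per step, so a
 * full parallel multiplier is replaced by a shift-and-add datapath. */
static uint32_t bit_serial_mul(uint16_t a, uint16_t b)
{
    uint32_t acc = 0;
    for (int i = 0; i < 16; i++)
        if ((b >> i) & 1u)
            acc += (uint32_t)a << i;                         /* partial sum */
    return acc;
}

int main(void)
{
    printf("exact add : %u\n", 1234u + 567u);
    printf("LOA add   : %u\n", (unsigned)loa_add(1234, 567));
    printf("exact mul : %u\n", 123u * 45u);
    printf("serial mul: %u\n", (unsigned)bit_serial_mul(123, 45));
    return 0;
}

Dropping the carry chain in the lower part is what removes adder area and switching power, at the cost of a small, bounded error confined to the least-significant bits; the bit-serial multiplier likewise trades throughput per cycle for a much smaller datapath.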
Keywords
Deep Learning; Quantization; BSPE; LOA; Overlapping Computation