http://dx.doi.org/10.7471/ikeee.2020.24.3.895

Implementation of low power BSPE Core for deep learning hardware accelerators  

Jo, Cheol-Won (Dept. of Computer Eng., Seokyeong University)
Lee, Kwang-Yeob (Dept. of Electronics and Computer Eng., Seokyeong University)
Nam, Ki-Hun (Dept. of Computer Eng., Seokyeong University)
Publication Information
Journal of IKEEE / v.24, no.3, 2020, pp. 895-900
Abstract
In this paper, the BSPE (Bit-Serial Processing Element) replaces the conventional multiplication algorithm, which consumes a large amount of power. Hardware resources are reduced by using a bit-serial multiplier, and variable integer data is used to reduce memory usage. In addition, the resource and power usage of the MOA (Multi-Operand Adder) used to accumulate partial sums are reduced by applying LOA (Lower-part OR Approximation). As a result, compared to the existing MBS (Multiplication by Barrel Shifter) approach, hardware resources were reduced by 44% and power consumption by 42%. We also propose a hardware architecture design for the BSPE Core.
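The two hardware ideas named in the abstract can be illustrated in software. Below is a minimal C sketch, not the paper's RTL: a bit-serial shift-and-add multiplier and an LOA-style adder whose lower bits are approximated with a bitwise OR while the upper bits are added exactly. The 16-bit operand width and the 4-bit lower-part split are illustrative assumptions, not values taken from the paper.

#include <stdint.h>
#include <stdio.h>

#define LOWER_BITS 4   /* assumed width of the OR-approximated lower part */

/* LOA (Lower-part OR Approximation): the lower bits are OR'ed instead of
 * added; the MSBs of the two lower parts generate the only carry into the
 * exact upper adder (following Mahdiani et al.'s LOA definition). */
static uint32_t loa_add(uint16_t a, uint16_t b)
{
    uint16_t mask    = (1u << LOWER_BITS) - 1u;
    uint16_t low_sum = (a & mask) | (b & mask);              /* approximate */
    uint16_t carry   = ((a >> (LOWER_BITS - 1)) & 1u) &
                       ((b >> (LOWER_BITS - 1)) & 1u);
    uint32_t high    = (uint32_t)(a >> LOWER_BITS) +
                       (uint32_t)(b >> LOWER_BITS) + carry;  /* exact */
    return (high << LOWER_BITS) | low_sum;
}

/* Bit-serial multiplication: one multiplier bit is examined per step, so a
 * full parallel multiplier is replaced by a shift-and-add datapath. */
static uint32_t bit_serial_mul(uint16_t a, uint16_t b)
{
    uint32_t acc = 0;
    for (int i = 0; i < 16; i++)
        if ((b >> i) & 1u)
            acc += (uint32_t)a << i;                         /* partial sum */
    return acc;
}

int main(void)
{
    printf("exact add : %u\n", 1234u + 567u);
    printf("LOA add   : %u\n", (unsigned)loa_add(1234, 567));
    printf("exact mul : %u\n", 123u * 45u);
    printf("serial mul: %u\n", (unsigned)bit_serial_mul(123, 45));
    return 0;
}

Dropping the carry chain in the lower part is what removes adder area and switching power, at the cost of a small, bounded error confined to the least-significant bits; the bit-serial multiplier likewise trades throughput per cycle for a much smaller datapath.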
Keywords
Deep Learning; Quantization; BSPE; LOA; Overlapping Computation