A low-cost compensated approximate multiplier for Bfloat16 data processing on convolutional neural network inference

  • Received : 2020.09.25
  • Accepted : 2021.02.25
  • Published : 2021.08.01

Abstract

This paper presents a low-cost two-stage approximate multiplier for bfloat16 (brain floating-point) data processing. For cost-efficient approximate multiplication, the first stage implements Mitchell's algorithm, which performs the approximate multiplication using only two adders. The second stage compensates for the first-stage error by exactly multiplying the error terms and adding the truncated product to the output. The low-cost operations in both stages reduce hardware costs significantly, while the compensation keeps relative errors low. We apply the proposed multiplier to convolutional neural network (CNN) inference and observe only small accuracy drops with well-known pre-trained models on the ImageNet database. Our design therefore enables low-cost CNN inference systems with high test accuracy.
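As a rough illustration of the two-stage scheme described above, the following Python sketch applies Mitchell's approximation to bfloat16 significands (one adder for the fraction fields, one for the exponents) and then compensates with an exact, truncated product of the two fractions. The helper names, the `keep_bits` truncation width, and the float-based normalization are illustrative assumptions, not details taken from the paper; special values (zeros, subnormals, infinities, NaNs) and rounding are ignored for brevity.

```python
import struct

def bf16_fields(x: float):
    # Truncate the float32 bit pattern to its upper 16 bits, the usual
    # quick conversion to bfloat16 (1 sign, 8 exponent, 7 fraction bits).
    bits = struct.unpack('>I', struct.pack('>f', x))[0] >> 16
    sign = bits >> 15
    exp = (bits >> 7) & 0xFF   # 8-bit biased exponent (bias 127)
    frac = bits & 0x7F         # 7 explicit fraction bits
    return sign, exp, frac

def mitchell_compensated_mul(a: float, b: float, keep_bits: int = 4) -> float:
    """Two-stage approximate bfloat16 multiply (illustrative sketch).

    Stage 1 (Mitchell): with significands (1+x1) and (1+x2), x in [0,1),
    log2(1+x) ~= x, so the significand product is approximated by
    1 + x1 + x2 -- one adder, plus a second adder for the exponents.
    Stage 2 (compensation): the term Mitchell drops is exactly x1*x2,
    so a small exact multiplier forms f1*f2 and a truncated version is
    added back. `keep_bits` is an assumed truncation knob, not a value
    from the paper.
    """
    s1, e1, f1 = bf16_fields(a)
    s2, e2, f2 = bf16_fields(b)
    sign = s1 ^ s2
    # Stage 1: add exponents and add fraction fields.
    exp = e1 + e2 - 127                    # keep a single bias of 127
    mant = 1.0 + (f1 + f2) / 128.0         # 1 + x1 + x2, in [1, 3)
    # Stage 2: exact 7x7-bit product of the fractions, then truncate.
    err = f1 * f2                          # exact 14-bit product
    if keep_bits < 14:
        err = (err >> (14 - keep_bits)) << (14 - keep_bits)
    mant += err / (128.0 * 128.0)          # add back x1*x2 (truncated)
    if mant >= 2.0:                        # renormalize into [1, 2)
        mant /= 2.0
        exp += 1
    return (-1.0) ** sign * mant * 2.0 ** (exp - 127)

# Example: 1.5 x 2.25, both exactly representable in bfloat16.
# Stage 1 alone gives 3.25; with compensation the result is 3.375 (exact).
print(mitchell_compensated_mul(1.5, 2.25))
```

In hardware, the stage-2 multiplier operates only on the short fraction fields, so truncating its output trades a small residual error for further area savings; the exact truncation point used in the paper is not reproduced here.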

Acknowledgement

This research was supported by the research fund of Dankook University in 2018.

References

  1. J. N. Mitchell, Computer multiplication and division using binary logarithms, IRE Trans. Electron. Comput. EC-11 (1962), no. 4, 512-517. https://doi.org/10.1109/TEC.1962.5219391
  2. D. J. McLaren, Improved Mitchell-based logarithmic multiplier for low-power DSP applications, in Proc. IEEE Int. SOC Conf. (Portland, OR, USA), Sept. 2003, pp. 53-56.
  3. J. M. Jou, S. R. Kuang, and R.-D. Chen, Design of low-error fixed-width multipliers for DSP applications, IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 46 (1999), no. 6, 836-842. https://doi.org/10.1109/82.769795
  4. L.-D. Van, S.-S. Wang, and W.-S. Feng, Design of the lower error fixed-width multiplier and its application, IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 47 (2000), no. 10, 1112-1118. https://doi.org/10.1109/82.877155
  5. S. J. Jou and H. H. Wang, Fixed-width multiplier for DSP application, in Proc. Int. Conf. Comput. Des. (Austin, TX, USA), Sept. 2000, pp. 318-322.
  6. K.-J. Cho et al., Design of low-error fixed-width modified Booth multiplier, IEEE Trans. Very Large Scale Integr. VLSI Syst. 12 (2004), no. 5, 522-531. https://doi.org/10.1109/TVLSI.2004.825853
  7. S. S. Bhusare and V. S. Kanchana Bhaaskaran, Fixed-width multiplier with simple compensation bias, Procedia Mater. Sci. 10 (2015), 395-402. https://doi.org/10.1016/j.mspro.2015.06.071
  8. Z. Babic, A. Avramovic, and P. Bulic, An iterative logarithmic multiplier, Microprocess. Microsyst. 35 (2011), no. 1, 23-33. https://doi.org/10.1016/j.micpro.2010.07.001
  9. P. Kulkarni, P. Gupta, and M. D. Ercegovac, Trading accuracy for power in a multiplier architecture, J. Low Power Electron. 7 (2011), no. 4, 490-501. https://doi.org/10.1166/jolpe.2011.1157
  10. M. B. Sullivan and E. E. Swartzlander, Truncated error correction for flexible approximate multiplication, in Proc. Asilomar Conf. Signals, Syst. Comput. (ASILOMAR) (Pacific Grove, CA, USA), Nov. 2012, pp. 355-359.
  11. S. Hashemi, R. Bahar, and S. Reda, DRUM: A dynamic range unbiased multiplier for approximate applications, in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. (Austin, TX, USA), Nov. 2015, pp. 418-425.
  12. H. Jiang et al., Approximate radix-8 Booth multipliers for low-power and high-performance operation, IEEE Trans. Comput. 65 (2016), no. 8, 2638-2644. https://doi.org/10.1109/TC.2015.2493547
  13. S. E. Ahmed, S. Kadam, and M. B. Srinivas, An iterative logarithmic multiplier with improved precision, in Proc. IEEE Symp. Comput. Arithmetic (ARITH), (Silicon Valley, CA, USA), July 2016, pp. 104-111.
  14. R. Zendegani et al., RoBA multiplier: A rounding-based approximate multiplier for high-speed yet energy-efficient digital signal processing, IEEE Trans. Very Large Scale Integr. VLSI Syst. 25 (2017), no. 2, 393-401. https://doi.org/10.1109/TVLSI.2016.2587696
  15. V. Mrazek et al., EvoApprox8b: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods, in Proc. Des. Autom. Test Eur. Conf. Exhib. (DATE), (Lausanne, Switzerland), Mar. 2017, pp. 258-261.
  16. W. Liu et al., Design of approximate logarithmic multipliers, in Proc. Great Lakes Symp. VLSI, (Banff, Canada), May 2017, pp. 47-52.
  17. M. S. Kim et al., Low-power implementation of Mitchell's approximate logarithmic multiplication for convolutional neural networks, in Proc. Asia S. Pac. Des. Autom. Conf. (ASP-DAC), (Jeju, Rep. of Korea), Jan. 2018, pp. 617-622.
  18. I. Alouani et al., A novel heterogeneous approximate multiplier for low power and high performance, IEEE Embed. Syst. Lett. 10 (2018), no. 2, 45-48. https://doi.org/10.1109/LES.2017.2778341
  19. S. Ullah, S. S. Murthy, and A. Kumar, SMApproxLib: Library of FPGA-based approximate multipliers, in Proc. ACM/ESDA/IEEE Des. Autom. Conf. (DAC), (San Francisco, CA, USA), June 2018, pp. 1-6.
  20. P. Yin et al., Design of dynamic range approximate logarithmic multipliers, in Proc. Great Lakes Symp. VLSI, (Chicago, IL, USA), May 2018, pp. 423-426.
  21. M. S. Kim et al., Efficient Mitchell's approximate log multipliers for convolutional neural networks, IEEE Trans. Comput. 68 (2018), no. 5, 660-675. https://doi.org/10.1109/TC.2018.2880742
  22. H. J. Kim et al., A cost-efficient iterative truncated logarithmic multiplication for convolutional neural networks, in Proc. IEEE Symp. Comput. Arithmetic (ARITH), (Kyoto, Japan), June 2019, pp. 108-111.
  23. Z. Babic, A. Avramovic, and P. Bulic, An iterative Mitchell's algorithm based multiplier, in Proc. IEEE Int. Symp. Signal Process. Inform. Technol. (Sarajevo, Bosnia and Herzegovina), Dec. 2008, pp. 303-308.
  24. N. Burgess et al., Bfloat16 processing for neural networks, in Proc. IEEE Symp. Comput. Arithmetic (ARITH), (Kyoto, Japan), June 2019, pp. 88-91.
  25. D. Lutz, Arm floating point 2019: Latency, area, power, in Proc. IEEE Symp. Comput. Arithmetic (ARITH), (Kyoto, Japan), June 2019, pp. 97-98.
  26. D. Kalamkar et al., A study of bfloat16 for deep learning training, arXiv preprint, CoRR, 2019, arXiv: 1905.12322.
  27. K. H. Abed and R. E. Siferd, CMOS VLSI implementation of a low-power logarithmic converter, IEEE Trans. Comput. 52 (2003), no. 11, 1421-1433. https://doi.org/10.1109/TC.2003.1244940
  28. Y. Jia et al., Caffe: Convolutional architecture for fast feature embedding, in Proc. ACM Int. Conf. Multimed. (Orlando, FL, USA), Nov. 2014, pp. 675-678.
  29. A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. Int. Conf. Neural Inform. Process. Syst. (Red Hook, NY, USA), Dec. 2012, pp. 1097-1105.
  30. K. Chatfield et al., Return of the devil in the details: Delving deep into convolutional nets, arXiv preprint, CoRR, 2014, arXiv: 1405.3531.
  31. C. Szegedy et al., Going deeper with convolutions, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (Boston, MA, USA), June 2015, pp. 1-9.
  32. K. He et al., Deep residual learning for image recognition, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (Las Vegas, NV, USA), June 2016, pp. 770-778.
  33. C. Szegedy et al., Inception-v4, Inception-ResNet and the impact of residual connections on learning, arXiv preprint, CoRR, 2016, arXiv: 1602.07261.
  34. A. G. Howard et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint, CoRR, 2017, arXiv: 1704.04861.
  35. M. Sandler et al., MobileNetV2: Inverted residuals and linear bottlenecks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (Salt Lake City, UT, USA), June 2018, pp. 4510-4520.
  36. J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (Salt Lake City, UT, USA), June 2018, pp. 7132-7141.
  37. G. Huang et al., Convolutional networks with dense connectivity, IEEE Trans. Pattern Anal. Mach. Intell. (2019), 1.
  38. O. Russakovsky et al., ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. (IJCV) 115 (2015), no. 3, 211-252. https://doi.org/10.1007/s11263-015-0816-y