http://dx.doi.org/10.7471/ikeee.2019.23.4.1314

Bit Operation Optimization and DNN Application using GPU Acceleration  

Kim, Sang Hyeok (Dept. of Computer Engineering, Hanbat National University)
Lee, Jae Heung (Dept. of Computer Engineering, Hanbat National University)
Publication Information
Journal of IKEEE / v.23, no.4, 2019, pp. 1314-1320
Abstract
In this paper, we propose a new method for optimizing bit operations and applying them to a DNN (Deep Neural Network) in a software environment. To this end, we propose a packing function for bitwise optimization and a masking matrix multiplication operation for application to the DNN. The packing function converts 32-bit real values into 2-bit quantized values through a threshold comparison, so that four 32-bit real values are packed into a single 8-bit value. The masking matrix multiplication is a special operation that multiplies the packed weight values with the ordinary input values. Each operation is then processed in parallel on a GPU accelerator. In our experiments, the proposed model used about 16 times less memory than the 32-bit DNN model, while its accuracy remained within 1% of the 32-bit model.
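As a concrete illustration of the two operations the abstract describes, the following is a minimal NumPy sketch run on CPU rather than GPU. The abstract does not specify the 2-bit codebook or the threshold value, so the sketch assumes a ternary codebook (+1, 0, -1 stored as the 2-bit codes 0b01, 0b00, 0b11) and a placeholder threshold of 0.5; the names (quantize_2bit, pack, unpack, masked_matvec, THRESHOLD) are illustrative, not the paper's.

import numpy as np

# Placeholder threshold; the abstract does not state the paper's value.
THRESHOLD = 0.5

def quantize_2bit(w):
    """Map each 32-bit float to a 2-bit code by threshold comparison.
    Assumed codebook: 0b01 -> +1, 0b11 -> -1, 0b00 -> 0."""
    codes = np.zeros(w.shape, dtype=np.uint8)
    codes[w > THRESHOLD] = 0b01
    codes[w < -THRESHOLD] = 0b11
    return codes

def pack(codes):
    """Pack four 2-bit codes into one uint8, so four 32-bit floats
    end up in a single byte (~16x smaller). Assumes the length of
    `codes` is divisible by 4."""
    codes = codes.reshape(-1, 4)
    return (codes[:, 0] << 6 | codes[:, 1] << 4
            | codes[:, 2] << 2 | codes[:, 3]).astype(np.uint8)

def unpack(packed, n):
    """Recover the first n 2-bit codes from a packed uint8 array."""
    codes = np.empty((packed.size, 4), dtype=np.uint8)
    for i, shift in enumerate((6, 4, 2, 0)):
        codes[:, i] = (packed >> shift) & 0b11
    return codes.reshape(-1)[:n]

def masked_matvec(packed_w, x):
    """'Masking' product of packed ternary weights with a normal input:
    instead of multiplying by reconstructed float weights, add the inputs
    where the code is +1 and subtract them where it is -1."""
    y = np.empty(packed_w.shape[0], dtype=np.float32)
    for r in range(packed_w.shape[0]):
        codes = unpack(packed_w[r], x.size)
        y[r] = x[codes == 0b01].sum() - x[codes == 0b11].sum()
    return y

# Sanity check against an ordinary float matrix-vector product.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)
x = rng.standard_normal(16).astype(np.float32)
codes = quantize_2bit(w)
packed = np.stack([pack(codes[r]) for r in range(w.shape[0])])
w_q = np.where(codes == 0b01, 1.0, np.where(codes == 0b11, -1.0, 0.0))
assert np.allclose(masked_matvec(packed, x), w_q @ x, atol=1e-5)

The final assertion confirms that the masking product over packed weights matches an ordinary matrix-vector product against the reconstructed ternary weights; in the paper, the corresponding per-element work is dispatched in parallel on a GPU accelerator rather than looped over on the CPU.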
Keywords
AI; Deep Learning; Neural Network; Memory Saving; Optimization;
Citations & Related Records
연도 인용수 순위
  • Reference
1 N. S. Sohoni, C. R. Aberger, M. Leszczyynski, J. Zhamg and C. Re "Low Memory Neural Network Training: A Technical Report," https://arxiv.org/abs/1904.10631
2 M. Courbariaux, Y. Bengio and J. David, "Binary Connect: Training Deep Neural Networks with binary weights during propagations," 2015. https://arxiv.org/abs/1511.00363
3 C. Zhu, S. Han, H. Mao and W. J. Dally, "Trained ternary quantization," International Conference on Learning Representations, 2017.
4 S. Han, H. Mao and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," NIPS Deep Learning Symposium, 2015.
5 Nikola Sakharnykh, "Maximizing Unified Memory Performance in CUDA," https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda
6 J. Choi, P. I-Jen, C. Z. Wang, S. Venkataramani, V. Srinivasan and K. Gopalakrishnan, "Bridging the Accuracy Gap for 2-bit Quantized Neural Networks(QNN)," https://arxiv.org/abs/1807.06964
7 S. Uhlich, L. Mauch, K. Yoshiyama, F. Cardinaux, J. A. Garcia, S. Tiedmann, T. Kemp and A. Nakamura, "Differentiable Quantization of Deep Neural Networks," https://arxiv.org/abs/1905.11452
8 M. Rastegari, V. Ordonez, J. Redmon and A. Farhadi., "Xnor-net: Imagenet classification using binary convolution neural networks," European Conference on Computer Vision, pp.525-542, 2016.
9 J. Choi, S. Venkataramani, V. Srinivasan, K. Gopalakrishana, Z. Wang, and P. Chuang, "Accurate And Efficient 2-bit Quantized Neural Networks," https://sysml.cc/doc/2019/168.pdf
10 F. Li, B. Zhang and B. Liu, "Ternary Weight Networks," https://arxiv.org/abs/1605.04711