• Title/Summary/Keyword: Deep Learning Model Parameter Quantization

Analysis of Deep learning Quantization Technology for Micro-sized IoT devices (초소형 IoT 장치에 구현 가능한 딥러닝 양자화 기술 분석)

  • YoungMin KIM;KyungHyun Han;Seong Oun Hwang
    • Journal of Internet of Things and Convergence / v.9 no.1 / pp.9-17 / 2023
  • Deep learning requires a large amount of computation and is therefore difficult to implement on micro-sized IoT devices or mobile devices. Recently, lightweight deep learning technologies have been introduced so that deep learning can run even on small devices by reducing the model's amount of computation. Quantization is one of the lightweight techniques that can be used efficiently to reduce the memory footprint and size of a model by expressing parameter values with a continuous distribution as discrete values with a fixed number of bits. However, the discrete value representation reduces the accuracy of the model. In this paper, we introduce various quantization techniques that aim to recover this accuracy. We selected APoT and EWGS from existing quantization techniques and comparatively analyzed their results through experiments. The selected techniques were trained and tested on the CIFAR-10 or CIFAR-100 datasets with the ResNet model. Through analysis of the experimental results, we identified problems with these techniques and presented directions for future research.
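
The idea of expressing continuous parameter values as discrete fixed-bit values can be illustrated with a minimal sketch. This is plain symmetric uniform quantization, not the APoT or EWGS methods analyzed in the paper; the function names, the per-tensor scale, and the sample weights are assumptions made for illustration only.

```python
def quantize_uniform(weights, num_bits=4):
    """Map float weights to signed integer codes on a uniform grid.

    Illustrative symmetric, per-tensor scheme (an assumption, not the
    paper's method): codes lie in [-qmax, qmax].
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point, then clamp to the valid code range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to show the rounding error.
weights = [0.82, -0.41, 0.05, -0.97, 0.33]
codes, scale = quantize_uniform(weights, num_bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as a 4-bit code plus one shared scale, and the reconstruction error per weight is at most half the grid step. That rounding error is the source of the accuracy loss the abstract refers to.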

Lightweight of ONNX using Quantization-based Model Compression (양자화 기반의 모델 압축을 이용한 ONNX 경량화)

  • Chang, Duhyeuk;Lee, Jungsoo;Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.1 / pp.93-98 / 2021
  • With the development of deep learning and AI, models have grown in scale and have been integrated into other fields, becoming part of our lives. However, in resource-limited environments such as embedded devices, it is difficult to apply these models, and problems such as power shortages arise. To address this, lightweight methods have been proposed, such as cloud or offloading technologies, reducing the number of parameters in the model, or optimising calculations. In this paper, quantization of trained models is applied to ONNX models, a format used for interchange among various frameworks; the neural network structure and inference performance are compared with those of the existing models, and various module methods for quantization are analyzed. Experiments show that the size of the weight parameters is compressed and the inference time is improved compared to the original model.

Compression of DNN Integer Weight using Video Encoder (비디오 인코더를 통한 딥러닝 모델의 정수 가중치 압축)

  • Kim, Seunghwan;Ryu, Eun-Seok
    • Journal of Broadcast Engineering / v.26 no.6 / pp.778-789 / 2021
  • Recently, various lightweight methods for using Convolutional Neural Network (CNN) models on mobile devices have emerged. Weight quantization, which lowers the bit precision of weights, is a lightweight method that enables a model to run with integer calculation in a mobile environment where GPU acceleration is unavailable. Weight quantization has already been used in various models to reduce computational complexity and model size with a small loss of accuracy. Considering memory size and computing speed as well as the storage size of the device and the limited network environment, this paper proposes a method of compressing integer weights after quantization using a video codec. To verify the performance of the proposed method, experiments were conducted on VGG16, ResNet50, and ResNet18 models trained with the ImageNet and Places365 datasets. As a result, an accuracy loss of less than 2% and high compression efficiency were achieved in various models. In addition, a comparison with similar compression methods verified that the compression efficiency was more than doubled.
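
The step of treating quantized integer weights as image-like data before handing them to a codec can be sketched as follows. The abstract does not describe the paper's actual frame layout or codec settings, so everything here is an assumption: the helper names, the frame width, the sample codes, and in particular the use of lossless `zlib` as a stand-in for a video encoder.

```python
import zlib


def weights_to_frame(codes, width):
    """Lay signed 8-bit weight codes out as rows of a grayscale frame.

    Hypothetical layout: codes in [-128, 127] are shifted to pixel
    values 0..255, and the last row is padded with the pixel for code 0.
    """
    pixels = [c + 128 for c in codes]
    if any(p < 0 or p > 255 for p in pixels):
        raise ValueError("codes must fit in signed 8 bits")
    pixels += [128] * ((-len(pixels)) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def frame_to_weights(frame, count):
    """Invert weights_to_frame, dropping the padding."""
    flat = [p - 128 for row in frame for p in row]
    return flat[:count]


# Hypothetical quantized weights with many small, repeated values.
codes = [3, -2, 0, 0, 1, -1, 0, 0, 0, 2] * 100
frame = weights_to_frame(codes, width=64)
raw = bytes(p for row in frame for p in row)

# Stand-in compressor: zlib is lossless, unlike a typical video codec.
compressed = zlib.compress(raw, 9)
restored = frame_to_weights(frame, len(codes))
print(len(raw), len(compressed), restored == codes)
```

A real video encoder would exploit spatial redundancy within and between such frames and may be lossy, which is presumably where the reported accuracy loss of less than 2% comes from; this sketch only shows the packing and unpacking around the compressor.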

Quantized CNN-based Super-Resolution Method for Compressed Image Reconstruction (압축된 영상 복원을 위한 양자화된 CNN 기반 초해상화 기법)

  • Kim, Yongwoo;Lee, Jonghwan
    • Journal of the Semiconductor & Display Technology / v.19 no.4 / pp.71-76 / 2020
  • In this paper, we propose a super-resolution method that reconstructs compressed low-resolution images into high-resolution images. We propose a CNN model with a small number of parameters that can perform super-resolution without degrading image quality even when quantization is applied. To further improve the quality of the compressed low-resolution image, we propose a new degradation model in place of the existing bicubic degradation model. The proposed degradation model is used only in the training process and can be applied to the original CNN model by changing only its parameter values. In super-resolution images produced with the proposed degradation model, visual artifacts caused by image compression were effectively removed. As a result, our proposed method achieves higher PSNR values on compressed images and shows better visual quality than conventional CNN-based SR methods.

Lightweight Deep Learning Model for Real-Time 3D Object Detection in Point Clouds (실시간 3차원 객체 검출을 위한 포인트 클라우드 기반 딥러닝 모델 경량화)

  • Kim, Gyu-Min;Baek, Joong-Hwan;Kim, Hee Yeong
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.9 / pp.1330-1339 / 2022
  • 3D object detection generally aims to detect relatively large objects such as automobiles, buses, persons, and furniture, so it performs poorly on small objects. In addition, in resource-limited environments such as embedded devices, it is difficult to apply the model because of the huge amount of computation. In this paper, the accuracy of small object detection was improved by focusing on local features using only one layer, and the inference speed was improved through the proposed knowledge distillation method from a large pre-trained network to a small network and an adaptive quantization method based on the parameter size. The proposed model was evaluated using SUN RGB-D Val and a self-made apple tree dataset. It achieved an accuracy of 62.04% at mAP@0.25 and 47.1% at mAP@0.5, and an inference speed of 120.5 scenes per second, showing fast real-time processing.