• Title/Summary/Keyword: Deep learning accelerator

Weight Recovery Attacks for DNN-Based MNIST Classifier Using Side Channel Analysis and Implementation of Countermeasures (부채널 분석을 이용한 DNN 기반 MNIST 분류기 가중치 복구 공격 및 대응책 구현)

  • Youngju Lee;Seungyeol Lee;Jeacheol Ha
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.6 / pp.919-928 / 2023
  • Deep learning is used in fields such as self-driving cars, image generation, and synthetic voice, and deep learning accelerators have been developed to run it at high speed in hardware. Recently, however, several side-channel attacks have been studied that recover secret information inside an accelerator from the side-channel information it leaks while operating. In this paper, we implemented a deep neural network (DNN)-based MNIST digit classifier on a microprocessor and mounted a correlation power analysis attack, confirming that the weights of the deep learning accelerator could be sufficiently recovered. To counter such power analysis attacks, we proposed a Node-CUT shuffling method that applies the principle of misalignment at the time of power measurement. Experiments confirmed that the proposed countermeasure effectively defends against side-channel attacks, and that the additional computation is reduced by more than 1/3 compared to using the Fisher-Yates shuffling method.
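
For readers unfamiliar with shuffling countermeasures, the following is a minimal sketch of the Fisher-Yates baseline that the abstract compares against, applied to the order in which the output nodes of one fully connected layer are evaluated. It is not the Node-CUT method, whose details the abstract does not give; the function names and the layer layout are assumptions made for illustration.

```python
import random


def fisher_yates_order(n, rng):
    """Return a uniformly random permutation of range(n) (Fisher-Yates)."""
    order = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        order[i], order[j] = order[j], order[i]
    return order


def dense_layer_shuffled(inputs, weights, biases, rng):
    """Evaluate one fully connected layer, visiting output nodes in random order.

    The result is identical to the unshuffled computation; only the time at
    which each node's multiply-accumulate happens changes from run to run,
    which is what misaligns power traces for an attacker.
    """
    outputs = [0.0] * len(weights)
    for node in fisher_yates_order(len(weights), rng):
        acc = biases[node]
        for x, w in zip(inputs, weights[node]):
            acc += x * w
        outputs[node] = acc
    return outputs
```

A real countermeasure would draw its randomness from a hardware or cryptographic source rather than Python's `random` module, which is used here only to keep the sketch self-contained.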

Web Service Platform for Optimal Quantization of CNN Models (CNN 모델의 최적 양자화를 위한 웹 서비스 플랫폼)

  • Roh, Jaewon;Lim, Chaemin;Cho, Sang-Young
    • Journal of the Semiconductor & Display Technology / v.20 no.4 / pp.151-156 / 2021
  • Low-end IoT devices do not have enough computation and memory resources for DNN learning and inference. Integer quantization of real-valued neural network models can reduce model size, hardware computational burden, and power consumption. This paper describes the design and implementation of a web-based quantization platform for CNN deep learning accelerator chips. The web service platform provides model visualization through a convenient UI, analysis of each inference step, and detailed editing of the model. It also provides a data augmentation function and a file management function for stored models and intermediate inference results. The implemented functions were verified using three YOLO models.
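
As a rough illustration of what integer quantization of a real-valued model involves, here is a minimal sketch of affine 8-bit quantization of a list of weights. The abstract does not say which scheme the platform uses, so the scheme, the function names, and the choice of unsigned 8-bit codes are all assumptions.

```python
def choose_params(values, num_bits=8):
    """Pick an affine (scale, zero_point) pair covering the value range and 0.0."""
    qmax = 2 ** num_bits - 1
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax or 1.0   # fall back to 1.0 for an all-zero tensor
    zero_point = round(-lo / scale)   # integer code that represents real 0.0
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map real values to unsigned integer codes, clamping to the code range."""
    qmax = 2 ** num_bits - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate real values."""
    return [scale * (c - zero_point) for c in codes]
```

With this scheme each weight is stored in one byte instead of four, and the reconstruction error per weight stays within one quantization step (`scale`).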

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering / v.26 no.2 / pp.167-174 / 2021
  • Deep learning is now widely used in computer vision applications such as object recognition, classification, and image generation. In particular, deep learning-based super-resolution has achieved significant performance improvements. The fast super-resolution convolutional neural network (FSRCNN) is a well-known deep learning-based super-resolution model that generates its output image with a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural network accelerator that considers parallel computing efficiency. We also propose Optimal-FSRCNN, a modified version of the FSRCNN structure. It reduces the number of multipliers by a factor of 3.47 compared to FSRCNN while achieving similar PSNR. We developed a real-time image processing technology implemented on an FPGA.

Reverse Engineering of Deep Learning Network Secret Information Through Side Channel Attack (부채널 분석을 이용한 딥러닝 네트워크 신규 내부 비밀정보 복원 방법 연구)

  • Park, Sujin;Lee, Juheon;Kim, HeeSeok
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.5 / pp.855-867 / 2022
  • As the need for deep learning accelerators grows with the development of IoT equipment, research on the implementation and safety verification of deep learning accelerators is actively being conducted. In this paper, we propose a new side-channel analysis methodology for recovering secret information that overcomes the limitations of a previous study presented at USENIX 2019. The previous work limited the range of the weights and restored only a portion of them; our new CPA-based method restores IEEE 754 32-bit single-precision weights with 99% accuracy. It also overcomes the limitation of existing studies that can reverse-engineer activation functions only for specific inputs: using deep learning, our new method recovers activation functions with 99% accuracy without conditions on the input values. This paper not only overcomes the limitations of previous studies but also shows that the proposed methodology is effective.

CNN-based watermarking processor design optimization method (CNN기반의 워터마킹 프로세서 설계 최적화 방법)

  • Kang, Ji-Won;Lee, Jae-Eun;Seo, Young-Ho;Kim, Dong-Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.644-645 / 2021
  • In this paper, we propose a hardware structure of a watermarking processor based on deep learning technology to protect the intellectual property rights of ultra-high resolution digital images and videos. We propose an optimization methodology to implement a deep learning-based watermarking algorithm in hardware.

Toward Optimal FPGA Implementation of Deep Convolutional Neural Networks for Handwritten Hangul Character Recognition

  • Park, Hanwool;Yoo, Yechan;Park, Yoonjin;Lee, Changdae;Lee, Hakkyung;Kim, Injung;Yi, Kang
    • Journal of Computing Science and Engineering / v.12 no.1 / pp.24-35 / 2018
  • The deep convolutional neural network (DCNN) is an advanced technology in image recognition. Because of its extreme computing resource requirements, a software-only DCNN implementation cannot meet real-time requirements, so the need for DCNN accelerator hardware is increasing. In this paper, we present a field-programmable gate array (FPGA)-based hardware accelerator design for a DCNN targeting handwritten Hangul character recognition. We also present design optimization techniques for exploring the FPGA design space in the SDAccel environment, including memory access optimization, computing unit parallelism, and data conversion. We achieved a recognition time of about 11.19 ms per character with the Xilinx FPGA accelerator. Design optimization was performed with Xilinx HLS and the SDAccel environment, targeting the Xilinx Kintex XCKU115 FPGA. Our design outperforms a CPU in energy efficiency (the number of samples per unit energy) by 5.88 times and a GPGPU by 5 times. We expect these results to offer an alternative to GPGPU solutions for real-time applications, especially in data centers or server farms where energy consumption is a critical problem.

Sparse Matrix Compression Technique and Hardware Design for Lightweight Deep Learning Accelerators (경량 딥러닝 가속기를 위한 희소 행렬 압축 기법 및 하드웨어 설계)

  • Kim, Sunhee;Shin, Dongyeob;Lim, Yong-Seok
    • Journal of Korea Society of Digital Industry and Information Management / v.17 no.4 / pp.53-62 / 2021
  • Deep learning models such as convolutional neural networks and recurrent neural networks process huge amounts of data, so they require a lot of storage and consume considerable time and power on memory access. Recent research aims to reduce memory usage and access by compressing data, exploiting the fact that much deep learning data is highly sparse and localized. In this paper, we propose a compression-decompression method that stores only the non-zero data and their location information, excluding the zero data. To build the location information, the matrix data is divided uniformly into sections, and each section is marked according to whether it contains non-zero data. This division is executed not once but repeatedly, and location information is stored at each step, so the data can be compressed appropriately for the ratio and distribution of zeros. We also propose a hardware structure that enables compression and decompression without complex operations. It was designed and verified in Verilog, and it was confirmed that it can be used in hardware deep learning accelerators.
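
The repeated section division described above can be sketched as a hierarchical occupancy bitmap. The version below works on a flattened 1-D list and uses a fixed fan-out of four sections per step; both choices, along with the function names and the depth-first bit order, are assumptions for illustration and are not taken from the paper.

```python
def compress(data, fanout=4):
    """Return (bits, values): per-section occupancy flags and the non-zero values."""
    bits, values = [], []

    def visit(block):
        if len(block) == 1:
            values.append(block[0])          # only reached for non-zero elements
            return
        size = -(-len(block) // fanout)      # ceiling division: section length
        sections = [block[i:i + size] for i in range(0, len(block), size)]
        flags = [int(any(section)) for section in sections]
        bits.extend(flags)                   # location information for this step
        for flag, section in zip(flags, sections):
            if flag:
                visit(section)               # divide non-empty sections again

    if any(data):
        visit(list(data))
    return bits, values


def decompress(bits, values, length, fanout=4):
    """Rebuild the original list by replaying the same section division."""
    bit_iter, value_iter = iter(bits), iter(values)

    def build(n):
        if n == 1:
            return [next(value_iter)]
        size = -(-n // fanout)
        lengths = [min(size, n - start) for start in range(0, n, size)]
        flags = [next(bit_iter) for _ in lengths]
        out = []
        for flag, m in zip(flags, lengths):
            out.extend(build(m) if flag else [0] * m)
        return out

    return build(length) if values else [0] * length
```

Sections that are entirely zero cost a single bit and are never subdivided, so the encoding shrinks as the zeros become more numerous and more clustered.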

Design of deep learning based hardware accelerator for digital watermarking (디지털 워터마킹을 위한 딥러닝 기반 하드웨어 가속기의 설계)

  • Lee, Jae-Eun;Seo, Young-Ho;Kim, Dong-Wook
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.544-545 / 2020
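  • In this paper, we propose a deep learning-based watermarking system and a hardware accelerator architecture for protecting the intellectual property rights of video content. The proposed watermarking system consists of a preprocessing network that transforms the host image and the watermark to have the same resolution, a network that embeds the watermark by combining the preprocessed host image and watermark, and a network that extracts the watermark. Of these, the preprocessing network for the host image and the embedding network are designed in hardware.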

An Implementation of a Convolutional Accelerator based on a GPGPU for a Deep Learning (Deep Learning을 위한 GPGPU 기반 Convolution 가속기 구현)

  • Jeon, Hee-Kyeong;Lee, Kwang-yeob;Kim, Chi-yong
    • Journal of IKEEE / v.20 no.3 / pp.303-306 / 2016
  • In this paper, we propose a method to accelerate convolutional neural networks (CNNs) using a GPGPU. A CNN is a type of neural network that learns features of images, which makes it suitable for image processing tasks that require learning from large amounts of data. The convolutional layers of a conventional CNN require a large number of multiplications, making real-time operation difficult in embedded environments. We reduce the number of multiplications through Winograd convolution and process the convolution in parallel on a SIMT-based GPGPU. The experiment was conducted using ModelSim and TestDrive, and the results showed that processing time improved by about 17% compared to conventional convolution.
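
To show how Winograd convolution trades multiplications for additions, here is a minimal sketch of the 1-D F(2,3) case: two outputs of a 3-tap filter computed with four multiplications instead of six. The abstract does not state which tile size the authors used, so F(2,3) and the function names are assumptions.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter over 4 inputs (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs using the Winograd F(2,3) algorithm (4 multiplications).

    The two filter-side terms below depend only on g, so in a CNN they can be
    precomputed once per filter and reused for every input tile.
    """
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_alt = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

The 2-D variant used for image convolution nests this transform along rows and columns, which is where the larger savings in multiplications come from.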

Design and Implementation of Human and Object Classification System Using FMCW Radar Sensor (FMCW 레이다 센서 기반 사람과 사물 분류 시스템 설계 및 구현)

  • Sim, Yunsung;Song, Seungjun;Jang, Seonyoung;Jung, Yunho
    • Journal of IKEEE / v.26 no.3 / pp.364-372 / 2022
  • This paper presents the design and implementation results of a human and object classification system using a frequency-modulated continuous-wave (FMCW) radar sensor. Such a system requires radar signal processing for multi-target detection and deep learning for classifying humans and objects. Because deep learning requires a large amount of computation and data processing, making it lightweight is essential. Therefore, a binary neural network (BNN) structure was adopted, which performs convolutional neural network (CNN) computation in binary form. In addition, for real-time operation, a hardware accelerator was implemented and verified on an FPGA platform. The performance evaluation and verification results confirm a multi-target classification accuracy of 90.5%, a 96.87% reduction in memory usage compared to the CNN, and a run time of 5 ms.
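
The core operation that makes a BNN cheap in hardware can be sketched as an XNOR-and-popcount dot product over vectors whose entries are restricted to +1 and -1. This is the generic technique, not the authors' accelerator design; the bit packing convention and function names are assumptions.

```python
def pack_signs(values):
    """Pack a +1/-1 vector into an int: bit i is set when values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree (product +1); every other
    position contributes -1, so the sum is matches - (n - matches).
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n
```

Storing one bit per value instead of a 32-bit number would by itself cut memory by 1 - 1/32, about 96.9%, which is in line with the figure quoted above, although the abstract does not state that this is how the number was obtained.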