• Title/Summary/Keyword: Convolutional multiplication

Search Result 10, Processing Time 0.02 seconds

Analysis of Reduced-Width Truncated Mitchell Multiplication for Inferences Using CNNs

  • Kim, HyunJin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.15 no.5
    • /
    • pp.235-242
    • /
    • 2020
  • This paper analyzes the effect of reduced output width of the truncated logarithmic multiplication and application to inferences using convolutional neural networks (CNNs). For small hardware overhead, output width is reduced in the truncated Mitchell multiplier, so that fractional bits in multiplication output are minimized in error-resilient applications. This analysis shows that when reducing output width in the truncated Mitchell multiplier, even though worst-case relative error increases, average relative error can be kept small. When adopting 8 fractional bits in multiplication output in the evaluations, there is no significant performance degradation in target CNNs compared to existing exact and original Mitchell multipliers.

A low-cost compensated approximate multiplier for Bfloat16 data processing on convolutional neural network inference

  • Kim, HyunJin
    • ETRI Journal
    • /
    • v.43 no.4
    • /
    • pp.684-693
    • /
    • 2021
  • This paper presents a low-cost two-stage approximate multiplier for bfloat16 (brain floating-point) data processing. For cost-efficient approximate multiplication, the first stage implements Mitchell's algorithm that performs the approximate multiplication using only two adders. The second stage adopts the exact multiplication to compensate for the error from the first stage by multiplying error terms and adding its truncated result to the final output. In our design, the low-cost multiplications in both stages can reduce hardware costs significantly and provide low relative errors by compensating for the error from the first stage. We apply our approximate multiplier to the convolutional neural network (CNN) inferences, which shows small accuracy drops with well-known pre-trained models for the ImageNet database. Therefore, our design allows low-cost CNN inference systems with high test accuracy.

Low-Power Multiplication Processing Element Hardware to Support Parallel Convolutional Neural Network Processing (합성곱 신경망 병렬 연산처리를 지원하는 저전력 곱셈 프로세싱 엘리먼트 설계)

  • Eunpyoung Park;Jongsu Park
    • Journal of Platform Technology
    • /
    • v.12 no.2
    • /
    • pp.58-63
    • /
    • 2024
  • CNNs tend to take a long time to learn and consume a lot of power due to lack of system resources with many data processing units when there are repetitive handles that do not have high performance in the image field. In this paper, we propose a handling method based on a low-power bus that can increase the exchange rate of multipliers and multiplicands within the convolution mixer, which is a tendency activity that occurs when a convolution mixer has multiplication, which is the core element of combination. Convolutional neural networks have proprietary low-power shared processor support and the design was implemented on an Intel DE1-SoC FPGA board using Verilog-HDL. The experiments validated the performance by comparing it with the exchange rate of the multiplier originally proposed by Shen on MNIST's numeric image database.

  • PDF

Parallel-Addition Convolution Algorithm in Grayscale Image (그레이스케일 영상의 병렬가산 컨볼루션 알고리즘)

  • Choi, Jong-Ho
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.10 no.4
    • /
    • pp.288-294
    • /
    • 2017
  • Recently, deep learning using convolutional neural network (CNN) has been extensively studied in image recognition. Convolution consists of addition and multiplication. Multiplication is computationally expensive in hardware implementation, relative to addition. It is also important factor limiting a chip design in an embedded deep learning system. In this paper, I propose a parallel-addition processing algorithm that converts grayscale images to the superposition of binary images and performs convolution only with addition. It is confirmed that the convolution can be performed by a parallel-addition method capable of reducing the processing time in experiment for verifying the availability of proposed algorithm.

Design of Multipliers Optimized for CNN Inference Accelerators (CNN 추론 연산 가속기를 위한 곱셈기 최적화 설계)

  • Lee, Jae-Woo;Lee, Jaesung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.10
    • /
    • pp.1403-1408
    • /
    • 2021
  • Recently, FPGA-based AI processors are being studied actively. Deep convolutional neural networks (CNN) are basic computational structures performed by AI processors and require a very large amount of multiplication. Considering that the multiplication coefficients used in CNN inference operation are all constants and that an FPGA is easy to design a multiplier tailored to a specific coefficient, this paper proposes a methodology to optimize the multiplier. The method utilizes 2's complement and distributive law to minimize the number of bits with a value of 1 in a multiplication coefficient, and thereby reduces the number of required stacked adders. As a result of applying this method to the actual example of implementing CNN in FPGA, the logic usage is reduced by up to 30.2% and the propagation delay is also reduced by up to 22%. Even when implemented with an ASIC chip, the hardware area is reduced by up to 35% and the delay is reduced by up to 19.2%.

Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems
    • /
    • v.17 no.2
    • /
    • pp.337-351
    • /
    • 2021
  • Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To overcome the problem, a video facial expression recognition method using spatiotemporal recurrent neural network and feature fusion is proposed. Firstly, the video is preprocessed. Then, the double-layer cascade structure is used to detect a face in a video image. In addition, two deep convolutional neural networks are used to extract the time-domain and airspace facial features in the video. The spatial convolutional neural network is used to extract the spatial information features from each frame of the static expression images in the video. The temporal convolutional neural network is used to extract the dynamic information features from the optical flow information from multiple frames of expression images in the video. A multiplication fusion is performed with the spatiotemporal features learned by the two deep convolutional neural networks. Finally, the fused features are input to the support vector machine to realize the facial expression classification task. The experimental results on cNTERFACE, RML, and AFEW6.0 datasets show that the recognition rates obtained by the proposed method are as high as 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.

A Video Expression Recognition Method Based on Multi-mode Convolution Neural Network and Multiplicative Feature Fusion

  • Ren, Qun
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.556-570
    • /
    • 2021
  • The existing video expression recognition methods mainly focus on the spatial feature extraction of video expression images, but tend to ignore the dynamic features of video sequences. To solve this problem, a multi-mode convolution neural network method is proposed to effectively improve the performance of facial expression recognition in video. Firstly, OpenFace 2.0 is used to detect face images in video, and two deep convolution neural networks are used to extract spatiotemporal expression features. Furthermore, spatial convolution neural network is used to extract the spatial information features of each static expression image, and the dynamic information feature is extracted from the optical flow information of multiple expression images based on temporal convolution neural network. Then, the spatiotemporal features learned by the two deep convolution neural networks are fused by multiplication. Finally, the fused features are input into support vector machine to realize the facial expression classification. Experimental results show that the recognition accuracy of the proposed method can reach 64.57% and 60.89%, respectively on RML and Baum-ls datasets. It is better than that of other contrast methods.

Privacy-preserving and Communication-efficient Convolutional Neural Network Prediction Framework in Mobile Cloud Computing

  • Bai, Yanan;Feng, Yong;Wu, Wenyuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.12
    • /
    • pp.4345-4363
    • /
    • 2021
  • Deep Learning as a Service (DLaaS), utilizing the cloud-based deep neural network models to provide customer prediction services, has been widely deployed on mobile cloud computing (MCC). Such services raise privacy concerns since customers need to send private data to untrusted service providers. In this paper, we devote ourselves to building an efficient protocol to classify users' images using the convolutional neural network (CNN) model trained and held by the server, while keeping both parties' data secure. Most previous solutions commonly employ homomorphic encryption schemes based on Ring Learning with Errors (RLWE) hardness or two-party secure computation protocols to achieve it. However, they have limitations on large communication overheads and costs in MCC. To address this issue, we present LeHE4SCNN, a scalable privacy-preserving and communication-efficient framework for CNN-based DLaaS. Firstly, we design a novel low-expansion rate homomorphic encryption scheme with packing and unpacking methods (LeHE). It supports fast homomorphic operations such as vector-matrix multiplication and addition. Then we propose a secure prediction framework for CNN. It employs the LeHE scheme to compute linear layers while exploiting the data shuffling technique to perform non-linear operations. Finally, we implement and evaluate LeHE4SCNN with various CNN models on a real-world dataset. Experimental results demonstrate the effectiveness and superiority of the LeHE4SCNN framework in terms of response time, usage cost, and communication overhead compared to the state-of-the-art methods in the mobile cloud computing environment.

Compact CNN Accelerator Chip Design with Optimized MAC And Pooling Layers (MAC과 Pooling Layer을 최적화시킨 소형 CNN 가속기 칩)

  • Son, Hyun-Wook;Lee, Dong-Yeong;Kim, HyungWon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.9
    • /
    • pp.1158-1165
    • /
    • 2021
  • This paper proposes a CNN accelerator which is optimized Pooling layer operation incorporated in Multiplication And Accumulation(MAC) to reduce the memory size. For optimizing memory and data path circuit, the quantized 8bit integer weights are used instead of 32bit floating-point weights for pre-training of MNIST data set. To reduce chip area, the proposed CNN model is reduced by a convolutional layer, a 4*4 Max Pooling, and two fully connected layers. And all the operations use specific MAC with approximation adders and multipliers. 94% of internal memory size reduction is achieved by simultaneously performing the convolution and the pooling operation in the proposed architecture. The proposed accelerator chip is designed by using TSMC65nmGP CMOS process. That has about half size of our previous paper, 0.8*0.9 = 0.72mm2. The presented CNN accelerator chip achieves 94% accuracy and 77us inference time per an MNIST image.

A Study on Attack against NTRU Signature Implementation and Its Countermeasure (NTRU 서명 시스템 구현에 대한 오류 주입 공격 및 대응 방안 연구)

  • Jang, Hocheol;Oh, Soohyun;Ha, Jaecheol
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.3
    • /
    • pp.551-561
    • /
    • 2018
  • As the computational technology using quantum computing has been developed, several threats on cryptographic systems are recently increasing. Therefore, many researches on post-quantum cryptosystems which can withstand the analysis attacks using quantum computers are actively underway. Nevertheless, the lattice-based NTRU system, one of the post-quantum cryptosystems, is pointed out that it may be vulnerable to the fault injection attack which uses the weakness of implementation of NTRU. In this paper, we investigate the fault injection attacks and their previous countermeasures on the NTRU signature system and propose a secure and efficient countermeasure to defeat it. As a simulation result, the proposed countermeasure has high fault detection ratio and low implementation costs.