• Title/Summary/Keyword: matrix vector multiplication

Search Result 34, Processing Time 0.02 seconds

Bit-level Array Structure Representation of Weight and Optimization Method to Design Pre-Trained Neural Network (학습된 신경망 설계를 위한 가중치의 비트-레벨 어레이 구조 표현과 최적화 방법)

  • Lim, Guk-Chan;Kwak, Woo-Young;Lee, Hyun-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.9
    • /
    • pp.37-44
    • /
    • 2002
  • This paper proposes efficient digital hardware design method by using fixed weight of pre-trained neural network. For this, arithmetic operations of PEs(Processing Elements) are represented with matrix-vector multiplication. The relationship of fixed weight and input data present bit-level array structure architecture which is consisted operation node. To minimize the operation node, this paper proposes node elimination method and setting common node depend on bit pattern of weight. The result of FPGA simulation shows the efficiency on hardware cost and operation speed with full precision. And proposed design method makes possibility that many PEs are implemented to on-chip.

A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations (유한요소 비압축성 유동장 해석을 위한 이중공액구배법의 GPU 기반 연산에 대한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.40 no.9
    • /
    • pp.597-604
    • /
    • 2016
  • A parallel algorithm of bi-conjugate gradient method was developed based on CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using splitting P2P1 finite element method. Asymmetric stenotic flow problem was solved to validate the proposed algorithm, and then the parallel performance of the GPU was examined by measuring the elapsed times. Further, the GPU performance for sparse matrix-vector multiplication was also investigated with a matrix of fluid-structure interaction problem. A kernel was generated to simultaneously compute the inner product of each row of sparse matrix and a vector. In addition, the kernel was optimized to improve the performance by using both parallel reduction and memory coalescing. In the kernel construction, the effect of warp on the parallel performance of the present CUDA was also examined. The present GPU computation was more than 7 times faster than the single CPU by double precision.

Privacy-preserving and Communication-efficient Convolutional Neural Network Prediction Framework in Mobile Cloud Computing

  • Bai, Yanan;Feng, Yong;Wu, Wenyuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.12
    • /
    • pp.4345-4363
    • /
    • 2021
  • Deep Learning as a Service (DLaaS), utilizing the cloud-based deep neural network models to provide customer prediction services, has been widely deployed on mobile cloud computing (MCC). Such services raise privacy concerns since customers need to send private data to untrusted service providers. In this paper, we devote ourselves to building an efficient protocol to classify users' images using the convolutional neural network (CNN) model trained and held by the server, while keeping both parties' data secure. Most previous solutions commonly employ homomorphic encryption schemes based on Ring Learning with Errors (RLWE) hardness or two-party secure computation protocols to achieve it. However, they have limitations on large communication overheads and costs in MCC. To address this issue, we present LeHE4SCNN, a scalable privacy-preserving and communication-efficient framework for CNN-based DLaaS. Firstly, we design a novel low-expansion rate homomorphic encryption scheme with packing and unpacking methods (LeHE). It supports fast homomorphic operations such as vector-matrix multiplication and addition. Then we propose a secure prediction framework for CNN. It employs the LeHE scheme to compute linear layers while exploiting the data shuffling technique to perform non-linear operations. Finally, we implement and evaluate LeHE4SCNN with various CNN models on a real-world dataset. Experimental results demonstrate the effectiveness and superiority of the LeHE4SCNN framework in terms of response time, usage cost, and communication overhead compared to the state-of-the-art methods in the mobile cloud computing environment.

Fast Analysis of Fractal Antenna by Using FMM (FMM에 의한 프랙탈 안테나 고속 해석)

  • Kim, Yo-Sik;Lee, Kwang-Jae;Kim, Kun-Woo;Oh, Kyung-Hyun;Lee, Taek-Kyung;Lee, Jae-Wook
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.19 no.2
    • /
    • pp.121-129
    • /
    • 2008
  • In this paper, we present a fast analysis of multilayer microstrip fractal structure by using the fast multipole method (FMM). In the analysis, accurate spatial green's functions from the real-axis integration method(RAIM) are employed to solve the mixed potential integral equation(MPIE) with FMM algorithm. MoM's iteration and memory requirement is $O(N^2)$ in case of calculation using the green function. the problem is the unknown number N can be extremely large for calculation of large scale objects and high accuracy. To improve these problem is fast algorithm FMM. FMM use the addition theorem of green function. So, it reduce the complexity of a matrix-vector multiplication and reduce the cost of calculation to the order of $O(N^{1.5})$, The efficiency is proved from comparing calculation results of the moment method and Fast algorithm.