• Title/Abstract/Keywords: Matrix Multiplication

167 search results (processing time: 0.03 seconds)

Secure Outsourced Computation of Multiple Matrix Multiplication Based on Fully Homomorphic Encryption

  • Wang, Shufang;Huang, Hai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 11
    • /
    • pp.5616-5630
    • /
    • 2019
  • Fully homomorphic encryption allows a third party to perform arbitrary computation over encrypted data and is especially suitable for secure outsourced computation. This paper investigates secure outsourced computation of multiple matrix multiplication based on fully homomorphic encryption. Our work significantly improves on the recent work of Mishra et al. We improve their matrix encoding method by introducing a column-order matrix encoding method that requires smaller parameters. This enables a binary multiplication method for multiple matrix multiplication, which multiplies adjacent matrices pairwise in a tree structure instead of sequentially from left to right as in Mishra et al. The binary multiplication method yields a logarithmic-depth circuit and is therefore much more efficient than the sequential method, whose circuit has linear depth. Experimental results show that for the product of ten 32×32 (64×64) square matrices our method takes only several thousand seconds, whereas the method of Mishra et al. would take on the order of tens of thousands of years, which is impractical. In addition, we generalize our result from square to non-square matrices. Experimental results show that the binary multiplication method and the classical dynamic programming method perform similarly when multiplying ten non-square matrices.
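The difference between the sequential and binary (tree) orderings described in this abstract can be sketched on plaintext matrices. This is only an illustration of multiplicative depth, assuming square matrices stored as nested Python lists; it involves no encryption and no matrix encoding, and `matmul`, `sequential_product`, and `tree_product` are hypothetical helper names, not the authors' code.

```python
def matmul(a, b):
    """Plain O(n^3) product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def sequential_product(mats):
    """Left-to-right product; returns (result, number of dependent steps)."""
    result, depth = mats[0], 0
    for m in mats[1:]:
        result = matmul(result, m)
        depth += 1  # each step depends on the previous product
    return result, depth


def tree_product(mats):
    """Multiply adjacent pairs level by level; returns (result, number of levels)."""
    level, depth = list(mats), 0
    while len(level) > 1:
        nxt = [matmul(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last matrix moves up unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth


mats = [[[1, k], [0, 1]] for k in range(1, 11)]  # ten small 2x2 test matrices
seq, seq_depth = sequential_product(mats)
tree, tree_depth = tree_product(mats)
```

Both orderings give the same product because matrix multiplication is associative; only the number of dependent levels changes, from 9 to 4 for ten matrices.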

Probability Distribution-based Approximation Matrix Multiplication Simplification Algorithm

  • 권오영;서경택
    • 한국정보통신학회논문지
    • /
    • Vol. 26, No. 11
    • /
    • pp.1623-1629
    • /
    • 2022
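  • Matrix multiplication is a basic operation widely used in science and engineering. It is also used heavily in deep learning training algorithms. Various algorithms are therefore being developed to perform matrix multiplication efficiently. Among them, approximate matrix multiplication is one way to reduce the amount of computation. Approximate matrix multiplication determines an appropriate probability distribution for selecting columns and rows of the matrices, then selects columns and rows according to this distribution and performs the multiplication on the selection. Existing methods generate the probability distribution by considering both matrices A and B that take part in the multiplication. This paper proposes a method that generates the probability distribution for selecting the columns and rows from matrix A alone. Approximate matrix multiplication was performed on 1000×1000, 2000×2000, 3000×3000, 4000×4000, and 5000×5000 matrices using the existing methods and the proposed method. With the proposed method, the approximate product was on average 0.02% to 2.34% closer to the exact matrix product than with the existing methods.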
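The sampling scheme summarized in this abstract can be sketched as follows. The abstract does not state the exact distribution, so this sketch assumes a common choice built from A only, with the probability of index k proportional to the squared norm of column k of A; `column_norm_probs` and `approx_matmul` are hypothetical names, and the code should be read as a generic randomized estimator rather than the authors' algorithm.

```python
import random


def column_norm_probs(a):
    """Distribution built from A only: p_k proportional to ||A[:, k]||^2.
    (Assumed form; the abstract does not give the exact distribution.)"""
    sq = [sum(row[k] ** 2 for row in a) for k in range(len(a[0]))]
    total = sum(sq)
    return [s / total for s in sq]


def approx_matmul(a, b, samples, probs, rng):
    """Estimate A @ B from `samples` sampled (column of A, row of B) pairs."""
    n, p = len(a), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    picks = rng.choices(range(len(probs)), weights=probs, k=samples)
    for k in picks:
        scale = 1.0 / (samples * probs[k])  # rescaling keeps the estimate unbiased
        for i in range(n):
            aik = a[i][k] * scale
            for j in range(p):
                out[i][j] += aik * b[k][j]
    return out


rng = random.Random(0)
a = [[rng.random() for _ in range(50)] for _ in range(4)]
b = [[rng.random() for _ in range(3)] for _ in range(50)]
estimate = approx_matmul(a, b, samples=20, probs=column_norm_probs(a), rng=rng)
```

Here 20 of the 50 column/row pairs are sampled, so the inner work is reduced at the cost of approximation error that depends on the chosen distribution.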

Parallel Algorithm for Matrix-Matrix Multiplication on the GPU

  • 박상근
    • 융복합기술연구소 논문집
    • /
    • Vol. 9, No. 1
    • /
    • pp.1-6
    • /
    • 2019
  • Matrix multiplication is a fundamental mathematical operation that has numerous applications across most scientific fields. In this paper, we present a parallel GPU computation algorithm for dense matrix-matrix multiplication using OpenGL compute shaders, which can play a very important role as a fundamental building block for many high-performance computing applications. Experimental results on NVIDIA Quad 4000 show that the proposed algorithm runs about 208 times faster than a previous CPU algorithm and achieves a performance of 75 GFLOPS in single precision for dense matrices of size 4,096. Such performance shows that our algorithm is practical for real applications.
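As a rough check on the reported throughput, assuming the conventional count of $2n^3$ floating-point operations for a dense $n{\times}n$ product (the abstract does not state how operations were counted), the 75 GFLOPS figure at $n = 4096$ would correspond to a run time of roughly:

$$
t \approx \frac{2n^3}{75 \times 10^{9}\ \text{FLOP/s}} = \frac{2 \cdot 4096^3}{75 \times 10^{9}}\ \text{s} = \frac{137{,}438{,}953{,}472}{75 \times 10^{9}}\ \text{s} \approx 1.83\ \text{s}
$$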

A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices

  • 권두올;정태상
    • 대한전기학회논문지:시스템및제어부문D
    • /
    • Vol. 54, No. 12
    • /
    • pp.723-731
    • /
    • 2005
  • The $4{\times}4$ homogeneous transformation matrix is a compact representation of the orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through successive multiplications of homogeneous matrices, each of which represents the orientation and position of the corresponding link. Thus, for real-time control applications in robotics or animation in computer graphics, fast multiplication of homogeneous matrices is in high demand. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For computational accuracy in real applications, the operands of the processor are floating-point numbers based on the IEEE 754 standard. For parallelism and reduced hardware redundancy, the processor takes column vectors of homogeneous matrices as the unit of multiplication. To further improve throughput, the processor structure and its control are pipelined. Since the designed processor can be used as a special-purpose coprocessor in robotics and computer graphics, several other useful instructions for various transformation algorithms are included, in addition to the special matrix/matrix and matrix/vector multiplications, to widen the applicability of the new design. The suggested instruction set will serve as a standard in future processor design for robotics and computer graphics. The design is verified using an FPGA implementation. The performance improvement of the proposed design over a uniprocessor approach is also studied to assess its suitability for real-time applications.
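The idea of chaining link transforms, with a column vector as the unit of multiplication, can be sketched in software. This is a minimal illustration, assuming links that rotate about the z axis and translate; `transform`, `mat_vec`, `mat_mat_by_columns`, and `chain` are hypothetical names and say nothing about the processor's actual instruction set.

```python
import math


def transform(theta, tx, ty, tz):
    """4x4 homogeneous matrix: rotation by theta about z plus a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by one 4-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def mat_mat_by_columns(a, b):
    """Form A*B one column at a time: column j of the product is A * (column j of B)."""
    cols = [mat_vec(a, [b[k][j] for k in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]


def chain(links):
    """Compose link transforms from base to tip by successive multiplication."""
    result = links[0]
    for t in links[1:]:
        result = mat_mat_by_columns(result, t)
    return result


# Two identical links: rotate 90 degrees about z, offset 1 unit along x.
link = transform(math.pi / 2, 1.0, 0.0, 0.0)
tip = mat_vec(chain([link, link]), [0.0, 0.0, 0.0, 1.0])  # tip origin in base frame
```

Because the four columns of the product are independent of one another, a matrix/matrix product decomposes into four matrix/vector products that can be issued in parallel or streamed through a pipeline.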

A Hybrid Approach on Matrix Multiplication

  • Tolentino Maribel;Kim Myung-Kyu;Chae Soo-Hoan
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2006년도 한국컴퓨터종합학술대회 논문집 Vol.33 No.1 (A)
    • /
    • pp.400-402
    • /
    • 2006
  • Matrix multiplication is an important problem in linear algebra. Its main significance for combinatorial algorithms is its equivalence to a variety of other problems, such as transitive closure and reduction, solving linear systems, and matrix inversion. Thus the development of high-performance matrix multiplication implies faster algorithms for all of these problems. In this paper, we present a quantitative comparison of the theoretical and empirical performance of key matrix multiplication algorithms and use our analysis to develop a faster algorithm. We propose a hybrid approach based on Winograd's and Strassen's algorithms that improves performance, and we discuss the performance of the hybrid Winograd-Strassen algorithm. Because Strassen's algorithm is based on a $2{\times}2$ matrix multiplication, its recursive nature makes the implementation very slow for larger matrices. Since we cannot obtain the theoretical threshold value of Strassen's algorithm, we determine the threshold that optimizes the use of Strassen's algorithm in nodes through various experiments, and we summarize the results in a table and graphs.
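The role of the threshold can be illustrated with a plain Strassen recursion that falls back to classical multiplication below a cutoff size. This sketch uses the textbook Strassen formulas, not the Winograd variant or the authors' hybrid, and assumes square matrices; the `threshold` parameter and all function names are hypothetical, and a useful cutoff would have to be found experimentally, as the abstract notes.

```python
def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def classical(a, b):
    """Standard O(n^3) product, used below the cutoff."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split(m):
    """Split a square matrix of even size into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, threshold=2):
    """Strassen recursion that switches to classical() at or below `threshold`."""
    n = len(a)
    if n <= threshold or n % 2:  # small or odd-sized blocks: no further recursion
        return classical(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    m1 = strassen(add(a11, a22), add(b11, b22), threshold)
    m2 = strassen(add(a21, a22), b11, threshold)
    m3 = strassen(a11, sub(b12, b22), threshold)
    m4 = strassen(a22, sub(b21, b11), threshold)
    m5 = strassen(add(a11, a12), b22, threshold)
    m6 = strassen(sub(a21, a11), add(b11, b12), threshold)
    m7 = strassen(sub(a12, a22), add(b21, b22), threshold)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


a = [[i * 4 + j for j in range(4)] for i in range(4)]
b = [[i - j for j in range(4)] for i in range(4)]
c = strassen(a, b, threshold=2)
```

Raising `threshold` trades fewer recursive calls and temporary matrices for more classical multiplications, which is why the best value depends on the machine and implementation.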

Matrix Multiplication Acceleration with GPU and Locality

  • 권오영;이창묵
    • 한국정보통신학회:학술대회논문집
    • /
    • 한국해양정보통신학회 2009년도 추계학술대회
    • /
    • pp.902-903
    • /
    • 2009
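  • Matrix multiplication has a wide range of applications in science and engineering. Exploiting locality can greatly improve the performance of matrix multiplication. We present a method that accelerates matrix multiplication on a PC equipped with a GPU by using the computing power of the CPU and the GPU together. The proposed method improved performance by about 15% to 30% over using the GPU alone.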

An Analytical Evaluation of 2D Mesh-connected SIMD Architecture for Parallel Matrix Multiplication

  • 김정길
    • 정보통신설비학회논문지
    • /
    • Vol. 10, No. 1
    • /
    • pp.7-13
    • /
    • 2011
  • Matrix multiplication is a fundamental operation of linear algebra and arises in many areas of science and engineering. This paper introduces an efficient parallel matrix multiplication scheme on an $N{\times}N$ mesh-connected SIMD array processor called the multiple hierarchical SIMD architecture (HMSA). The architectural characteristic of HMSA is its hierarchically structured control units, which consist of a global control unit, $N$ local control units configured diagonally, and $N^2$ processing elements (PEs) arranged in an $N{\times}N$ array. The PEs communicate through local buses connecting four adjacent neighbor PEs in mesh-torus networks and through global buses running across the rows and columns, called horizontal buses and vertical buses, respectively. This architecture gives HMSA diagonally indexed concurrent broadcast and alternating access to either the rows (row control mode) or the columns (column control mode) of the 2D PE array. An algorithmic mapping method is used for performance evaluation by mapping matrix multiplication onto the proposed architecture. The asymptotic time complexity is evaluated, and the result shows that parallel matrix multiplication on HMSA can provide significant performance improvement.

A Study on the Efficient Multiplication with All $m{\times}k$ Boolean Matrices

  • 한재일
    • 한국콘텐츠학회논문지
    • /
    • Vol. 6, No. 2
    • /
    • pp.27-33
    • /
    • 2006
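  • Boolean matrices are applied usefully in a variety of fields, and much research on them has been carried out. Most of this research deals with Boolean matrix multiplication, but all of it focuses on multiplication between two Boolean matrices; multiplication between multiple $n{\times}m$ Boolean matrices and all $m{\times}k$ Boolean matrices appears in only a very few studies. This paper shows that the previously proposed optimal algorithms for multiplying two Boolean matrices are unsuitable when multiplication with all Boolean matrices is required, establishes a theory for efficiently computing the product of an $n{\times}m$ Boolean matrix with all $m{\times}k$ Boolean matrices, and then discusses the execution results of Boolean matrix multiplication that applies this theory.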
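One way to see why multiplying a fixed matrix by all $m{\times}k$ Boolean matrices differs from a single product: in a Boolean product, each result column is the OR of the columns of A selected by the 1-entries of the corresponding column of B, so only $2^m$ distinct result columns exist and they can be tabulated once. The sketch below illustrates this observation only; it is not the theory developed in the paper, and `column_masks`, `or_table`, and `bool_matmul_via_table` are hypothetical names.

```python
from itertools import product


def column_masks(a):
    """Encode each column of an n x m Boolean matrix as an n-bit integer."""
    n, m = len(a), len(a[0])
    return [sum(a[i][k] << i for i in range(n)) for k in range(m)]


def or_table(a):
    """table[s] = OR of the columns of A selected by the m-bit mask s."""
    cols = column_masks(a)
    table = [0] * (1 << len(cols))
    for s in range(1, len(table)):
        low = (s & -s).bit_length() - 1  # index of the lowest set bit of s
        table[s] = table[s & (s - 1)] | cols[low]
    return table


def bool_matmul_via_table(table, n, b):
    """Boolean product A*B for an m x k matrix B, using the precomputed table."""
    m, k = len(b), len(b[0])
    out = [[0] * k for _ in range(n)]
    for j in range(k):
        s = sum(b[r][j] << r for r in range(m))  # column j of B as a bit mask
        for i in range(n):
            out[i][j] = (table[s] >> i) & 1
    return out


a = [[1, 0], [0, 1], [1, 1]]  # n = 3, m = 2
table = or_table(a)
# Enumerate all sixteen 2x2 Boolean matrices B and multiply each by A.
all_b = [[list(bits[0:2]), list(bits[2:4])] for bits in product([0, 1], repeat=4)]
results = [bool_matmul_via_table(table, 3, b) for b in all_b]
```

The table costs $2^m$ OR operations to build, after which every product needs only one lookup per column of B.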

Efficient Multiplication of Boolean Matrices and Algorithm for D-Class Computation

  • 한재일;신범주
    • 한국산업정보학회논문지
    • /
    • Vol. 12, No. 2
    • /
    • pp.68-78
    • /
    • 2007
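  • A D-class is defined as a set of $n{\times}n$ Boolean matrices that are in a given equivalence relation. D-class computation takes the entire set of $n{\times}n$ Boolean matrices and requires multiplication among every combination of three Boolean matrices from this set. However, most research on Boolean matrices has concentrated on efficient multiplication of just two Boolean matrices, and only a few studies on multiplication among all Boolean matrices have appeared recently. This paper presents a theory for computing the products of all triples of Boolean matrices and all D-classes more efficiently, and discusses an algorithm that applies it and the execution results.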

Optimizing 2-stage Tiling-based Matrix Multiplication in FPGA-based Neural Network Accelerator

  • 권진세;이제민;권용인;박제만;유미선;김태호;김형신
    • 대한임베디드공학회논문지
    • /
    • Vol. 17, No. 6
    • /
    • pp.367-374
    • /
    • 2022
  • The acceleration of neural networks has become an important topic in the field of computer vision. An accelerator is necessary for accelerating lightweight models. Most operators supported by accelerators focus on direct convolution. If the accelerator does not provide a GEMM operation, GEMM is usually carried out on the CPU instead. In this paper, we propose an optimization technique for 2-stage tiling-based GEMM routines on VTA. We improve the performance of the matrix multiplication routine by maximizing the reuse of the input matrix and optimizing operation pipelining. In addition, we apply the proposed technique to the DarkNet framework to check the performance improvement of the matrix multiplication routine. The proposed GEMM method showed a performance improvement of more than 2.4 times compared to the non-optimized GEMM method. The inference performance of our DarkNet framework also improved by at least 2.3 times.
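Two-level tiling of a GEMM loop nest can be sketched in plain Python. This is a generic illustration of the loop structure only, assuming hypothetical tile sizes `outer` and `inner`; it does not model VTA buffers, pipelining, or the authors' routine, and `tiled_gemm` is a hypothetical name.

```python
def tiled_gemm(a, b, outer=4, inner=2):
    """C = A @ B with two levels of tiling over rows, columns, and the reduction axis."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, outer):            # stage 1: large tiles
        for j0 in range(0, m, outer):
            for k0 in range(0, k, outer):
                # stage 2: small tiles inside the current large tile
                for i1 in range(i0, min(i0 + outer, n), inner):
                    for j1 in range(j0, min(j0 + outer, m), inner):
                        for k1 in range(k0, min(k0 + outer, k), inner):
                            for i in range(i1, min(i1 + inner, i0 + outer, n)):
                                for kk in range(k1, min(k1 + inner, k0 + outer, k)):
                                    aik = a[i][kk]  # loaded once, reused across j
                                    for j in range(j1, min(j1 + inner, j0 + outer, m)):
                                        c[i][j] += aik * b[kk][j]
    return c


a = [[i * 7 + k - 10 for k in range(7)] for i in range(5)]   # 5 x 7
b = [[k - 2 * j for j in range(3)] for k in range(7)]        # 7 x 3
c = tiled_gemm(a, b)
```

The `min` bounds handle matrix sizes that are not multiples of the tile sizes, so the result matches an untiled product for any shape.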