• Title/Abstract/Keywords: Matrix Multiplication

167 search results (processing time: 0.029 s)

The Efficient Execution of Functional Language Loops on the Multithreaded Architectures

  • 하상호
    • 한국정보처리학회논문지
    • /
    • Vol. 7, No. 3
    • /
    • pp.962-970
    • /
    • 2000
  • Multithreading is attractive in that it can tolerate memory latency and synchronization by effectively overlapping communication with computation. While several compiler techniques have been developed to produce multithreaded code from functional language programs, much work remains to implement loops effectively. Executing loops in a multithreaded style usually incurs overheads that can severely reduce the benefit of multithreading. This paper suggests several architectural and compiler methods that can optimize loop execution under multithreading. We then simulate and analyze them for a matrix multiplication program.

A Study on Effect of Code Distribution and Data Replication for Multicore Computing Architectures

  • Cho, Doosan
    • International Journal of Advanced Culture Technology
    • /
    • Vol. 9, No. 4
    • /
    • pp.282-287
    • /
    • 2021
  • A multicore system must be able to take full advantage of a program's instruction and data parallelism. This study introduces data replication as a supporting technique to maximize that parallelism. Instruction-level parallelism can be limited by data dependencies. In that case, replicating the data to each processor core allows instruction-level parallelism to be exploited to the fullest. The proposed technique yields the largest performance improvement when applied to scientific applications such as matrix multiplication.

Robust Constrained Predictive Control without On-line Optimizations

  • Lee, Y. I.;B. Kouvaritakis
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 ICCAS 2001
    • /
    • pp.27.4-27
    • /
    • 2001
  • A stabilizing control method for linear systems with model uncertainties and hard input constraints is developed that does not require on-line optimization. This work is motivated by the constrained robust MPC (CRMPC) approach [3], which adopts the dual-mode prediction strategy (i.e., free control moves and an invariant set) and minimizes a worst-case performance criterion. Based on the observation that a feasible control sequence for a particular state can be found as a linear combination of feasible sequences for other states, we suggest a stabilizing control algorithm that provides sub-optimal, feasible control sequences using pre-computed optimal sequences for some canonical states. The on-line computation of the proposed method reduces to a simple matrix multiplication.

A Study on Circular Filtering in Orthogonal Transform Domain

  • Song, Bong-Seop;Lee, Sang-Uk
    • Journal of Electrical Engineering and Information Science
    • /
    • Vol. 1, No. 2
    • /
    • pp.125-133
    • /
    • 1996
  • In this paper, we discuss the properties of circular filtering in the orthogonal transform domain. Efficient filtering schemes in six orthogonal transform domains are presented by generalizing the convolution-multiplication property of the DFT. In brief, circular filtering can be accomplished by multiplying by the transform-domain filtering matrix W, which is shown to be very sparse, yielding computational gains over time-domain processing. As an application, decimation and interpolation techniques in orthogonal transform domains are also investigated.
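
The convolution-multiplication property of the DFT that the abstract generalizes can be checked with a small sketch. It covers only the plain DFT case, where the transform-domain filtering matrix is diagonal; the six transform domains of the paper are not reproduced. The function names and the sample signal are illustrative assumptions.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(x, h):
    """Time-domain circular filtering: y[t] = sum_m h[m] * x[(t - m) mod n]."""
    n = len(x)
    return [sum(h[m] * x[(t - m) % n] for m in range(n)) for t in range(n)]

def filter_in_dft_domain(x, h):
    """Same filtering as an element-wise (diagonal-matrix) product of spectra."""
    product = [a * b for a, b in zip(dft(x), dft(h))]
    return [value.real for value in idft(product)]

# Illustrative data (assumed): a first-difference filter on a short signal.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, -1.0, 0.0, 0.0]
time_domain = circular_convolve(signal, kernel)
transform_domain = filter_in_dft_domain(signal, kernel)
```

Both routes should agree up to floating-point rounding.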

An Efficient Method to Compute a Covariance Matrix of the Non-local Means Algorithm for Image Denoising with the Principal Component Analysis

  • 김정환;정제창
    • 방송공학회논문지
    • /
    • Vol. 21, No. 1
    • /
    • pp.60-65
    • /
    • 2016
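  • This paper first introduces the non-local means (NLM) algorithm, one of the methods for removing noise from images, and then introduces the principal component analysis (PCA) based algorithm, one of the improved variants of NLM. To use PCA, a covariance matrix must be computed beforehand, and when every pixel of the image is considered, computing this covariance matrix requires a large matrix multiplication. If the size of the neighborhood patch in the NLM algorithm is $S \times S = S^2$ and the total number of pixels in the image is Q, obtaining the covariance matrix requires a matrix multiplication of size $S^2 \times Q$. Given the characteristics of images, this is an inefficient computation. To compute the covariance matrix efficiently, this paper therefore proposes sampling the image patches at a fixed interval. After sampling, the covariance matrix can be obtained from a multiplication of matrices of size $S^2$ × floor(Width/l) × (Height/l).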
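
The effect of sampling patches at a fixed interval can be sketched as follows. This is a minimal illustration, not the authors' implementation. Using only patches that lie fully inside the image, normalizing by the number of patches, and the names `extract_patches`, `covariance`, and `step` (standing in for the interval l) are all assumptions.

```python
def extract_patches(image, s, step):
    """Collect flattened s-by-s patches whose top-left corners lie on a grid of spacing `step`."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - s + 1, step):
        for c in range(0, width - s + 1, step):
            patches.append([image[r + i][c + j] for i in range(s) for j in range(s)])
    return patches

def covariance(patches):
    """Covariance of the patch vectors: (1/N) * Xc * Xc^T, an S^2-by-S^2 matrix."""
    n, d = len(patches), len(patches[0])
    mean = [sum(p[k] for p in patches) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in patches]
    return [[sum(row[a] * row[b] for row in centered) / n for b in range(d)]
            for a in range(d)]

# Illustrative 6x6 image (assumed data), patch size S = 2.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
dense_patches = extract_patches(image, 2, 1)    # every patch position
sampled_patches = extract_patches(image, 2, 2)  # every second position in each direction
cov_dense = covariance(dense_patches)
cov_sampled = covariance(sampled_patches)
```

The covariance matrix stays $S^2 \times S^2$ in both cases. Only the number of patch columns entering the product shrinks, which is where the saving comes from.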

Efficient Implementation of a Pseudorandom Sequence Generator for High-Speed Data Communications

  • Hwang, Soo-Yun;Park, Gi-Yoon;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal
    • /
    • Vol. 32, No. 2
    • /
    • pp.222-229
    • /
    • 2010
  • A conventional pseudorandom sequence generator creates only 1 bit of data per clock cycle. Therefore, it may cause a delay in data communications. In this paper, we propose an efficient implementation method for a pseudorandom sequence generator with parallel outputs. By virtue of the simple matrix multiplications, we derive a well-organized recursive formula and realize a pseudorandom sequence generator with multiple outputs. Experimental results show that, although the total area of the proposed scheme is 3% to 13% larger than that of the existing scheme, our parallel architecture improves the throughput by 2, 4, and 6 times compared with the existing scheme based on a single output. In addition, we apply our approach to a $2{\times}2$ multiple input/multiple output (MIMO) detector targeting the 3rd Generation Partnership Project Long Term Evolution (3GPP LTE) system. Therefore, the throughput of the MIMO detector is significantly enhanced by parallel processing of data communications.
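
One common way to obtain several output bits per step from a shift-register generator is to raise its state-transition matrix to a power over GF(2). The sketch below illustrates that general idea and is not the recursive formula derived in the paper. The 4-bit register, its feedback taps, and all function names are assumptions chosen for illustration.

```python
def mat_mul_gf2(a, b):
    """Matrix product over GF(2): AND for multiplication, XOR (sum mod 2) for addition."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)] for row in a]

def mat_vec_gf2(m, v):
    """Matrix-vector product over GF(2)."""
    return [sum(x & y for x, y in zip(row, v)) % 2 for row in m]

def serial_bits(t, state, count):
    """Conventional generator: one output bit (state[0]) per step."""
    bits = []
    for _ in range(count):
        bits.append(state[0])
        state = mat_vec_gf2(t, state)
    return bits

def parallel_matrices(t, k):
    """Return (output matrix, T^k). Row i of the output matrix is row 0 of T^i."""
    n = len(t)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # T^0 = identity
    out_rows = []
    for _ in range(k):
        out_rows.append(power[0])
        power = mat_mul_gf2(t, power)
    return out_rows, power

def parallel_bits(t, state, count, k):
    """Generator with k output bits per step, using the precomputed matrices."""
    out_matrix, step_matrix = parallel_matrices(t, k)
    bits = []
    while len(bits) < count:
        bits.extend(mat_vec_gf2(out_matrix, state))
        state = mat_vec_gf2(step_matrix, state)
    return bits[:count]

# Assumed 4-bit register: shift left, new last bit = state[0] XOR state[1].
T = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 1, 0, 0]]
seed = [1, 0, 0, 0]
one_per_step = serial_bits(T, seed, 16)
four_per_step = parallel_bits(T, seed, 16, 4)
```

The two sequences should be identical. The parallel version trades a larger combinational matrix for fewer steps, consistent with the area and throughput trade-off the abstract reports.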

ERROR REDUCTION FOR HIGHER DERIVATIVES OF CHEBYSHEV COLLOCATION METHOD USING PRECONDITIONING AND DOMAIN DECOMPOSITION

  • Darvishi, M.T.;Ghoreishi, F.
    • Journal of applied mathematics & informatics
    • /
    • Vol. 6, No. 2
    • /
    • pp.523-538
    • /
    • 1999
  • A new preconditioning method is investigated to reduce the roundoff error in computing derivatives using Chebyshev collocation methods (CCM). With this preconditioning, the ratio of the roundoff error of the preconditioned method to that of CCM becomes small as N gets large. To further enhance the accuracy of differentiation, we use a domain decomposition approach. Error analysis shows that, for this domain decomposition method, the error decreases in proportion to the length of the subintervals. Numerical results show that using domain decomposition and preconditioning together gives highly accurate approximations of the first derivative of the function and good approximations of moderately high derivatives.

Sliding Surface Design by Eigenstructure Assignment and Sliding Mode Control of Matched Uncertain Systems

  • 이태봉;양현석
    • 제어로봇시스템학회논문지
    • /
    • Vol. 15, No. 8
    • /
    • pp.812-817
    • /
    • 2009
  • In this paper, a new method to design sliding surfaces using eigenstructure assignment is proposed. Most conventional methods for constructing the surfaces require the system matrices to be in a special form, such as canonical or regular canonical form, whereas the proposed method can be applied to arbitrary system matrices. Furthermore, the surface matrix C can be chosen so that the matrix product CB has a designated form. SVD is used to determine the desired eigenvectors explicitly. To verify the proposed algorithm, a sliding mode controller for a multivariable system with matched uncertainty is constructed. The controller is designed to guarantee a minimum approach velocity to the sliding surface.

Rapid and Brief Communication GPU implementation of neural networks

  • Oh, Kyoung-Su;Jung, Kee-Chul
    • 한국HCI학회:학술대회논문집
    • /
    • 한국HCI학회 2007 Conference, Part 3
    • /
    • pp.322-325
    • /
    • 2007
  • A graphics processing unit (GPU) is used to speed up an artificial neural network. It implements the matrix multiplication of a neural network to improve the time performance of a text detection system. Preliminary results showed a 20-fold performance improvement using an ATI RADEON 9700 PRO board. The parallelism of the GPU is fully utilized by accumulating many input feature vectors and weight vectors, then converting the many inner-product operations into one matrix operation. Further research areas include benchmarking the performance with various hardware and GPU-aware learning algorithms. (c) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
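
The conversion of many inner products into one matrix operation can be shown on the CPU with a minimal sketch. It shows only the batching idea, not the GPU implementation from the paper. The weights, inputs, and function names are assumed for illustration.

```python
def layer_one_at_a_time(weights, inputs):
    """Evaluate every (input vector, neuron) inner product separately."""
    return [[sum(w * x for w, x in zip(w_row, x_vec)) for w_row in weights]
            for x_vec in inputs]

def mat_mul(a, b):
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def layer_as_matmul(weights, inputs):
    """Stack the input vectors as columns of X and compute W * X in one product."""
    x_columns = [list(col) for col in zip(*inputs)]  # features x batch
    return mat_mul(weights, x_columns)               # neurons x batch

# Assumed toy layer: 2 neurons, 3 features, batch of 4 input vectors.
W = [[1, 2, 3],
     [0, -1, 1]]
batch = [[1, 0, 1], [2, 2, 2], [0, 1, 0], [3, -1, 0]]
separate = layer_one_at_a_time(W, batch)  # indexed [input][neuron]
batched = layer_as_matmul(W, batch)       # indexed [neuron][input]
```

The batched result is the transpose of the per-vector result. A single large product is the shape of work that parallel hardware handles well, which is the point the abstract makes.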

Algorithm for Computing J Relations in the Monoid of Boolean Matrices

  • 한재일
    • 한국IT서비스학회지
    • /
    • Vol. 7, No. 4
    • /
    • pp.221-230
    • /
    • 2008
  • Green's relations are five equivalence relations that characterize the elements of a semigroup in terms of the principal ideals they generate. The J relation is one of Green's relations. Although there are known algorithms that can compute Green's relations, they are not useful for finding all J relations in the semigroup of all $n{\times}n$ Boolean matrices. This computation requires multiplying three Boolean matrices for every possible triple of $n{\times}n$ Boolean matrices. Because the size of the semigroup of all $n{\times}n$ Boolean matrices grows exponentially as n increases, the computation clearly has exponential time complexity. The computation of J relations over $5{\times}5$ Boolean matrices remains an unsolved problem. The paper presents theorems that can reduce the computation time, discusses an algorithm for efficient J relation computation whose design reflects those theorems, and gives its execution results.
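
The triple-product computation described in the abstract can be made concrete with a brute-force sketch for very small n. It uses the standard definition that two elements are J-related when they generate the same two-sided principal ideal. This is not the optimized algorithm of the paper, and the function names are assumptions. The enumeration is only practical for tiny n, since it visits every triple of matrices.

```python
from itertools import product

def bool_mat_mul(a, b):
    """Boolean matrix product: OR of ANDs."""
    n = len(a)
    return tuple(tuple(int(any(a[i][k] and b[k][j] for k in range(n)))
                       for j in range(n)) for i in range(n))

def all_bool_matrices(n):
    """Every n-by-n Boolean matrix, as nested tuples."""
    return [tuple(tuple(bits[i * n:(i + 1) * n]) for i in range(n))
            for bits in product((0, 1), repeat=n * n)]

def two_sided_ideals(n):
    """Map each matrix A to the set {X * A * Y} over all matrices X and Y."""
    mats = all_bool_matrices(n)
    ideal = {m: set() for m in mats}
    for x, a, y in product(mats, repeat=3):  # one triple product per triple
        ideal[a].add(bool_mat_mul(bool_mat_mul(x, a), y))
    return ideal

def j_classes(n):
    """Group matrices whose two-sided principal ideals coincide."""
    classes = {}
    for m, generated in two_sided_ideals(n).items():
        classes.setdefault(frozenset(generated), []).append(m)
    return list(classes.values())

classes_2x2 = j_classes(2)  # 16 matrices, 16**3 = 4096 triples
```

For n = 2 this finishes instantly. The number of triples is $2^{3n^2}$, which shows why the exhaustive approach does not scale.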