• Title/Summary/Keyword: Matrix Multiplication

The Efficient Execution of Functional Language Loops on the Multithreaded Architectures (다중스레드 구조에서 함수 언어 루프의 효과적 실행)

  • Ha, Sang-Ho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.3
    • /
    • pp.962-970
    • /
    • 2000
  • Multithreading is attractive in that it can tolerate memory latency and synchronization by effectively overlapping communication with computation. While several compiler techniques have been developed to produce multithreaded code from functional language programs, much work remains to implement loops effectively. Executing loops in a multithreaded style usually incurs overheads that can severely reduce the benefit of multithreading. This paper suggests several architecture-level and compiler-level methods that can optimize loop execution under multithreading. We then simulate and analyze them for the matrix multiplication program.

A Study on Effect of Code Distribution and Data Replication for Multicore Computing Architectures

  • Cho, Doosan
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.282-287
    • /
    • 2021
  • A multicore system must be able to take full advantage of the program's instruction and data parallelism. This study introduces the data replication technique as a support technique to maximize the program's instruction and data parallelism. Instruction level parallelism can be limited by data dependency. In this case, if data is replicated to each processor core and used, instruction level parallelism can be used to the maximum. The technique proposed in this study can maximize the performance improvement effect when applied to scientific applications such as matrix multiplication operation.

Robust Constrained Predictive Control without On-line Optimizations

  • Lee, Y. I.;B. Kouvaritakis
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings)
    • /
    • 2001.10a
    • /
    • pp.27.4-27
    • /
    • 2001
  • A stabilizing control method for linear systems with model uncertainties and hard input constraints is developed that does not require on-line optimization. This work is motivated by the constrained robust MPC (CRMPC) approach [3], which adopts the dual mode prediction strategy (i.e. free control moves and an invariant set) and minimizes a worst-case performance criterion. Based on the observation that a feasible control sequence for a particular state can be found as a linear combination of feasible sequences for other states, we suggest a stabilizing control algorithm that provides sub-optimal, feasible control sequences using pre-computed optimal sequences for some canonical states. The on-line computation of the proposed method reduces to a simple matrix multiplication.

A Study on Circular Filtering in Orthogonal Transform Domain

  • Song, Bong-Seop;Lee, Sang-Uk
    • Journal of Electrical Engineering and Information Science
    • /
    • v.1 no.2
    • /
    • pp.125-133
    • /
    • 1996
  • In this paper, we discuss the properties of circular filtering in the orthogonal transform domain. Efficient filtering schemes in six orthogonal transform domains are presented by generalizing the convolution-multiplication property of the DFT. In brief, circular filtering can be accomplished by multiplying by the transform domain filtering matrix W, which is shown to be very sparse, yielding computational gains compared with time domain processing. As an application, decimation and interpolation techniques in orthogonal transform domains are also investigated.

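The convolution-multiplication property that the abstract above generalizes can be illustrated for the DFT case, where the transform domain filtering matrix is diagonal. The following is a minimal sketch in plain Python, not the paper's scheme for the other five transforms; the function names are hypothetical and a naive O(n²) DFT is used for clarity.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(x, h):
    """Time domain circular filtering: y[t] = sum_m h[m] * x[(t - m) mod n]."""
    n = len(x)
    return [sum(h[m] * x[(t - m) % n] for m in range(n)) for t in range(n)]

def filter_in_dft_domain(x, h):
    """Same filtering done in the DFT domain.

    For the DFT the filtering matrix is diagonal, so applying it is an
    element-wise product of the two spectra.
    """
    X, H = dft(x), dft(h)
    Y = [Hk * Xk for Hk, Xk in zip(H, X)]
    return [y.real for y in idft(Y)]  # real inputs give a real result
```

For real-valued inputs of equal length, `circular_convolve(x, h)` and `filter_in_dft_domain(x, h)` agree up to floating-point rounding.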
An Efficient Method to Compute a Covariance Matrix of the Non-local Means Algorithm for Image Denoising with the Principal Component Analysis (영상 잡음 제거를 위한 주성분 분석 기반 비 지역적 평균 알고리즘의 효율적인 공분산 행렬 계산 방법)

  • Kim, Jeonghwan;Jeong, Jechang
    • Journal of Broadcast Engineering
    • /
    • v.21 no.1
    • /
    • pp.60-65
    • /
    • 2016
  • This paper introduces the non-local means (NLM) algorithm for image denoising, along with an improved algorithm based on principal component analysis (PCA). To perform the PCA, a covariance matrix of the given image must be evaluated first. If the neighborhood patches of the NLM have size $S{\times}S$ and the image has Q pixels, a matrix multiplication of size $S^2{\times}Q$ is required to compute the covariance matrix. Given the characteristics of images, such computation is inefficient. Therefore, this paper proposes an efficient method to compute the covariance matrix by sampling the pixels. After sampling, the covariance matrix can be computed with matrices of size $S^2{\times}\lfloor \mathrm{Width}/l \rfloor{\times}(\mathrm{Height}/l)$.
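The idea of computing the patch covariance matrix from a sampled subset of pixels can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: a regular sampling stride `step` stands in for the sampling interval l, patches that would cross the image border are skipped, and the function names are hypothetical.

```python
def patch_vectors(image, s, step=1):
    """Collect s-by-s patches as flat vectors of length s*s.

    `image` is a list of rows. `step` is the assumed sampling stride:
    step=1 uses every valid pixel position, step=l keeps every l-th one.
    """
    height, width = len(image), len(image[0])
    vectors = []
    for r in range(0, height - s + 1, step):
        for c in range(0, width - s + 1, step):
            vectors.append([image[r + i][c + j]
                            for i in range(s) for j in range(s)])
    return vectors

def covariance(vectors):
    """Covariance matrix (s*s by s*s) of the patch vectors.

    Equivalent to (1/Q) * X * X^T for the mean-centred data matrix X
    of shape (s*s, Q), where Q is the number of patch vectors.
    """
    q, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / q for k in range(d)]
    centred = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    return [[sum(row[a] * row[b] for row in centred) / q for b in range(d)]
            for a in range(d)]
```

With stride `step`, the number of patch vectors, and hence the inner dimension of the matrix product, shrinks by roughly a factor of `step` squared.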

Efficient Implementation of a Pseudorandom Sequence Generator for High-Speed Data Communications

  • Hwang, Soo-Yun;Park, Gi-Yoon;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal
    • /
    • v.32 no.2
    • /
    • pp.222-229
    • /
    • 2010
  • A conventional pseudorandom sequence generator creates only 1 bit of data per clock cycle. Therefore, it may cause a delay in data communications. In this paper, we propose an efficient implementation method for a pseudorandom sequence generator with parallel outputs. By virtue of the simple matrix multiplications, we derive a well-organized recursive formula and realize a pseudorandom sequence generator with multiple outputs. Experimental results show that, although the total area of the proposed scheme is 3% to 13% larger than that of the existing scheme, our parallel architecture improves the throughput by 2, 4, and 6 times compared with the existing scheme based on a single output. In addition, we apply our approach to a $2{\times}2$ multiple input/multiple output (MIMO) detector targeting the 3rd Generation Partnership Project Long Term Evolution (3GPP LTE) system. Therefore, the throughput of the MIMO detector is significantly enhanced by parallel processing of data communications.
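The general idea of producing several pseudorandom bits per clock with matrix multiplications can be sketched with a generic linear feedback shift register (LFSR) over GF(2). This is not the paper's recursive formula or its generator polynomial; the register layout, tap positions, and function names below are assumptions chosen for illustration. If A is the one-step state transition matrix, then k output bits come from the last rows of A^0 ... A^(k-1), and the next state is A^k times the current state.

```python
def mat_mul_gf2(a, b):
    """Matrix product over GF(2)."""
    return [[sum(a[i][k] & b[k][j] for k in range(len(b))) % 2
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_vec_gf2(a, v):
    """Matrix-vector product over GF(2)."""
    return [sum(x & y for x, y in zip(row, v)) % 2 for row in a]

def transition_matrix(n, taps):
    """One-step matrix A of the assumed LFSR: new_state = A * state."""
    a = [[0] * n for _ in range(n)]
    for t in taps:
        a[0][t] = 1          # feedback bit is the XOR of the tapped bits
    for i in range(1, n):
        a[i][i - 1] = 1      # remaining bits shift by one position
    return a

def serial_step(state, taps):
    """Conventional generator: one output bit per step."""
    feedback = 0
    for t in taps:
        feedback ^= state[t]
    return [feedback] + state[:-1], state[-1]

def parallel_matrices(n, taps, k):
    """Precompute the output matrix (k rows) and A^k."""
    a = transition_matrix(n, taps)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0
    out_rows = []
    for _ in range(k):
        out_rows.append(power[-1])   # last row of A^j selects output bit j
        power = mat_mul_gf2(a, power)
    return out_rows, power           # power is now A^k

def parallel_step(state, out_rows, a_k):
    """Parallel generator: k output bits and the next state per step."""
    return mat_vec_gf2(a_k, state), mat_vec_gf2(out_rows, state)
```

One call to `parallel_step` yields the same bits as k consecutive calls to `serial_step`, which is what allows the multi-bit output per clock cycle.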

ERROR REDUCTION FOR HIGHER DERIVATIVES OF CHEBYSHEV COLLOCATION METHOD USING PRECONDITIONING AND DOMAIN DECOMPOSITION

  • Darvishi, M.T.;Ghoreishi, F.
    • Journal of applied mathematics & informatics
    • /
    • v.6 no.2
    • /
    • pp.523-538
    • /
    • 1999
  • A new preconditioning method is investigated to reduce the roundoff error in computing derivatives using Chebyshev collocation methods (CCM). With this preconditioning, the ratio of the roundoff error of the preconditioned method to that of CCM becomes small as N gets large. To further enhance the accuracy of differentiation, we use a domain decomposition approach. Error analysis shows that with this domain decomposition method the error is reduced in proportion to the length of the subintervals. Numerical results show that using domain decomposition and preconditioning simultaneously gives highly accurate approximate values for the first derivative of the function and good approximate values for moderately high derivatives.

Sliding Surface Design by Eigenstructure Assignment and Sliding Mode Control of Matched Uncertain Systems (고유구조 지정에 의한 슬라이딩 평면 설계와 불확실한 시스템의 슬라이딩 모드 제어)

  • Lee, Tae-Bong;Yang, Hyun-Suk
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.15 no.8
    • /
    • pp.812-817
    • /
    • 2009
  • In this paper, a new method to design sliding surfaces using eigenstructure assignment is proposed. Most conventional methods for constructing the surfaces require the system matrices to be in a special form, such as canonical or regular canonical form, whereas the proposed method can be applied to arbitrary system matrices. Furthermore, the surface matrix C can be chosen so that the matrix product CB has a designated form. SVD is used to determine the desired eigenvectors explicitly. To verify the proposed algorithm, a sliding mode controller for a multivariable system with matched uncertainty is constructed. The controller is designed to guarantee a minimum approach velocity to the sliding surface.

GPU implementation of neural networks (Rapid and Brief Communication)

  • Oh, Kyoung-Su;Jung, Kee-Chul
    • 한국HCI학회:학술대회논문집 (HCI Society of Korea: Conference Proceedings)
    • /
    • 2007.02c
    • /
    • pp.322-325
    • /
    • 2007
  • A graphics processing unit (GPU) is used to speed up an artificial neural network. It implements the matrix multiplication of a neural network to improve the time performance of a text detection system. Preliminary results show a 20-fold performance improvement using an ATI RADEON 9700 PRO board. The parallelism of the GPU is fully utilized by accumulating many input feature vectors and weight vectors, then converting the many inner-product operations into one matrix operation. Further research areas include benchmarking the performance on various hardware and GPU-aware learning algorithms. (c) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
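The conversion of many inner products into one matrix operation, as described in the abstract above, can be sketched on the CPU in plain Python. This only shows the algebraic equivalence; it is not the paper's GPU implementation, and the function names and data layout are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matmul(a, b):
    """Plain matrix product of a (n x m) and b (m x p)."""
    b_cols = list(zip(*b))
    return [[dot(row, col) for col in b_cols] for row in a]

def layer_one_by_one(weights, inputs):
    """One inner product per (input vector, weight vector) pair."""
    return [[dot(w, x) for w in weights] for x in inputs]

def layer_batched(weights, inputs):
    """Same result from a single matrix product.

    `weights` has one row per neuron. The accumulated input feature
    vectors are stacked as columns, so the product has shape
    (neurons x batch); it is transposed back to (batch x neurons).
    """
    input_columns = [list(col) for col in zip(*inputs)]
    product = matmul(weights, input_columns)
    return [list(col) for col in zip(*product)]
```

Both functions return the same pre-activation values; the batched form exposes the whole workload as one matrix product, which is the shape of computation that parallel hardware handles well.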

Algorithm for Computing J Relations in the Monoid of Boolean Matrices (불리언 행렬의 모노이드에서의 J 관계 계산 알고리즘)

  • Han, Jae-Il
    • Journal of Information Technology Services
    • /
    • v.7 no.4
    • /
    • pp.221-230
    • /
    • 2008
  • Green's relations are five equivalence relations that characterize the elements of a semigroup in terms of the principal ideals they generate. The J relation is one of Green's relations. Although there are known algorithms that can compute Green's relations, they are not useful for finding all J relations in the semigroup of all $n{\times}n$ Boolean matrices. Its computation requires multiplication of three Boolean matrices for each of all possible triples of $n{\times}n$ Boolean matrices. The size of the semigroup of all $n{\times}n$ Boolean matrices grows exponentially as n increases, so it is easy to see that the computation has exponential time complexity. The computation of J relations over the $5{\times}5$ Boolean matrices remains an unsolved problem. The paper presents theorems that can reduce the computation time, discusses an algorithm for efficient J relation computation whose design reflects those theorems, and gives its execution results.
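The triple products mentioned in the abstract above can be made concrete with a brute-force sketch for very small n. Two matrices are J-related when they generate the same two-sided principal ideal, so the sketch computes the set of all products XAY for every A and groups matrices by that set. This is the naive exponential approach, not the paper's improved algorithm, and the function names are hypothetical.

```python
from itertools import product

def bool_mat_mul(a, b):
    """Boolean matrix product: OR of ANDs instead of sum of products."""
    n = len(a)
    return tuple(tuple(int(any(a[i][k] and b[k][j] for k in range(n)))
                       for j in range(n)) for i in range(n))

def all_bool_matrices(n):
    """All 2**(n*n) Boolean matrices, as tuples of row tuples."""
    rows = list(product((0, 1), repeat=n))
    return list(product(rows, repeat=n))

def j_classes(n):
    """Group the n x n Boolean matrices into J classes by brute force.

    For each matrix A, the two-sided principal ideal is {X A Y}, taken
    over all X and Y. The monoid contains the identity, so A itself is
    always in its own ideal. Matrices with equal ideals are J-related.
    """
    mats = all_bool_matrices(n)
    classes = {}
    for a in mats:
        ideal = frozenset(bool_mat_mul(bool_mat_mul(x, a), y)
                          for x in mats for y in mats)
        classes.setdefault(ideal, []).append(a)
    return list(classes.values())
```

Calling `j_classes(2)` is cheap because there are only 16 matrices, but the number of triples grows as 2 to the power 3n², which is why larger n quickly becomes infeasible for this naive approach.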