• Title/Summary/Keyword: Matrix Multiplication

Implementation of Neural Network Accelerator for Rendering Noise Reduction on OpenCL (OpenCL을 이용한 랜더링 노이즈 제거를 위한 뉴럴 네트워크 가속기 구현)

  • Nam, Kihun
    • The Journal of the Convergence on Culture Technology
    • /
    • v.4 no.4
    • /
    • pp.373-377
    • /
    • 2018
  • In this paper, we propose an implementation of a neural network accelerator for reducing rendering noise using OpenCL. Among rendering algorithms, we select ray tracing to ensure high-quality graphics. Ray tracing renders with rays: using fewer rays results in noise, while using more rays produces a higher-quality image but takes longer to compute. To reduce computation time while using fewer rays, a Learning Based Filtering algorithm using a neural network was applied, but it does not always produce an optimal result. This paper presents a new approach to matrix multiplication, based on General Matrix Multiplication, for improved performance. For the development environment, we used OpenCL, which is specialized for high-speed parallel processing. The proposed architecture was verified using a Kintex UltraScale XKU6909T-2FDFG1157C FPGA board. Calculating the parameters is about 1.12 times faster than with the Verilog-HDL structure.

Resolving Memory Bottlenecks in Hardware Accelerators with Data Prefetch

  • Hyein Lee;Jinoo Joung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.6
    • /
    • pp.1-12
    • /
    • 2024
  • Deep learning with faster and more accurate results requires large amounts of storage space and computation. Accordingly, many studies use hardware accelerators for quick and accurate calculations. However, data movement between the hardware accelerators and the CPU creates a performance bottleneck. In this paper, we propose a data prefetch strategy that can efficiently reduce such bottlenecks. The core idea of the data prefetch strategy is to predict the data needed for the next task and upload it to local memory while the hardware accelerator (Matrix Multiplication Unit, MMU) performs a task. This strategy can be enhanced by using a dual buffer to perform read and write operations simultaneously, which reduces the latency and execution time of data transfer. Through simulations, we demonstrate a 24% improvement in the performance of hardware accelerators by maximizing parallel processing with dual buffers and reducing bottlenecks between memories with data prefetch.
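
The prefetch-with-dual-buffer idea in the abstract above can be sketched in software. This is only an illustration under stated assumptions, not the authors' hardware design: `load_tile` and `compute` are hypothetical stand-ins for the memory transfer and the MMU task, and a single background thread plays the role of the prefetcher.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tile(tiles, i):
    """Hypothetical stand-in for copying tile i into local memory."""
    return list(tiles[i])


def compute(tile):
    """Hypothetical stand-in for the accelerator task (sum of squares)."""
    return sum(x * x for x in tile)


def run_with_prefetch(tiles):
    """Process tiles while the next tile is fetched into the idle buffer."""
    results = []
    if not tiles:
        return results
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(load_tile, tiles, 0)  # fill the first buffer
        for i in range(len(tiles)):
            current = pending.result()  # wait until the active buffer is ready
            if i + 1 < len(tiles):
                # Start loading the next tile into the other buffer.
                pending = prefetcher.submit(load_tile, tiles, i + 1)
            results.append(compute(current))  # overlaps with the load above
    return results
```

At any moment only two tiles are held: the one being computed on and the one being loaded, which is what the two buffers correspond to in this sketch.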

Alternative Optimal Threshold Criteria: MFR (대안적인 분류기준: 오분류율곱)

  • Hong, Chong Sun;Kim, Hyomin Alex;Kim, Dong Kyu
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.5
    • /
    • pp.773-786
    • /
    • 2014
  • We propose the multiplication of false rates (MFR), a classification accuracy criterion that corresponds to the area of a rectangle derived from the ROC curve. The optimal threshold obtained using MFR is compared with those of other criteria in terms of classification performance. The optimal thresholds for various distribution functions are also found; consequently, some properties and advantages of MFR are discussed by comparing the FNR and FPR corresponding to the optimal thresholds. Based on a general cost function, the cost ratios of the optimal thresholds are computed using various classification criteria. The cost ratios for cost curves are examined to explore the advantages of MFR. Furthermore, the definition of MFR is extended to multi-dimensional ROC analysis, and the relations among classification criteria are also discussed.

Efficient short-length running convolution algorithm using filter banks (필터 뱅크를 사용한 효율적인 short-length running convolution 알고리즘)

  • Jang Young-Beom;Oh Se-Man;Lee Won-Sang
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.6
    • /
    • pp.187-194
    • /
    • 2005
  • In this paper, an efficient and fast algorithm that reduces the amount of computation in FIR (Finite Impulse Response) filtering is proposed. The proposed algorithm supports parallel processing of arbitrary size, and its structures are easily derived. Furthermore, it is shown that the number of multiplications per sample is reduced, as is the number of instructions on a MAC (Multiplication and Accumulation) processor. To show the theoretical improvement, the number of subfilters is compared with that of the conventional algorithm. In addition, it is shown that the number of elements for a hardwired implementation is reduced in comparison with the conventional algorithm.
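
To make the idea of trading multiplications for additions in parallel FIR filtering concrete, here is a minimal sketch of the textbook 2-parallel fast FIR decomposition, which uses three half-length subfilters instead of four. It is not the filter-bank algorithm of the paper above; the function names are illustrative, and non-empty inputs are assumed.

```python
def convolve(a, b):
    """Direct linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fast_fir_2parallel(x, h):
    """2-parallel FIR filtering with three half-length subfilters."""
    x = list(x) + [0] * (len(x) % 2)  # pad to even length
    h = list(h) + [0] * (len(h) % 2)
    x0, x1 = x[0::2], x[1::2]  # polyphase components of the input
    h0, h1 = h[0::2], h[1::2]  # polyphase components of the filter

    a = convolve(h0, x0)
    b = convolve(h1, x1)
    c = convolve([p + q for p, q in zip(h0, h1)],
                 [p + q for p, q in zip(x0, x1)])

    # Even outputs: H0*X0 plus H1*X1 delayed by one low-rate sample.
    y0 = [ak + (b[k - 1] if k >= 1 else 0) for k, ak in enumerate(a)] + [b[-1]]
    # Odd outputs: (H0+H1)(X0+X1) - H0*X0 - H1*X1.
    y1 = [ck - ak - bk for ck, ak, bk in zip(c, a, b)]

    # Interleave the even and odd output phases.
    y = []
    for k in range(len(y0)):
        y.append(y0[k])
        if k < len(y1):
            y.append(y1[k])
    return y
```

For a filter of even length M, the three subfilters need 3·(M/2) multiplications per two output samples, that is 0.75·M per sample instead of M for direct filtering. When an input length is odd, the result carries trailing zeros from the padding.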

Geometry Optimization of Dispersed U-Mo Fuel for Light Water Reactors

  • Ondrej Novak;Pavel Suk;Dusan Kobylka;Martin Sevecek
    • Nuclear Engineering and Technology
    • /
    • v.55 no.9
    • /
    • pp.3464-3471
    • /
    • 2023
  • The Uranium/Molybdenum metallic fuel has been proposed as a promising advanced fuel concept, especially in the dispersed fuel geometry. The fuel is manufactured in the form of small fuel droplets (particles) placed in a fuel pin covered by a matrix. In addition to fuel particles, the pin contains voids necessary to compensate for material swelling and the release of fission gases from the fuel particles. When investigating this advanced fuel design, two important questions were raised: can the dispersed fuel performance be analyzed using homogenization without significant inaccuracy, and what size of fuel drops should be used for the fuel design to achieve optimal utilization? To answer them, 2D burnup calculations of fuel assemblies with different fuel particle sizes were performed. The analysis was supported by an additional 3D fuel pin calculation with the dispersed fuel particle size variations. The results show a significant difference in the multiplication factor between the homogenized calculation and the detailed calculation with precise fuel particle geometry. The recommended fuel particle size depends on the final burnup to be achieved. As shown in the results, for lower burnup levels, larger fuel drops offer a better multiplication factor. However, when higher burnup levels are required, smaller fuel drops perform better.

A Study on the Convergence Characteristics Improvement of the Modified-Multiplication Free Adaptive Filter (변형 비적 적응 필터의 수렴 특성 개선에 관한 연구)

  • 김건호;윤달환;임제탁
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.6
    • /
    • pp.815-823
    • /
    • 1993
  • In this paper, the structure of the modified multiplication-free adaptive filter (M-MADF) and its convergence analysis are presented. To evaluate the performance of the proposed M-MADF algorithm, a fractionally spaced equalizer (FSE) is used. The input signals are quantized using DPCM, the reference signal is processed using a first-order linear prediction filter, and the outputs are processed by a conventional adaptive filter. The filter coefficients are updated using the Sign algorithm. Under the assumption that the primary and reference signals are zero-mean, wide-sense stationary, and Gaussian, theoretical results for the coefficient misalignment vector and its autocorrelation matrix are derived. The convergence properties of the Sign, MADF, and M-MADF algorithms for updating the coefficients of the digital filter of the FSE are investigated and compared with one another. The convergence properties are characterized by the steady-state error and the convergence speed. It is shown that the convergence speed of M-MADF is almost the same as that of the Sign algorithm and faster than that of MADF under the same steady-state error. The M-MADF is especially useful for highly correlated signals.
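
The Sign algorithm mentioned in the abstract above updates the coefficients using only the sign of the error. A minimal sketch of that standard sign-error update follows; it is not the M-MADF structure itself (no DPCM quantization or prediction filter), and the function and parameter names are illustrative assumptions.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_lms(x, d, num_taps, mu):
    """Adapt an FIR filter so that its output tracks d, using sign(error) updates."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps inputs, newest first, zero before the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = d[n] - y
        s = sign(e)
        # Sign-error update: the error contributes only its sign.
        w = [wk + mu * s * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors
```

Because the error enters only through its sign, the update step needs no multiplication by the error value; a smaller `mu` lowers the steady-state error at the cost of slower convergence.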

On Implementations of Algorithms for Fast Generation of Normal Bases and Low Cost Arithmetics over Finite Fields (유한체위에서 정규기저의 고속생성과 저비용 연산 알고리즘의 구현에 관한 연구)

  • Kim, Yong-Tae
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.12 no.4
    • /
    • pp.621-628
    • /
    • 2017
  • The efficiency of implementing arithmetic operations in finite fields depends on the choice of representation of the field elements. From this point of view, normal bases seem the most appropriate, since raising to the power 2 in $GF(2^n)$, a field of characteristic 2, reduces in these bases to a cyclic shift of the coordinates. In this paper, we introduce an algorithm that quickly transforms conventional bases to normal bases and present the results of an H/W implementation using the algorithm. We also propose an algorithm to calculate the multiplication and inverse of elements with respect to normal bases in $GF(2^n)$ and present the programs and the results of H/W implementations using the algorithm.
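
The claim that squaring becomes a cyclic shift in a normal basis can be checked on a small case. The sketch below uses $GF(2^3)$ with the irreducible polynomial $x^3+x^2+1$, whose root generates a normal basis; this field, the bit encoding, and all names are choices made for illustration and are not taken from the paper.

```python
N = 3
MOD = 0b1101  # x^3 + x^2 + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^3) given as polynomial-basis bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= MOD  # reduce modulo the field polynomial
    return result


# Normal basis {beta, beta^2, beta^4} with beta = x (bit mask 0b010).
BASIS = [0b010]
for _ in range(N - 1):
    BASIS.append(gf_mul(BASIS[-1], BASIS[-1]))


def from_normal(coords):
    """Convert normal-basis coordinates [c0, c1, c2] to a polynomial-basis bit mask."""
    value = 0
    for c, b in zip(coords, BASIS):
        if c:
            value ^= b
    return value


def square_normal(coords):
    """Square an element in normal-basis coordinates: rotate by one position."""
    return coords[-1:] + coords[:-1]
```

For every coordinate vector `c`, `gf_mul(from_normal(c), from_normal(c))` equals `from_normal(square_normal(c))`, so squaring costs no field multiplication in this representation.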

Pole Placement Method of a Double Poles Using LQ Control and Pole's Moving-Range (LQ 제어와 근의 이동범위를 이용한 중근의 극배치 방법)

  • Park, Minho
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.1
    • /
    • pp.20-27
    • /
    • 2020
  • In general, a nonlinear system is linearized in the form of a product of 1st- and 2nd-order systems. This paper reports a design method for the weighting matrix and control law of LQ control that moves double poles having a Jordan block to a pair of complex conjugate poles. This method has the advantages of pole placement and guaranteed stability, but it cannot position the poles correctly, and the matrix is chosen by trial and error. Therefore, a relation function (𝜌, 𝜃) between the poles and the matrix was derived under the condition that the poles are the roots of the characteristic equation of the Hamiltonian system. In addition, the Pole's Moving-range was obtained under the condition that the state weighting matrix is positive semi-definite. This paper presents examples of how the matrix and control law are calculated.

PCB Board Impedance Analysis Using Similarity Transform for Transmission Matrix (전송선로행열에 대한 유사변환을 이용한 PCB기판 임피던스 해석)

  • Suh, Young-Suk
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.10
    • /
    • pp.2052-2058
    • /
    • 2009
  • As the operating frequency of digital systems increases and the voltage swing decreases, accurate and high-speed analysis of PCB boards becomes very important. The transmission matrix method, which uses multiple products of the unit column matrix, is the fastest method in PCB board analysis. In this paper, a new method to reduce the calculation time of PCB board impedances is proposed. First, the eigenvalues and eigenvectors of the transmission matrix for a unit column of the PCB are calculated, and the transmission matrix for the unit column is transformed using a similarity transform to reduce the number of multiplications on the matrix elements. Using the similarity transform greatly reduces the calculation time compared with the previous method. The proposed method is applied to a 1.3 inch by 1.9 inch board and shows about a 10-fold reduction in calculation time. This method can be applied to PCB designs that need many repetitive calculations of board impedances.
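
The saving from a similarity transform can be illustrated on a chain of identical sections: with $T = V \Lambda V^{-1}$, the product of $n$ copies is $T^n = V \Lambda^n V^{-1}$, so only the diagonal entries are raised to the power $n$. The sketch below does this for a 2x2 matrix. It is a simplified illustration, not the paper's method for PCB unit columns, and it assumes a nonzero upper-right entry and distinct eigenvalues.

```python
import cmath


def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]],
        [p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]],
    ]


def power_by_repeated_product(t, n):
    """Baseline: multiply n copies of t together, one at a time."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul2(result, t)
    return result


def power_by_similarity(t, n):
    """Compute t^n as V diag(l1^n, l2^n) V^-1 (assumes t[0][1] != 0, l1 != l2)."""
    (a, b), (c, d) = t
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    l1, l2 = (tr + root) / 2, (tr - root) / 2
    # Columns of V are the eigenvectors (b, l - a).
    v = [[b, b], [l1 - a, l2 - a]]
    det_v = b * (l2 - a) - b * (l1 - a)
    v_inv = [[(l2 - a) / det_v, -b / det_v], [-(l1 - a) / det_v, b / det_v]]
    lam_n = [[l1 ** n, 0], [0, l2 ** n]]
    return matmul2(matmul2(v, lam_n), v_inv)
```

The repeated product needs a matrix multiplication per section, while the similarity form needs two matrix multiplications regardless of `n`. The result is returned as complex numbers because the eigenvalues may be complex.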

An Efficient Computation of Matrix Triple Products (삼중 행렬 곱셈의 효율적 연산)

  • Im, Eun-Jin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.3
    • /
    • pp.141-149
    • /
    • 2006
  • In this paper, we introduce an improved algorithm for computing the matrix triple product that commonly arises in primal-dual optimization methods. In computing $P=AHA^{t}$, we devise a single-pass algorithm that exploits the block diagonal structure of the matrix $H$. This one-phase scheme requires fewer floating point operations and roughly half the memory of the generic two-phase algorithm, where the product is computed in two steps, computing first $Q=HA^{t}$ and then $P=AQ$. The one-phase scheme achieved a speed-up of 2.04 on the Intel Itanium II platform over the two-phase scheme. The performance improvement was evaluated through performance modeling based on memory latency and modeled cache miss rates. Our research has an impact on performance tuning studies of complex sparse matrix operations, whereas most previous work focused on performance tuning of basic operations.
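
When $H$ is block diagonal with blocks $H_k$, and $A_k$ denotes the matching column slice of $A$, the triple product splits as $P=\sum_k A_k H_k A_k^{t}$, so the full intermediate $Q=HA^{t}$ never has to be stored. The dense sketch below illustrates that identity next to the two-phase computation. It is not the paper's sparse single-pass implementation, and the function names are assumptions made for illustration.

```python
def matmul(p, q):
    """Product of two dense matrices given as nested lists."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def transpose(m):
    """Transpose of a dense matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def triple_product_two_phase(a, h):
    """Generic scheme: first Q = H A^t, then P = A Q."""
    q = matmul(h, transpose(a))
    return matmul(a, q)


def triple_product_blockwise(a, blocks):
    """P = sum over k of A_k H_k A_k^t, where blocks lists the diagonal blocks of H."""
    m = len(a)
    p = [[0] * m for _ in range(m)]
    col = 0
    for hk in blocks:
        size = len(hk)
        ak = [row[col:col + size] for row in a]  # columns of A matching this block
        contrib = matmul(ak, matmul(hk, transpose(ak)))
        for i in range(m):
            for j in range(m):
                p[i][j] += contrib[i][j]
        col += size
    return p
```

In the blockwise version the only temporaries are block-sized, which is the source of the memory saving in this simplified setting.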
