• Title/Summary/Keyword: Matrix Multiplication

Search Result 167, Processing Time 0.021 seconds

New Memristor-Based Crossbar Array Architecture with 50-% Area Reduction and 48-% Power Saving for Matrix-Vector Multiplication of Analog Neuromorphic Computing

  • Truong, Son Ngoc;Min, Kyeong-Sik
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.3
    • /
    • pp.356-363
    • /
    • 2014
  • In this paper, we propose a new memristor-based crossbar array architecture, where a single memristor array and constant-term circuit are used to represent both plus-polarity and minus-polarity matrices. This is different from the previous crossbar array architecture which has two memristor arrays to represent plus-polarity and minus-polarity connection matrices, respectively. The proposed crossbar architecture is tested and verified to have the same performance with the previous crossbar architecture for applications of character recognition. For areal density, however, the proposed crossbar architecture is twice better than the previous architecture, because only single memristor array is used instead of two crossbar arrays. Moreover, the power consumption of the proposed architecture can be smaller by 48% than the previous one because the number of memristors in the proposed crossbar architecture is reduced to half compared to the previous crossbar architecture. From the high areal density and high energy efficiency, we can know that this newly proposed crossbar array architecture is very suitable to various applications of analog neuromorphic computing that demand high areal density and low energy consumption.

Construction of Highly Performance Switching Circuit (고효율 스위칭회로)

  • Park, Chun-Myoung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.12
    • /
    • pp.88-93
    • /
    • 2016
  • This paper presents a method of constructing the highly performance switching circuit(HPSC) over finite fields. The proposed method is as following. First of all, we extract the input/output relationship of linear characteristics for the given digital switching functions, Next, we convert the input/output relationship to Directed Cyclic Graph using basic gates adder and coefficient multiplier that are defined by mathematical properties in finite fields. Also, we propose the new factorization method for matrix characteristics equation that represent the relationship of the input/output characteristics. The proposed method have properties of generalization and regularity. Also, the proposed method is possible to any prime number multiplication expression.

High-throughput and low-area implementation of orthogonal matching pursuit algorithm for compressive sensing reconstruction

  • Nguyen, Vu Quan;Son, Woo Hyun;Parfieniuk, Marek;Trung, Luong Tran Nhat;Park, Sang Yoon
    • ETRI Journal
    • /
    • v.42 no.3
    • /
    • pp.376-387
    • /
    • 2020
  • Massive computation of the reconstruction algorithm for compressive sensing (CS) has been a major concern for its real-time application. In this paper, we propose a novel high-speed architecture for the orthogonal matching pursuit (OMP) algorithm, which is the most frequently used to reconstruct compressively sensed signals. The proposed design offers a very high throughput and includes an innovative pipeline architecture and scheduling algorithm. Least-squares problem solving, which requires a huge amount of computations in the OMP, is implemented by using systolic arrays with four new processing elements. In addition, a distributed-arithmetic-based circuit for matrix multiplication is proposed to counterbalance the area overhead caused by the multi-stage pipelining. The results of logic synthesis show that the proposed design reconstructs signals nearly 19 times faster while occupying an only 1.06 times larger area than the existing designs for N = 256, M = 64, and m = 16, where N is the number of the original samples, M is the length of the measurement vector, and m is the sparsity level of the signal.

A Study on GPGPU Performance Improvement Technique on GCN Architecture Using OpenCL API (GCN 아키텍쳐 상에서의 OpenCL을 이용한 GPGPU 성능향상 기법 연구)

  • Woo, DongHee;Kim, YoonHo
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.1
    • /
    • pp.37-45
    • /
    • 2018
  • The current system upon which a variety of programs are in operation has continuously expanded its domain from conventional single-core and multi-core system to many-core and heterogeneous system. However, existing researches have focused mostly on parallelizing programs based CUDA framework and rarely on AMD based GCN-GPU optimization. In light of the aforementioned problems, our study focuses on the optimization techniques of the GCN architecture in a GPGPU environment and achieves a performance improvement. Specifically, by using performance techniques we propose, we have reduced more then 30% of the computation time of matrix multiplication and convolution algorithm in GPGPU. Also, we increase the kernel throughput by more then 40%.

Compression of 3D Mesh Geometry and Vertex Attributes for Mobile Graphics

  • Lee, Jong-Seok;Choe, Sung-Yul;Lee, Seung-Yong
    • Journal of Computing Science and Engineering
    • /
    • v.4 no.3
    • /
    • pp.207-224
    • /
    • 2010
  • This paper presents a compression scheme for mesh geometry, which is suitable for mobile graphics. The main focus is to enable real-time decoding of compressed vertex positions while providing reasonable compression ratios. Our scheme is based on local quantization of vertex positions with mesh partitioning. To prevent visual seams along the partitioning boundaries, we constrain the locally quantized cells of all mesh partitions to have the same size and aligned local axes. We propose a mesh partitioning algorithm to minimize the size of locally quantized cells, which relates to the distortion of a restored mesh. Vertex coordinates are stored in main memory and transmitted to graphics hardware for rendering in the quantized form, saving memory space and system bus bandwidth. Decoding operation is combined with model geometry transformation, and the only overhead to restore vertex positions is one matrix multiplication for each mesh partition. In our experiments, a 32-bit floating point vertex coordinate is quantized into an 8-bit integer, which is the smallest data size supported in a mobile graphics library. With this setting, the distortions of the restored meshes are comparable to 11-bit global quantization of vertex coordinates. We also apply the proposed approach to compression of vertex attributes, such as vertex normals and texture coordinates, and show that gains similar to vertex geometry can be obtained through local quantization with mesh partitioning.

Refined fixed granularity algorithm on Networks of Workstations (NOW 환경에서 개선된 고정 분할 단위 알고리즘)

  • Gu, Bon-Geun
    • The KIPS Transactions:PartA
    • /
    • v.8A no.2
    • /
    • pp.117-124
    • /
    • 2001
  • At NOW (Networks Of Workstations), the load sharing is very important role for improving the performance. The known load sharing strategy is fixed-granularity, variable-granularity and adaptive-granularity. The variable-granularity algorithm is sensitive to the various parameters. But Send algorithm, which implements the fixed-granularity strategy, is robust to task granularity. And the performance difference between Send and variable-granularity algorithm is not substantial. But, in Send algorithm, the computing time and the communication time are not overlapped. Therefore, long latency time at the network has influence on the execution time of the parallel program. In this paper, we propose the preSend algorithm. In the preSend algorithm, the master node can send the data to the slave nodes in advance without the waiting for partial results from the slaves. As the master node sent the next data to the slaves in advance, the slave nodes can process the data without the idle time. As stated above, the preSend algorithm can overlap the computing time and the communication time. Therefore we reduce the influence of the long latency time at the network and the execution time of the parallel program on the NOW. To compare the execution time of two algorithms, we use the $320{\times}320$ matrix multiplication. The comparison results of execution times show that the preSend algorithm has the shorter execution time than the Send algorithm.

  • PDF

Bit-level Array Structure Representation of Weight and Optimization Method to Design Pre-Trained Neural Network (학습된 신경망 설계를 위한 가중치의 비트-레벨 어레이 구조 표현과 최적화 방법)

  • Lim, Guk-Chan;Kwak, Woo-Young;Lee, Hyun-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.9
    • /
    • pp.37-44
    • /
    • 2002
  • This paper proposes efficient digital hardware design method by using fixed weight of pre-trained neural network. For this, arithmetic operations of PEs(Processing Elements) are represented with matrix-vector multiplication. The relationship of fixed weight and input data present bit-level array structure architecture which is consisted operation node. To minimize the operation node, this paper proposes node elimination method and setting common node depend on bit pattern of weight. The result of FPGA simulation shows the efficiency on hardware cost and operation speed with full precision. And proposed design method makes possibility that many PEs are implemented to on-chip.

Improved Computation of L-Classes for Efficient Computation of J Relations (효율적인 J 관계 계산을 위한 L 클래스 계산의 개선)

  • Han, Jae-Il;Kim, Young-Man
    • Journal of Information Technology Services
    • /
    • v.9 no.4
    • /
    • pp.219-229
    • /
    • 2010
  • The Green's equivalence relations have played a fundamental role in the development of semigroup theory. They are concerned with mutual divisibility of various kinds, and all of them reduce to the universal equivalence in a group. Boolean matrices have been successfully used in various areas, and many researches have been performed on them. Studying Green's relations on a monoid of boolean matrices will reveal important characteristics about boolean matrices, which may be useful in diverse applications. Although there are known algorithms that can compute Green relations, most of them are concerned with finding one equivalence class in a specific Green's relation and only a few algorithms have been appeared quite recently to deal with the problem of finding the whole D or J equivalence relations on the monoid of all $n{\times}n$ Boolean matrices. However, their results are far from satisfaction since their computational complexity is exponential-their computation requires multiplication of three Boolean matrices for each of all possible triples of $n{\times}n$ Boolean matrices and the size of the monoid of all $n{\times}n$ Boolean matrices grows exponentially as n increases. As an effort to reduce the execution time, this paper shows an isomorphism between the R relation and L relation on the monoid of all $n{\times}n$ Boolean matrices in terms of transposition. introduces theorems based on it discusses an improved algorithm for the J relation computation whose design reflects those theorems and gives its execution results.

A Construction of the Linear Digital Switching Function over Finite Fields (유한체상에서의 선형디지털스위칭함수 구성)

  • Park, Chun-Myoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.12
    • /
    • pp.2201-2206
    • /
    • 2008
  • This paper presents a method of constructing the Linear Digital Switching Function(LDSF) over finite fields. The proposed method is as following. First of all, we extract the input/output relationship of linear characteristics for the given digital switching functions, Next, we convert the input/output relationship to Directed Cyclic Graph(DCG) using basic gates adder and coefficient multiplier that are defined by mathematical properties in finite fields. Also, we propose the new factorization method for matrix characteristics equation that represent the relationship of the input/output characteristics. The proposed method have properties of generalization and regularity. Also, the proposed method is possible to any prime number multiplication expression.

An Acoustic Echo Canceler for Hands-Free Telephony, Considering Double Talk and Environment Noise (동시통화 및 주변 잡음을 고려한 핸즈프리 환경의 반향제거기)

  • Kim, Hyun-tae;Lee, Chan-Hee;Park, Jang-sik
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.471-473
    • /
    • 2009
  • In this paper, we propose a double talk and noise robust acoustic echo canceler for hands-free telephony applications. The proposed system includes a double-talk detection method that detects the double-talk durations, which uses covariance between microphone input signa and estimated microphone input signal. And proposed adaptive algorithm for estimating acoustic echo path, uses normalized auto-covariance matrix of input signal with multiplication of residual error power and projection order of AP(affine projeciton) algorithm. It is confirmed that the proposed algorithm shows better performance from acoustic interference cancellation (AIC) viewpoint in double talk and noisy environments.

  • PDF