• Title/Abstract/Keywords: Matrix Multiplication

167 search results (processing time: 0.022 s)

New Memristor-Based Crossbar Array Architecture with 50-% Area Reduction and 48-% Power Saving for Matrix-Vector Multiplication of Analog Neuromorphic Computing

  • Truong, Son Ngoc;Min, Kyeong-Sik
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • Vol. 14, No. 3
    • /
    • pp.356-363
    • /
    • 2014
  • In this paper, we propose a new memristor-based crossbar array architecture in which a single memristor array and a constant-term circuit represent both the plus-polarity and minus-polarity matrices. This differs from the previous crossbar array architecture, which uses two memristor arrays to represent the plus-polarity and minus-polarity connection matrices, respectively. The proposed crossbar architecture is tested and verified to match the performance of the previous crossbar architecture in character recognition applications. Its areal density, however, is twice that of the previous architecture, because only a single memristor array is used instead of two crossbar arrays. Moreover, the power consumption of the proposed architecture can be 48% lower than that of the previous one, because the number of memristors is halved. Given its high areal density and high energy efficiency, the newly proposed crossbar array architecture is well suited to applications of analog neuromorphic computing that demand high areal density and low energy consumption.
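The idea of replacing two polarity arrays with one array plus a constant term can be illustrated numerically. The sketch below is only one plausible reading of the abstract, not the authors' circuit: it assumes weights in [-1, 1], stores `w + offset` so that every stored value is non-negative, and subtracts `offset * sum(inputs)` afterwards. The names `two_array_output`, `single_array_output`, and `offset` are hypothetical.

```python
# Illustrative numerical model only; not the circuit from the paper.
# Assumption: weights lie in [-1, 1] and stored conductances must be non-negative.

def matvec(matrix, vector):
    """Plain matrix-vector product, standing in for crossbar current summation."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def two_array_output(weights, inputs):
    """Two-array scheme: separate plus- and minus-polarity arrays, then subtract."""
    plus = [[max(w, 0.0) for w in row] for row in weights]
    minus = [[max(-w, 0.0) for w in row] for row in weights]
    return [p - m for p, m in zip(matvec(plus, inputs), matvec(minus, inputs))]

def single_array_output(weights, inputs, offset=1.0):
    """Single array stores w + offset; a constant term removes the offset again."""
    shifted = [[w + offset for w in row] for row in weights]
    constant_term = offset * sum(inputs)  # identical correction for every output row
    return [y - constant_term for y in matvec(shifted, inputs)]

weights = [[0.5, -0.25, 1.0], [-1.0, 0.75, 0.0]]
inputs = [1.0, 2.0, 4.0]
print(two_array_output(weights, inputs))
print(single_array_output(weights, inputs))
```

Both functions compute the same signed product, but the single-array version stores one value per weight instead of two, which is the source of the area saving the abstract describes.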

High-Efficiency Switching Circuit (Construction of Highly Performance Switching Circuit)

  • 박춘명
    • 전자공학회논문지
    • /
    • Vol. 53, No. 12
    • /
    • pp.88-93
    • /
    • 2016
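  • In this paper, we propose a method for efficiently constructing linear digital switching functions over GF(P), based on the mathematical properties of finite fields and graph theory. The proposed method derives a DCG from the input-output relationship of a given arbitrary digital switching function and then factorizes the number of nodes. The matrix equation is factorized into irreducible polynomials of lower degree, each factor is realized as a partial circuit, and the partial circuits are linearly combined to construct the final linear digital switching function. As a result, the construction of linear digital switching functions is considerably simpler than with existing methods, and the circuit can be readily realized using adders and coefficient multipliers defined over the finite field GF(P).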
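As a small illustration of what "adders and coefficient multipliers over GF(P)" compute, the following sketch evaluates a linear function y = A x with all arithmetic taken modulo a prime p. It does not reproduce the DCG derivation or the factorization step from the abstract; the matrix `A`, the prime `p`, and the function name `gf_matvec` are hypothetical.

```python
def gf_matvec(matrix, vector, p):
    """Linear function y = A x over GF(p): coefficient multiplies and mod-p adds."""
    return [sum(a * x for a, x in zip(row, vector)) % p for row in matrix]

p = 5  # assumed prime modulus
A = [[1, 2], [3, 4]]
x = [4, 3]
y = [2, 1]

fx = gf_matvec(A, x, p)
fy = gf_matvec(A, y, p)
x_plus_y = [(a + b) % p for a, b in zip(x, y)]

# Linearity check: f(x + y) equals f(x) + f(y) when both are reduced mod p.
lhs = gf_matvec(A, x_plus_y, p)
rhs = [(a + b) % p for a, b in zip(fx, fy)]
print(fx, lhs == rhs)
```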

High-throughput and low-area implementation of orthogonal matching pursuit algorithm for compressive sensing reconstruction

  • Nguyen, Vu Quan;Son, Woo Hyun;Parfieniuk, Marek;Trung, Luong Tran Nhat;Park, Sang Yoon
    • ETRI Journal
    • /
    • Vol. 42, No. 3
    • /
    • pp.376-387
    • /
    • 2020
  • The heavy computation required by reconstruction algorithms for compressive sensing (CS) has been a major concern for real-time applications. In this paper, we propose a novel high-speed architecture for the orthogonal matching pursuit (OMP) algorithm, which is the algorithm most frequently used to reconstruct compressively sensed signals. The proposed design offers very high throughput and includes an innovative pipeline architecture and scheduling algorithm. Least-squares problem solving, which requires a huge amount of computation in OMP, is implemented using systolic arrays with four new processing elements. In addition, a distributed-arithmetic-based circuit for matrix multiplication is proposed to counterbalance the area overhead caused by the multi-stage pipelining. The results of logic synthesis show that the proposed design reconstructs signals nearly 19 times faster than existing designs while occupying an area only 1.06 times larger, for N = 256, M = 64, and m = 16, where N is the number of original samples, M is the length of the measurement vector, and m is the sparsity level of the signal.
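Distributed arithmetic, mentioned above for the matrix multiplication circuit, replaces multipliers with a lookup table indexed by one bit plane of the inputs at a time. The sketch below shows the general textbook technique for fixed integer coefficients and unsigned 8-bit inputs; it is not the circuit proposed in the paper, and the names `build_lut`, `da_inner_product`, and `da_matvec` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute the sum of coefficients selected by every possible bit pattern."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (pattern >> k) & 1)
            for pattern in range(1 << n)]

def da_inner_product(lut, inputs, bits=8):
    """Bit-serial inner product for unsigned inputs: one table read per bit plane."""
    acc = 0
    for b in range(bits):
        pattern = 0
        for k, x in enumerate(inputs):
            pattern |= ((x >> b) & 1) << k  # gather bit b of every input
        acc += lut[pattern] << b            # weight the table entry by 2**b
    return acc

def da_matvec(matrix, inputs, bits=8):
    """Matrix-vector product with one lookup table per fixed matrix row."""
    return [da_inner_product(build_lut(row), inputs, bits) for row in matrix]

matrix = [[3, -1, 4, 2], [1, 1, 1, 1]]
inputs = [5, 7, 1, 200]
direct = [sum(c * x for c, x in zip(row, inputs)) for row in matrix]
print(da_matvec(matrix, inputs) == direct)
```

The table grows as 2^n in the number of inputs per row, so practical designs usually split long rows into several smaller tables.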

A Study on GPGPU Performance Improvement Technique on GCN Architecture Using OpenCL API

  • 우동희;김윤호
    • 한국전자거래학회지
    • /
    • Vol. 23, No. 1
    • /
    • pp.37-45
    • /
    • 2018
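  • The systems on which programs run are expanding beyond traditional single-core and multi-core environments to many-core, coprocessor, and heterogeneous environments. However, existing research has focused mainly on NVIDIA architectures and parallelization with CUDA, while research on improving performance on GCN, AMD's general-purpose GPU architecture, has been limited. With this in mind, this paper studies performance improvement techniques within OpenCL, the GPGPU environment for the GCN architecture, and demonstrates practical performance gains. Specifically, applying the proposed techniques to GPGPU programs for matrix multiplication and convolution reduced execution time by up to 30% or more and increased kernel utilization by more than 40%.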

Compression of 3D Mesh Geometry and Vertex Attributes for Mobile Graphics

  • Lee, Jong-Seok;Choe, Sung-Yul;Lee, Seung-Yong
    • Journal of Computing Science and Engineering
    • /
    • Vol. 4, No. 3
    • /
    • pp.207-224
    • /
    • 2010
  • This paper presents a compression scheme for mesh geometry that is suitable for mobile graphics. The main focus is to enable real-time decoding of compressed vertex positions while providing reasonable compression ratios. Our scheme is based on local quantization of vertex positions with mesh partitioning. To prevent visual seams along the partitioning boundaries, we constrain the locally quantized cells of all mesh partitions to have the same size and aligned local axes. We propose a mesh partitioning algorithm that minimizes the size of locally quantized cells, which relates to the distortion of a restored mesh. Vertex coordinates are stored in main memory and transmitted to graphics hardware for rendering in quantized form, saving memory space and system bus bandwidth. The decoding operation is combined with the model geometry transformation, so the only overhead of restoring vertex positions is one matrix multiplication per mesh partition. In our experiments, a 32-bit floating-point vertex coordinate is quantized into an 8-bit integer, the smallest data size supported in a mobile graphics library. With this setting, the distortions of the restored meshes are comparable to those of 11-bit global quantization of vertex coordinates. We also apply the proposed approach to the compression of vertex attributes, such as vertex normals and texture coordinates, and show that gains similar to those for vertex geometry can be obtained through local quantization with mesh partitioning.
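The claim that decoding costs one matrix multiplication per partition can be sketched as follows. The example assumes each partition is described by a hypothetical `origin` and a shared `cell` size, so that decoding is the affine map `origin + cell * q`. Folding that map into the model transform once per partition lets the quantized vertices pass through the usual transform unchanged. This is a simplified illustration, not the paper's encoder or partitioning algorithm.

```python
def quantize(vertex, origin, cell, levels=256):
    """Map a float vertex to integer cell indices local to its partition."""
    return tuple(min(levels - 1, max(0, round((v - o) / cell)))
                 for v, o in zip(vertex, origin))

def decode_matrix(origin, cell):
    """4x4 affine matrix that restores positions: p = origin + cell * q."""
    ox, oy, oz = origin
    return [[cell, 0.0, 0.0, ox],
            [0.0, cell, 0.0, oy],
            [0.0, 0.0, cell, oz],
            [0.0, 0.0, 0.0, 1.0]]

def matmul4(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, q):
    """Apply a 4x4 matrix to a 3D point given in homogeneous form."""
    h = (q[0], q[1], q[2], 1.0)
    return tuple(sum(m[i][k] * h[k] for k in range(4)) for i in range(3))

origin, cell = (1.0, 2.0, 3.0), 0.5           # hypothetical partition parameters
model = [[1.0, 0.0, 0.0, 10.0],               # hypothetical model transform:
         [0.0, 1.0, 0.0, 0.0],                # translation by 10 along x
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]

combined = matmul4(model, decode_matrix(origin, cell))  # once per partition
q = quantize((1.5, 2.0, 4.0), origin, cell)
print(q, transform(combined, q))
```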

Refined fixed granularity algorithm on Networks of Workstations

  • 구본근
    • 정보처리학회논문지A
    • /
    • Vol. 8A, No. 2
    • /
    • pp.117-124
    • /
    • 2001
  • In a NOW (Network of Workstations), load sharing plays a very important role in improving performance. The known load sharing strategies are fixed-granularity, variable-granularity, and adaptive-granularity. The variable-granularity algorithm is sensitive to various parameters, whereas the Send algorithm, which implements the fixed-granularity strategy, is robust to task granularity, and the performance difference between Send and the variable-granularity algorithm is not substantial. In the Send algorithm, however, computing time and communication time do not overlap, so long network latency affects the execution time of the parallel program. In this paper, we propose the preSend algorithm, in which the master node sends data to the slave nodes in advance, without waiting for partial results from the slaves. Because the master node sends the next data in advance, the slave nodes can process data without idle time. The preSend algorithm thus overlaps computing time with communication time, which reduces the influence of long network latency and the execution time of the parallel program on the NOW. To compare the execution times of the two algorithms, we use $320 \times 320$ matrix multiplication. The comparison shows that the preSend algorithm has a shorter execution time than the Send algorithm.
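The benefit of overlapping communication with computation can be estimated with a very simple timing model. The sketch below is an idealized assumption, not the paper's measurement setup: it considers a single slave, ignores the time to return results, and treats every task as having the same transfer time `t_comm` and compute time `t_comp`. The function names are hypothetical.

```python
def send_time(tasks, t_comm, t_comp):
    """No overlap: each task is transferred, then computed, one after another."""
    return tasks * (t_comm + t_comp)

def presend_time(tasks, t_comm, t_comp):
    """Overlap: the transfer of the next task proceeds while the current one runs."""
    if tasks == 0:
        return 0
    # First transfer, then a two-stage pipeline limited by the slower stage,
    # then the final computation.
    return t_comm + (tasks - 1) * max(t_comm, t_comp) + t_comp

print(send_time(10, 2, 5), presend_time(10, 2, 5))
```

Under this model the saving approaches the smaller of the two per-task times as the number of tasks grows, which matches the intuition that hiding latency matters most when the network is slow.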

Bit-level Array Structure Representation of Weight and Optimization Method to Design Pre-Trained Neural Network

  • 임국찬;곽우영;이현수
    • 대한전자공학회논문지SD
    • /
    • Vol. 39, No. 9
    • /
    • pp.37-44
    • /
    • 2002
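  • A pre-trained neural network has fixed weights. This paper exploits this property to propose an efficient design method for digital neural network hardware. To this end, the operations of the network's PEs (processing elements) are expressed as matrix-vector multiplications, and the relationship between the fixed weights and the input data is represented as a bit-level array structure. The nodes required for computation are then minimized through node elimination and through shared nodes set according to weight bit patterns. FPGA simulation results confirm that, when the hardware is designed for full accuracy, hardware cost is substantially reduced and the operating frequency is high. In addition, because the proposed design method allows a large number of PEs to be implemented in a limited area, on-chip implementation of large neural network models becomes possible.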
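One way to picture a bit-level view of fixed weights is to write each output as a sum over weight bit planes, where each plane adds up the inputs whose weight has that bit set. The sketch below is an interpretation of the abstract, not the authors' design. It assumes unsigned integer weights, skips all-zero bit planes as a stand-in for node elimination, and reuses a partial sum whenever two bit planes share the same pattern. The function name `bit_level_matvec` is hypothetical.

```python
def bit_level_matvec(weights, inputs, bits=4):
    """Compute W x using shifts and adds only, sharing equal bit-plane patterns."""
    shared = {}   # bit pattern (one 0/1 per input) -> partial sum, computed once
    outputs = []
    for row in weights:
        acc = 0
        for b in range(bits):
            pattern = tuple((w >> b) & 1 for w in row)
            if not any(pattern):
                continue  # all-zero bit plane: no adder node is needed
            if pattern not in shared:
                shared[pattern] = sum(x for bit, x in zip(pattern, inputs) if bit)
            acc += shared[pattern] << b  # weight the shared node by 2**b
        outputs.append(acc)
    return outputs, len(shared)

weights = [[5, 3, 0], [10, 6, 0]]  # hypothetical fixed weights
inputs = [2, 7, 9]
outputs, node_count = bit_level_matvec(weights, inputs)
print(outputs, node_count)
```

In this example the second row is twice the first, so its bit planes repeat the first row's patterns one position higher and no new partial sums are needed.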

Improved Computation of L-Classes for Efficient Computation of J Relations

  • 한재일;김영만
    • 한국IT서비스학회지
    • /
    • Vol. 9, No. 4
    • /
    • pp.219-229
    • /
    • 2010
  • Green's equivalence relations have played a fundamental role in the development of semigroup theory. They are concerned with mutual divisibility of various kinds, and all of them reduce to the universal equivalence in a group. Boolean matrices have been used successfully in various areas, and much research has been performed on them. Studying Green's relations on a monoid of Boolean matrices will reveal important characteristics of Boolean matrices, which may be useful in diverse applications. Although there are known algorithms that can compute Green's relations, most of them find one equivalence class in a specific Green's relation, and only a few algorithms have appeared quite recently that address the problem of finding the whole D or J equivalence relation on the monoid of all $n \times n$ Boolean matrices. However, their results are far from satisfactory, since their computational complexity is exponential: the computation requires multiplying three Boolean matrices for each possible triple of $n \times n$ Boolean matrices, and the size of the monoid of all $n \times n$ Boolean matrices grows exponentially as n increases. In an effort to reduce the execution time, this paper shows an isomorphism between the R relation and the L relation on the monoid of all $n \times n$ Boolean matrices in terms of transposition, introduces theorems based on it, discusses an improved algorithm for the J relation computation whose design reflects those theorems, and gives its execution results.
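The link between the L and R relations through transposition can be checked by brute force for small n. The sketch below is not the paper's algorithm. It enumerates all 2 x 2 Boolean matrices, groups them by the left ideal M·a (L-classes) and the right ideal a·M (R-classes), and then tests whether transposing each L-class gives an R-class. That outcome is expected because (XA)^T = A^T X^T. All function names are hypothetical.

```python
from itertools import product

def bool_mul(a, b):
    """Boolean matrix product: OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return tuple(tuple(int(any(a[i][k] and b[k][j] for k in range(n)))
                       for j in range(n)) for i in range(n))

def transpose(a):
    return tuple(zip(*a))

def all_matrices(n):
    """Every n x n Boolean matrix as a tuple of row tuples."""
    rows = list(product((0, 1), repeat=n))
    return list(product(rows, repeat=n))

def classes(monoid, ideal):
    """Group matrices that generate the same principal ideal."""
    groups = {}
    for a in monoid:
        groups.setdefault(ideal(a), []).append(a)
    return list(groups.values())

monoid = all_matrices(2)
l_classes = classes(monoid, lambda a: frozenset(bool_mul(x, a) for x in monoid))
r_classes = classes(monoid, lambda a: frozenset(bool_mul(a, x) for x in monoid))

# Transpose every member of each L-class and compare with the set of R-classes.
transposed_l = {frozenset(transpose(a) for a in cls) for cls in l_classes}
matches = transposed_l == {frozenset(cls) for cls in r_classes}
print(len(l_classes), len(r_classes), matches)
```

The enumeration grows as 2^(n*n), which is the exponential blow-up the abstract refers to, so this check is only practical for very small n.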

A Construction of the Linear Digital Switching Function over Finite Fields

  • 박춘명
    • 한국정보통신학회논문지
    • /
    • Vol. 12, No. 12
    • /
    • pp.2201-2206
    • /
    • 2008
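  • In this paper, we propose a method for efficiently constructing linear digital switching functions over GF(P), based on the mathematical properties of finite fields and graph theory. The proposed method derives a DCG from the input-output relationship of a given arbitrary digital switching function and then factorizes the number of nodes. The matrix equation is factorized into irreducible polynomials of lower degree, each factor is realized as a partial circuit, and the partial circuits are linearly combined to construct the final linear digital switching function. As a result, the construction of linear digital switching functions is considerably simpler than with existing methods, and the circuit can be readily realized using adders and coefficient multipliers defined over the finite field GF(P).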

An Acoustic Echo Canceler for Hands-Free Telephony, Considering Double Talk and Environment Noise

  • 김현태;이찬희;박장식
    • 한국정보통신학회:학술대회논문집
    • /
    • 한국해양정보통신학회 2009 Fall Conference
    • /
    • pp.471-473
    • /
    • 2009
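  • This paper proposes an echo cancellation system for hands-free telephony that is robust to double-talk and noise. To detect double-talk, the proposed system uses a double-talk detection algorithm based on the variances of the microphone input signal and the estimated microphone input signal. For echo-path estimation, the adaptive filter applies an algorithm that regularizes the autocovariance matrix of the input signal by adding to it the product of the residual echo error power and the projection order of the AP algorithm. Computer simulations show that the proposed method is superior in terms of AIC (acoustic interference cancellation) in hands-free environments with double-talk and strong ambient noise.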
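The regularization described above can be sketched for an AP filter of projection order 2. This is a minimal sketch under stated assumptions, not the authors' system: the residual error power is estimated as the mean of the squared a priori errors in the current step, a small `eps` is added for numerical safety, the echo path and input signal are synthetic, and double-talk detection is omitted. All names are hypothetical.

```python
import random

def regularized_ap_step(w, X, d, mu=0.5, eps=1e-8):
    """One order-2 AP update with regularization scaled by the error power."""
    order = len(X)  # projection order P (this sketch supports P = 2 only)
    e = [dk - sum(wi * xi for wi, xi in zip(w, xk)) for xk, dk in zip(X, d)]
    error_power = sum(ek * ek for ek in e) / order
    delta = order * error_power + eps  # error power times projection order
    # R = X X^T + delta * I is 2x2, so the linear system is solved in closed form.
    r00 = sum(x * x for x in X[0]) + delta
    r11 = sum(x * x for x in X[1]) + delta
    r01 = sum(a * b for a, b in zip(X[0], X[1]))
    det = r00 * r11 - r01 * r01
    g0 = (r11 * e[0] - r01 * e[1]) / det
    g1 = (r00 * e[1] - r01 * e[0]) / det
    return [wi + mu * (g0 * x0 + g1 * x1) for wi, x0, x1 in zip(w, X[0], X[1])]

def frame(signal, n, taps):
    """Input vector [x(n), x(n-1), ..., x(n-taps+1)]."""
    return [signal[n - k] for k in range(taps)]

def misalignment(w, h):
    return sum((wi - hi) ** 2 for wi, hi in zip(w, h))

rng = random.Random(0)
echo_path = [0.6, -0.3, 0.1]                      # hypothetical echo path
signal = [rng.uniform(-1.0, 1.0) for _ in range(400)]
taps = len(echo_path)
w = [0.0] * taps
for n in range(taps, len(signal)):
    X = [frame(signal, n, taps), frame(signal, n - 1, taps)]
    d = [sum(h * x for h, x in zip(echo_path, xk)) for xk in X]  # noise-free echo
    w = regularized_ap_step(w, X, d)

print(misalignment(w, echo_path))
```

Because the regularization term shrinks as the residual error shrinks, the update is cautious while the error is large and approaches the unregularized AP update as the filter converges.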
