• Title/Summary/Keyword: 시스토릭 어레이

Search Result 29, Processing Time 0.019 seconds

A Design and the Efficient Operation of Systolic Array for Polyadic-Nonserial Dynamic Programming Processing (Polyadic-Nonserial 동적 프로그래밍 처리를 위한 시스토릭 어레이의 설계 및 효율적인 운영)

  • 우종호;한광선
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.8
    • /
    • pp.1178-1186
    • /
    • 1989
  • In this paper, a systolic array for polyadic-nonserial DP problems is designed, the performance is analyzed and the efficient operating method is proposed. The algorithm is transformed to remove the broadcasting and global communication paths in the data dependence step by step. The transformed algorithm is mapping to the systolic array using the method proposed by D. I. Moldovan. The designed array is homogenous, had the processing elements of (n+1)/2 and 2n computation time ( n is the size of problem). In case of being many problems to process, the efficiency of array can be upward by inputing the problems successively. The interval between the initiations of two successive proboem instances is [n/2]+1 and the speed-up is about 4. The processor utilizations of each case are calculated.

  • PDF

Design of One-Dimensional Systolic Array for Recognition of Context-Free Language (Context-Free 언어의 인식을 위한 일차원 시스토릭 어레이의 설계)

  • 우종호;한광선
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.1
    • /
    • pp.30-36
    • /
    • 1990
  • Context-free language can be recognized by Cocke-Younger-Kasami algorithm. This algorithm is a class of polyadic-nonserial dynamic programming technique and has the O(n**3) time complexity. In this paper, a one-dimensional systolic array for recognition of context-free language is designed. The designed triangle type two-dimensional array is projected and transformed to an one-dimensional array. The designed one-dimensional array has n processing elements and \ulcornern+1)/2\ulcorner(n-1)+3n-1 time units to process the algorithm (n is the length of input string). The time complexity is O(n\ulcorner.

  • PDF

Systolic arry archtecture for full-search mothion estimation (완전탐색에 의한 움직임 추정기 시스토릭 어레이 구조)

  • 백종섭;남승현;이문기
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.12
    • /
    • pp.27-34
    • /
    • 1994
  • Block matching motion estimation is the most widely used method for motion compensated coding of image sequences. Based on a two dimensional systolic array, VLSI architecture and implementation of the full search block matching algorithm are described in this paper. The proposed architecture improves conventional array architecture by designing efficient processing elements that can control the data prodeuced by efficient search window division method. The advantages are that 1) it allows serial input to reduce pin counts for efficient composition of local memories but performs parallel processing. 2) It is flexible and can adjust to dimensional changes of search windows with simple control logic. 3) It has no idel time during the operation. 4) It can operate in real/time for low and main level in MPEG-2 standard. 5) It has modular and regular structure and thus is sutiable for VLSI implementation.

  • PDF

Bit-Level Systolic Array for Modular Multiplication (모듈러 곱셈연산을 위한 비트레벨 시스토릭 어레이)

  • 최성욱
    • Proceedings of the Korea Institutes of Information Security and Cryptology Conference
    • /
    • 1995.11a
    • /
    • pp.163-172
    • /
    • 1995
  • In this paper, the bit-level 1-dimensionl systolic array for modular multiplication are designed. First of all, the parallel algorithms and data dependence graphs from Walter's Iwamura's methods based on Montgomery Algorithm for modular multiplication are derived and compared. Since Walter's method has the smaller computational index points in data dependence graph than Iwamura's, it is selected as the base algorithm. By the systematic procedure for systolic array design, four 1-dimensional systolic arrays ale obtained and then are evaluated by various criteria. Modifying the array derived from 〔0,1〕 projection direction by adding a control logic and serializing the communication paths of data A, optimal 1-dimensional systolic array is designed. It has constant I/O channels for modular expandable and is good for fault tolerance due to unidirectional paths. And so, it is suitable for RSA Cryptosystem which deals with the large size and many consecutive message blocks.

  • PDF

VLSI Design of Soft Decision Viterbi Decoder Using Systolic Array Architecture (역추적 방식의 시스토릭 어레이 구조를 가진 연판정 비터비 복호기의 설계)

  • Kim, Ki-Bo;Kim, Jong-Tae
    • Proceedings of the KIEE Conference
    • /
    • 1999.07g
    • /
    • pp.3199-3201
    • /
    • 1999
  • Convolutional coding with Viterbi decoding is known as a powerful method for forward error correction among many kinds of channel coding methods. This paper presents a soft decision Viterbi decoder which has systolic array trace-back architecture[1]. Soft decision is known as more effective method than hard decision and most of digital communication systems use soft decision. The advantage of using a systolic array decoder is that the trace-back operation can be accomplished continuously in an array of registers in a pipe-line fashion, instead of waiting for the entire trace-back procedure to be completed at each iteration. Therefore it may be suitable for faster communication system. We described operations of each module of the decoder and showed results of the logic synthesis and functional simulation.

  • PDF

Minimum-Distance Classified Vector Quantizer and Its Systolic Array Architecture (최소거리 분류벡터 양자기와 시스토릭 어레이 구조)

  • Kim, Dong Sic
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.5
    • /
    • pp.77-86
    • /
    • 1995
  • In this paper in order to reduce the encoding complexity required in the full search vector quantization(VQ), a new classified vector quantization(CVQ) technique is described employing the minimum-distance classifier. The determination of the optimal subcodebook sizes for each class is an important task in CVQ designs and is not an easy work. Therefore letting the subcodebook sizes be equal. A CVQ technique. Which satisties the optimal CVQ condition approximately, is proposed. The proposed CVQ is a kind of the partial search VQ because it requires a search process within each subcodebook only, and the minimum encoding complexity since the subcodebook sizes are the same in each class. But simulation results reveal while the encoding complexity is only O(N$^{1/2}$) comparing with O(N) of the full-search VQ. A simple systolic array, which has the through-put of k, is also proposed for the implementation of the VQ. Since the operation of the classifier is identical with that of the VQ, the proposed array is applied to both the classifier and the VQ in the proposed CVQ, which shows the usefulness of the proposed CVQ.

  • PDF

VLSI Implementation of Low-Power Motion Estimation Using Reduced Memory Accesses and Computations (메모리 호출과 연산횟수 감소기법을 이용한 저전력 움직임추정 VLSI 구현)

  • Moon, Ji-Kyung;Kim, Nam-Sub;Kim, Jin-Sang;Cho, Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.5A
    • /
    • pp.503-509
    • /
    • 2007
  • Low-power motion estimation is required for video coding in portable information devices. In this paper, we propose a low-power motion estimation algorithm and 1-D systolic may VLSI architecture using full search block matching algorithm (FSBMA). Main power dissipation sources of FSBMA are complex computations and frequent memory accesses for data in the search area. In the proposed algorithm, memory accesses and computations are reduced by using 1D PE (processing array) array architecture performing motion estimation of two neighboring blocks in parallel and by skipping unnecessary computations during motion estimation. The VLSI implementation results of the algorithm show that the proposed VLSI architecture can save 9.3% power dissipation and can operate two times faster than an existing low-power motion estimator.

A VLSI Architecture of Systolic Array for FET Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.9
    • /
    • pp.1115-1124
    • /
    • 1988
  • A two-dimensional systolic array for fast Fourier transform, which has a regular and recursive VLSI architecture is presented. The array is constructed with identical processing elements (PE) in mesh type, and due to its modularity, it can be expanded to an arbitrary size. A processing element consists of two data routing units, a butterfly arithmetic unit and a simple control unit. The array computes FFT through three procedures` I/O pipelining, data shuffling and butterfly arithmetic. By utilizing parallelism, pipelining and local communication geometry during data movement, the two-dimensional systolic array eliminates global and irregular commutation problems, which have been a limiting factor in VLSI implementation of FFT processor. The systolic array executes a half butterfly arithmetic based on a distributed arithmetic that can carry out multiplication with only adders. Also, the systolic array provides 100% PE activity, i.e., none of the PEs are idle at any time. A chip for half butterfly arithmetic, which consists of two BLC adders and registers, has been fabricated using a 3-um single metal P-well CMOS technology. With the half butterfly arithmetic execution time of about 500 ns which has been obtained b critical path delay simulation, totla FFT execution time for 1024 points is estimated about 16.6 us at clock frequency of 20MHz. A one-PE chip expnsible to anly size of array is being fabricated using a 2-um, double metal, P-well CMOS process. The chip was layouted using standard cell library and macrocell of BLC adder with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on 3.4x2.8mm\ulcornerarea. A built-i self-testing circuit, BILBO (Built-In Logic Block Observation), was employed at the expense of 3% hardware overhead.

  • PDF

FPGA-Based Acceleration of Range Doppler Algorithm for Real-Time Synthetic Aperture Radar Imaging (실시간 SAR 영상 생성을 위한 Range Doppler 알고리즘의 FPGA 기반 가속화)

  • Jeong, Dongmin;Lee, Wookyung;Jung, Yunho
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.634-643
    • /
    • 2021
  • In this paper, an FPGA-based acceleration scheme of range Doppler algorithm (RDA) is proposed for the real time synthetic aperture radar (SAR) imaging. Hardware architectures of matched filter based on systolic array architecture and a high speed sinc interpolator to compensate range cell migration (RCM) are presented. In addition, the proposed hardware was implemented and accelerated on Xilinx Alveo FPGA. Experimental results for 4096×4096-size SAR imaging showed that FPGA-based implementation achieves 2 times acceleration compared to GPU-based design. It was also confirmed the proposed design can be implemented with 60,247 CLB LUTs, 103,728 CLB registers, 20 block RAM tiles and 592 DPSs at the operating frequency of 312 MHz.