• Title/Summary/Keyword: VLSI Architecture

Search Result 277, Processing Time 0.022 seconds

Parallel Processing Implementation of Discrete Hartley Transform using Systolic Array Processor Architecture (Systolic Array Processor Architecture를 이용한 Discrete Hartley Transform 의 병렬 처리 실행)

  • Kang, J.K.;Joo, C.H.;Choi, J.S.
    • Proceedings of the KIEE Conference
    • /
    • 1988.07a
    • /
    • pp.14-16
    • /
    • 1988
  • With the development of VLSI technology, research on special processors for high-speed processing is on the increase and studies are focused on designing VLSI-oriented processors for signal processing. This paper processes a one-dimensional systolic array for Discrete Hartley Transform implementation and also processes processing element which is well described for algorithm. The discrete Hartley Transform(DHT) is a real-valued transform closely related to the DFT of a real-valued sequence can be exploited to reduce both the storage and the computation requried to produce the transform of real-valued sequence to a real-valued spectrum while preserving some of the useful properties of the DFT is something preferred. Finally, the architecture of one-dimensional 8-point systolic array, the detailed diagram of PE, total time units concept on implementation this arrays, and modularity are described.

  • PDF

New Multiplier using Montgomery Algorithm over Finite Fields (유한필드상에서 몽고메리 알고리즘을 이용한 곱셈기 설계)

  • 하경주;이창순
    • Proceedings of the Korea Society for Industrial Systems Conference
    • /
    • 2002.06a
    • /
    • pp.190-194
    • /
    • 2002
  • Multiplication in Galois Field GF(2/sup m/) is a primary operation for many applications, particularly for public key cryptography such as Diffie-Hellman key exchange, ElGamal. The current paper presents a new architecture that can process Montgomery multiplication over GF(2/sup m/) in m clock cycles based on cellular automata. It is possible to implement the modular exponentiation, division, inversion /sup 1)/architecture, etc. efficiently based on the Montgomery multiplication proposed in this paper. Since cellular automata architecture is simple, regular, modular and cascadable, it can be utilized efficiently for the implementation of VLSI.

  • PDF

Systolic arry archtecture for full-search mothion estimation (완전탐색에 의한 움직임 추정기 시스토릭 어레이 구조)

  • 백종섭;남승현;이문기
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.12
    • /
    • pp.27-34
    • /
    • 1994
  • Block matching motion estimation is the most widely used method for motion compensated coding of image sequences. Based on a two dimensional systolic array, VLSI architecture and implementation of the full search block matching algorithm are described in this paper. The proposed architecture improves conventional array architecture by designing efficient processing elements that can control the data prodeuced by efficient search window division method. The advantages are that 1) it allows serial input to reduce pin counts for efficient composition of local memories but performs parallel processing. 2) It is flexible and can adjust to dimensional changes of search windows with simple control logic. 3) It has no idel time during the operation. 4) It can operate in real/time for low and main level in MPEG-2 standard. 5) It has modular and regular structure and thus is sutiable for VLSI implementation.

  • PDF

Modular Multiplier based on Cellular Automata Over $GF(2^m)$ (셀룰라 오토마타를 이용한 $GF(2^m)$ 상의 곱셈기)

  • 이형목;김현성;전준철;유기영
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.1_2
    • /
    • pp.112-117
    • /
    • 2004
  • In this paper, we propose a suitable multiplication architecture for cellular automata in a finite field $GF(2^m)$. Proposed least significant bit first multiplier is based on irreducible all one Polynomial, and has a latency of (m+1) and a critical path of $ 1-D_{AND}+1-D{XOR}$.Specially it is efficient for implementing VLSI architecture and has potential for use as a basic architecture for division, exponentiation and inverses since it is a parallel structure with regularity and modularity. Moreover our architecture can be used as a basic architecture for well-known public-key information service in $GF(2^m)$ such as Diffie-Hellman key exchange protocol, Digital Signature Algorithm and ElGamal cryptosystem.

Systolic Architecture for Digit Level Modular Multiplication/Squaring over GF($2^m$) (GF($2^m$)상에서 디지트 단위 모듈러 곱셈/제곱을 위한 시스톨릭 구조)

  • Lee, Jin-Ho;Kim, Hyun-Sung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.18 no.1
    • /
    • pp.41-47
    • /
    • 2008
  • This paper presents a new digit level LSB-first multiplier for computing a modular multiplication and a modular squaring simultaneously over finite field GF($2^m$). To derive $L{\times}L$ digit level architecture when digit size is set to L, the previous algorithm is used and index transformation and merging the cell of the architecture are proposed. The proposed architecture can be utilized for the basic architecture for the crypto-processor and it is well suited to VLSI implementation because of its simplicity, regularity, and concurrency.

Sphere Decoding Algorithm and VLSI Implementation Using Two-Level Search (2 레벨 탐색을 이용한 스피어 디코딩 알고리즘과 VLSI 구현)

  • Huynh, Tronganh;Cho, Jong-Min;Kim, Jin-Sang;Cho, Won-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.6
    • /
    • pp.104-110
    • /
    • 2008
  • In this paper, a novel 2-level-search sphere decoding algorithm for multiple-input multiple-output (MIMO) detection and its VLSI implementation are presented. The proposed algorithm extends the search space by concurrently performing symbol detection on 2 level of the tree search. Therefore, the possibility of discarding good candidates can be avoided. Simulation results demonstrate the good performance of the proposed algorithm in terms of bit-error-rate (BER). From the proposed algorithm, an efficient very large scale integration (VLSI) architecture which incorporates low-complexity and fixed throughput features is proposed. The proposed architecture supports many modulation techniques such as BPSK, QPSK, 16-QAM and 64-QAM. The sorting block, which occupies a large portion of hardware utilization, is shared for different operating modes to reduce the area. The proposed hardware implementation results show the improvement in terms of area and BER performance compared with existing architectures.

An Efficient VLSI Architecture of Deblocking Filter in H.264 Advanced Video Coding (H.264/AVC를 위한 디블록킹 필터의 효율적인 VLSI 구조)

  • Lee, Sung-Man;Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.7
    • /
    • pp.52-60
    • /
    • 2008
  • The deblocking filter in the H.264/AVC video coding standard helps to reduce the blocking artifacts produced in the decoding process. But it consumes one third of the computational complexity in H.624/AVC decoder, which advocates an efficient design of a hardware accelerator for filtering. This paper proposes an architecture of deblocking filter using two filters and some registers for data reuse. Our architecture improves the throughput and minimize the number of external memory access by increasing data reuse. After initialization, two filters are able to perform filtering operation simultaneously. It takes only 96 clocks to complete filtering for one macroblock. We design and synthesis our architecture using Dongbuanam $0.18{\mu}m$ standard cell library and the maximum clock frequency is 200MHz.

A New Parallelizing Algorithm and Cell-based Hardware Architecture for High-speed Generation of Digital Hologram (디지털 홀로그램의 고속 생성을 위한 병렬화 알고리즘 및 셀 기반의 하드웨어 구조)

  • Seo, Young-Ho;Choi, Hyun-Jun;Yoo, Ji-Sang;Kim, Dong-Wook
    • Journal of Broadcast Engineering
    • /
    • v.16 no.1
    • /
    • pp.54-63
    • /
    • 2011
  • This paper proposes a new equation to calculate computer-generated hologram (CGH) in a high speed and its cell-based VLSI (veri large scale integrated circuit) architecture. After finding the calculational regularity in the horizontal or vertical direction from the basic CGH equation, we induce the new equation to calculate the horizontal or vertical hologram pixel values in parallel. We also propose the architecture of the CGH cell consisting of a initial parameter calculator and update-phase calculator(s) on the basis of the equation and implement them in hardware. Also we show a hardware architecture to parallelize the calculation in the horizontal direction by extending CGH. In the experiments we analyze the used hardware resources. These analyses makes it possible to select the amount of hardware to the precision of the results. Here, for the CGH kernel and the structure of the processor, we used the platform from our previous works.

Design of an Efficient VLSI Architecture of SADCT Based on Systolic Array (시스톨릭 어레이에 기반한 SADCT의 효율적 VLSl 구조설계)

  • Gang, Tae-Jun;Jeong, Ui-Yun;Gwon, Sun-Gyu;Ha, Yeong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.3
    • /
    • pp.282-291
    • /
    • 2001
  • In this paper, an efficient VLSI architecture of Shape Adaptive Discrete Cosine Transform(SADCT) based on systolic array is proposed. Since transform size in SADCT is varied according to the shape of object in each block, it are dropped that both usability of processing elements(PE´s) and throughput rate in time-recursive SADCT structure. To overcome these disadvantages, it is proposed that the architecture based on a systolic way structure which doesn´t need memory. In the proposed architecture, throughput rate is improved by consecutive processing of one-dimensional SADCT without memory and PE´s in the first column are connected to that in the last one for improvement of usability of PE. And input data are put into each column of PE in parallel according to the maximum data number in each rearranged block. The proposed architecture is described by VHDL. Also, its function is evaluated by MentorTM. Even though the hardware complexity is somewhat increased, the throughput rate is improved about twofold.

  • PDF

Reduction of Input Pins in VLSI Array for High Speed Fractal Image Compression (고속 프랙탈 영상압축을 위한 VLSI 어레이의 입력핀의 감소)

  • 성길영;전상현;이수진;우종호
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.12A
    • /
    • pp.2059-2066
    • /
    • 2001
  • In this paper, we proposed a method to reduce the number of input pins in one-dimensional VLSI array for fractal image compression. We use quad-tree partition scheme and can reduce the number of the input pins up to 50% by sharing the domain\`s and the range\`s data input pins in the proposed VLSI array architecture. Also, we can reduce the input pins and simplify the internal operation circuit of the processing elements by eliminating a few number of bits of the least significant bits of the input data. We simulated using the 256$\times$256 and 512$\times$512 Lena images to verify performance of the proposed method. As the result of simulation, we can decompress the original image with about 32dB(PSNR) in spite of elimination of the least significant 2-bit in the original input data, and additionally reduce the number of input pins up to 25% compared to VLSI array sharing input pins of range and domain.

  • PDF