• Title/Summary/Keyword: Systolic array

Search Result 144, Processing Time 0.026 seconds

UVM-based Verification of Equalizer Module for Telecommunication System (통신시스템용 등화기 모듈을 위한 UVM 기반 검증)

  • Dae-Won Moon;Dae-Ki Hong
    • Journal of the Semiconductor & Display Technology
    • /
    • v.23 no.1
    • /
    • pp.25-35
    • /
    • 2024
  • In the present modern day, as the complexity and size of SoC(System on Chip) increase, the importance of design verification are increasing, Therefore it takes a lot of time to verify the design. There is an emerging need to manage the verification environment faster and more efficiently by reusing the existing verification environment. UVM-based verification is a standardized and highly reliable verification method widely adopted and used in the semiconductor industry. This paper presents a UVM-based verification for the 4 tap equalizer module with a systolic array structure. Through the constraints randomization, it was confirmed that various test scenarios stimulus were generated. In addition, by verifying a simulation comparing the actual DUT outputs with the MATLAB reference outputs, the reuse and efficiency of the UVM test bench could be confirmed.

  • PDF

Design and Fabrication of a Processing Element for 2-D Systolic FFT Array (고속 퓨리어변환용 2차원 시스토릭 어레이를 위한 처리요소의 설계 및 제작)

  • Lee, Moon-Key;Shin, Kyung-Wook;Choi, Byeong-Yoon;,
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.3
    • /
    • pp.108-115
    • /
    • 1990
  • This paper describes the design and fabrication of a processing element that will be used as a component in the construction of a two dimensional systolic for FFT. The chip performs data shuffling and radix-2 decimation-in-time (DIT) butterfly arithmetic. It consists of a data routing unit, internal control logic and HBA unit which computes butterfly arithmetic. The 6.5K transistors processing element designed with standard cells has been fabricated with a 2u'm double metal CMOS process, and evaluated by wafer probing measurements. The measured characteristics show that a HBA can be computed in 0.5 usec with a 20MHz clok, and it is estimated that the FFT of length 1024 can be transformed in 11.2 usec.

  • PDF

High-Speed Reed-Solomon Decoder Using New Degree Computationless Modified Euclid´s Algorithm (새로운 DCME 알고리즘을 사용한 고속 Reed-Solomon 복호기)

  • 백재현;선우명훈
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.6
    • /
    • pp.459-468
    • /
    • 2003
  • This paper proposes a novel low-cost and high-speed Reed-Solomon (RS) decoder based on a new degree computationless modified Euclid´s (DCME) algorithm. This architecture has quite low hardware complexity compared with conventional modified Euclid´s (ME) architectures, since it can remove completely the degree computation and comparison circuits. The architecture employing a systolic away requires only the latency of 2t clock cycles to solve the key equation without initial latency. In addition, the DCME architecture using 3t+2 basic cells has regularity and scalability since it uses only one processing element. The RS decoder has been synthesized using the 0.25${\mu}{\textrm}{m}$. Faraday CMOS standard cell library and operates at 200MHz and its data rate suppots up to 1.6Gbps. For tile (255, 239, 8) RS code, the gate counts of the DCME architecture and the whole RS decoder excluding FIFO memory are only 21,760 and 42,213, respectively. The proposed RS decoder can reduce the total fate count at least 23% and the total latency at least 10% compared with conventional ME architectures.

A Novel Arithmetic Unit Over GF(2$^{m}$) for Reconfigurable Hardware Implementation of the Elliptic Curve Cryptographic Processor (타원곡선 암호프로세서의 재구성형 하드웨어 구현을 위한 GF(2$^{m}$)상의 새로운 연산기)

  • 김창훈;권순학;홍춘표;유기영
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.8
    • /
    • pp.453-464
    • /
    • 2004
  • In order to solve the well-known drawback of reduced flexibility that is associate with ASIC implementations, this paper proposes a novel arithmetic unit over GF(2$^{m}$ ) for field programmable gate arrays (FPGAs) implementations of elliptic curve cryptographic processor. The proposed arithmetic unit is based on the binary extended GCD algorithm and the MSB-first multiplication scheme, and designed as systolic architecture to remove global signals broadcasting. The proposed architecture can perform both division and multiplication in GF(2$^{m}$ ). In other word, when input data come in continuously, it produces division results at a rate of one per m clock cycles after an initial delay of 5m-2 in division mode and multiplication results at a rate of one per m clock cycles after an initial delay of 3m in multiplication mode respectively. Analysis shows that while previously proposed dividers have area complexity of Ο(m$^2$) or Ο(mㆍ(log$_2$$^{m}$ )), the Proposed architecture has area complexity of Ο(m), In addition, the proposed architecture has significantly less computational delay time compared with the divider which has area complexity of Ο(mㆍ(log$_2$$^{m}$ )). FPGA implementation results of the proposed arithmetic unit, in which Altera's EP2A70F1508C-7 was used as the target device, show that it ran at maximum 121MHz and utilized 52% of the chip area in GF(2$^{571}$ ). Therefore, when elliptic curve cryptographic processor is implemented on FPGAs, the proposed arithmetic unit is well suited for both division and multiplication circuit.

A Fast Inversion for Low-Complexity System over GF(2 $^{m}$) (경량화 시스템에 적합한 유한체 $GF(2^m)$에서의 고속 역원기)

  • Kim, So-Sun;Chang, Nam-Su;Kim, Chang-Han
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.9 s.339
    • /
    • pp.51-60
    • /
    • 2005
  • The design of efficient cryptosystems is mainly appointed by the efficiency of the underlying finite field arithmetic. Especially, among the basic arithmetic over finite field, the rnultiplicative inversion is the most time consuming operation. In this paper, a fast inversion algerian in finite field $GF(2^m)$ with the standard basis representation is proposed. It is based on the Extended binary gcd algorithm (EBGA). The proposed algorithm executes about $18.8\%\;or\;45.9\%$ less iterations than EBGA or Montgomery inverse algorithm (MIA), respectively. In practical applications where the dimension of the field is large or may vary, systolic array sDucture becomes area-complexity and time-complexity costly or even impractical in previous algorithms. It is not suitable for low-weight and low-power systems, i.e., smartcard, the mobile phone. In this paper, we propose a new hardware architecture to apply an area-efficient and a synchronized inverter on low-complexity systems. It requires the number of addition and reduction operation less than previous architectures for computing the inverses in $GF(2^m)$ furthermore, the proposed inversion is applied over either prime or binary extension fields, more specially $GF(2^m)$ and GF(P) .

VLSI Implementation of Low-Power Motion Estimation Using Reduced Memory Accesses and Computations (메모리 호출과 연산횟수 감소기법을 이용한 저전력 움직임추정 VLSI 구현)

  • Moon, Ji-Kyung;Kim, Nam-Sub;Kim, Jin-Sang;Cho, Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.5A
    • /
    • pp.503-509
    • /
    • 2007
  • Low-power motion estimation is required for video coding in portable information devices. In this paper, we propose a low-power motion estimation algorithm and 1-D systolic may VLSI architecture using full search block matching algorithm (FSBMA). Main power dissipation sources of FSBMA are complex computations and frequent memory accesses for data in the search area. In the proposed algorithm, memory accesses and computations are reduced by using 1D PE (processing array) array architecture performing motion estimation of two neighboring blocks in parallel and by skipping unnecessary computations during motion estimation. The VLSI implementation results of the algorithm show that the proposed VLSI architecture can save 9.3% power dissipation and can operate two times faster than an existing low-power motion estimator.

Characteristic analysis of Modular Multipliers and Squarers for GF($2^m$) (유한 필드 GF($2^m$)상의 모듈러 곱셈기 및 제곱기 특성 분석)

  • 한상덕;김창훈;홍춘표
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.7 no.5
    • /
    • pp.167-174
    • /
    • 2002
  • This paper analyzes the characteristics of three multipliers and squarers in finite fields GF(2/sup m/) from the point of view of processing time and area complexity. First, we analyze structures of three multipliers and squarers: 1) Systolic array structure, 2), LFSR structure, and 3) CA structure. To make performance analysis, each multiplier and squarer was modeled in VHDL and was synthesized for FPGA implementation. The simulation results show that CA structure is the best from the point view of processing time, and LFSR structure is the best from the point of view of area complexity.

  • PDF

Efficient DFT/DCT Computation for OFDM in Cognitive Radio System (Cognitive Radio 시스템의 OFDM을 위한 효율적 DCT/DFT 계산에 관한 연구)

  • Chen, Zhu;Kim, Jeong-Ki;Yan, Yi-Er;Lee, Moon-Ho
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.45 no.2
    • /
    • pp.97-102
    • /
    • 2008
  • In this paper, we address the OFDM based on DFT or DCT in Cognitive Radio system. An adaptive OFDM based on DFT or DCT in Cognitive Radio system has the capacity to nullify individual carriers to avoid interference to the licensed users. Therefore, there could be a considerably large number of zero-valued inputs/outputs for the IDFT/DFT or IDCT/DCT on the OFDM transceiver. Hence, the standard methods of DFT and DCT are no longer efficient due to the wasted operations on zero. Based on this observation, we present a transform decomposition on two dimensional(2-D) systolic array for IDFT/DFT and IDCT/DCT, this algorithm can achieve an efficient computation for OFDM in Cognitive Radio system

Low-latency Montgomery AB2 Multiplier Using Redundant Representation Over GF(2m)) (GF(2m) 상의 여분 표현을 이용한 낮은 지연시간의 몽고메리 AB2 곱셈기)

  • Kim, Tai Wan;Kim, Kee-Won
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.12 no.1
    • /
    • pp.11-18
    • /
    • 2017
  • Finite field arithmetic has been extensively used in error correcting codes and cryptography. Low-complexity and high-speed designs for finite field arithmetic are needed to meet the demands of wider bandwidth, better security and higher portability for personal communication device. In particular, cryptosystems in GF($2^m$) usually require computing exponentiation, division, and multiplicative inverse, which are very costly operations. These operations can be performed by computing modular AB multiplications or modular $AB^2$ multiplications. To compute these time-consuming operations, using $AB^2$ multiplications is more efficient than AB multiplications. Thus, there are needs for an efficient $AB^2$ multiplier architecture. In this paper, we propose a low latency Montgomery $AB^2$ multiplier using redundant representation over GF($2^m$). The proposed $AB^2$ multiplier has less space and time complexities compared to related multipliers. As compared to the corresponding existing structures, the proposed $AB^2$ multiplier saves at least 18% area, 50% time, and 59% area-time (AT) complexity. Accordingly, it is well suited for VLSI implementation and can be easily applied as a basic component for computing complex operations over finite field, such as exponentiation, division, and multiplicative inverse.

A Design of Cellular Array Parallel Multiplier on Finite Fields GF(2m) (유한체 GF(2m)상의 셀 배열 병렬 승산기의 설계)

  • Seong, Hyeon-Kyeong
    • The KIPS Transactions:PartA
    • /
    • v.11A no.1
    • /
    • pp.1-10
    • /
    • 2004
  • A cellular array parallel multiplier with parallel-inputs and parallel-outputs for performing the multiplication of two polynomials in the finite fields GF$(2^m)$ is presented in this paper. The presented cellular way parallel multiplier consists of three operation parts: the multiplicative operation part (MULOP), the irreducible polynomial operation part (IPOP), and the modular operation part (MODOP). The MULOP and the MODOP are composed if the basic cells which are designed with AND Bates and XOR Bates. The IPOP is constructed by XOR gates and D flip-flops. This multiplier is simulated by clock period l${\mu}\textrm{s}$ using PSpice. The proposed multiplier is designed by 24 AND gates, 32 XOR gates and 4 D flip-flops when degree m is 4. In case of using AOP irreducible polynomial, this multiplier requires 24 AND gates and XOR fates respectively. and not use D flip-flop. The operating time of MULOP in the presented multiplier requires one unit time(clock time), and the operating time of MODOP using IPOP requires m unit times(clock times). Therefore total operating time is m+1 unit times(clock times). The cellular array parallel multiplier is simple and regular for the wire routing and have the properties of concurrency and modularity. Also, it is expansible for the multiplication of two polynomials in the finite fields with very large m.