• Title/Summary/Keyword: systolic architecture

Search Result 96, Processing Time 0.024 seconds

Design of a systolic radix-4 finite-field multiplier for the elliptic curve cryptography (타원곡선 암호를 위한 시스톨릭 Radix-4 유한체 곱셈기 설계)

  • Park Tae-Geun;Kim Ju-Young
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.3 s.345
    • /
    • pp.40-47
    • /
    • 2006
  • The finite-field multiplication can be applied to the elliptic curve cryptosystems. However, an efficient algorithm and the hardware design are required since the finite-field multiplication takes much time to compute. In this paper, we propose a radix-4 systolic multiplier on $GF(2^m)$ with comparative area and performance. The algorithm of the proposed standard-basis multiplier is mathematically developed to map on low-cost systolic cells, so that the proposed systolic architecture is suitable for VLSI design. Compared to the bit-parallel, bit-serial and systolic multipliers, the proposed multiplier has relatively effective high performance and low cost. We design and synthesis $GF(2^{193})$ finite-field multiplier using Hynix $0.35{\mu}m$ standard cell library and the maximum clock frequency is 400MHz.

High-Performance Givens Rotation-based QR Decomposition Architecture Applicable for MIMO Receiver (MIMO 수신기에 적용 가능한 고성능 기븐스 회전 기반의 QR 분해 하드웨어 구조)

  • Yoon, Ji-Hwan;Lee, Min-Woo;Park, Jong-Sun
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.49 no.3
    • /
    • pp.31-37
    • /
    • 2012
  • This paper presents an efficient hardware architecture to enable the high-speed Givens rotation-based QR decomposition. The proposed architecture achieves a highly parallel givens rotation process by maximizing the number of pivots selected for parallel zero-insertions. Sign-select lookahed (SSL)-CORDIC is also efficiently used for the high-speed givens rotation. The performance of QR decomposition hardware considerably increases compared to the conventional triangular systolic array (TSA) architecture. Moreover, the circuit area of QR decomposition hardware was reduced by decreasing the number of flip-flops for holding the pre-computed results during the decomposition process. The proposed QR decomposition hardware was implemented using TSMC $0.25{\mu}m$ technology. The experimental results show that the proposed architecture achieves up to 70 % speed-up over the TACR/TSA-based architecture for the $8{\times}8$ matrix decomposition.

Design of an Area-Efficient Reed-Solomon Decoder using Pipelined Recursive Technique (파이프라인 재귀적인 기술을 이용한 면적 효율적인 Reed-Solomon 복호기의 설계)

  • Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.7 s.337
    • /
    • pp.27-36
    • /
    • 2005
  • This paper presents an area-efficient architecture to implement the high-speed Reed-Solomon(RS) decoder, which is used in a variety of communication systems such as wireless and very high-speed optical communications. We present the new pipelined-recursive Modified Euclidean(PrME) architecture to achieve high-throughput rate and reducing hardware-complexity using folding technique. The proposed pipelined recursive architecture can reduce the hardware complexity about 80$\%$ compared to the conventional systolic-array and fully-parallel architecture. The proposed RS decoder has been designed and implemented with the 0.13um CMOS technology in a supply voltage of 1.2 V. The result show that total number of gate is 393 K and it has a data processing rate of S Gbits/s at clock frequency of 625 MHz. The proposed area-efficient architecture can be readily applied to the next generation FEC devices for high-speed optical communications as well as wireless communications.

New Enhanced Degree Computationless Modified Euclid's Algorithm and its Architecture for Reed-Solomon decoders (Reed-Solomon 복호기를 위한 새로운 E-DCME 알고리즘 및 하드웨어 구조)

  • Baek, Jae-Hyun;SunWoo, Myung-Hoon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.8A
    • /
    • pp.820-826
    • /
    • 2007
  • This paper proposes an enhanced degree computationless modified Euclid's(E-DCME) algorithm and its architecture for Reed-Solomon decoders. The proposed E-DCME algorithm has shorter critical path delay that is $T_{mult}+T_{add}+T_{mux}$ compared with the existing modified Euclid's algorithm and the degree computationless modified Euclid's(DCME) algorithm since it uses new initial conditions. The proposed E-DCME architecture employing a systolic array requires only 2t-1 clock cycles to solve the key equation without initial latency. In addition, the E-DCME architecture consisting of 3t basic cells has regularity and scalability since it uses only one processing element. The E-DCME architecture using the $0.18{\mu}m$ Samsung standard cell library consists of 18,000 gates.

MOEPE: Merged Odd-Even PE Architecture for Stereo Matching Hardware (MOEPE: 스테레오 정합 하드웨어를 위한 Merged Odd-Even PE 구조)

  • 한필우;양영일
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1137-1140
    • /
    • 1998
  • In this paper, we propose the new hardware architecture which implements the stereo matching algorithm using the dynamic programming method. The dynamic programming method is used in finding the corresponding pixels between the left image and the right image. The proposed MOEPE(Merged Odd-Even PE) architecture operates in the systolic manner and finds the disparities from the intensities of the pixels on the epipolar line. The number of PEs used in the MOEPE architecture is the number of the range constraint, which reduced the number of the necessary PEs dramatically compared to the traditional method which uses the PEs with the number of pixels on the epipolar line. For the normal method by 25 times. The proposed architecture is modeled with the VHDL code and simulated by the SYNOPSYS tool.

  • PDF

Design and Analysis of a $AB^2$ Systolic Arrays for Division/Inversion in$GF(2^m)$ ($GF(2^m)$상에서 나눗셈/역원 연산을 위한 $AB^2$ 시스톨릭 어레이 설계 및 분석)

  • 김남연;고대곤;유기영
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.1
    • /
    • pp.50-58
    • /
    • 2003
  • Among finite field arithmetic operations, the $AB^2$ operation is known as an efficient basic operation for public key cryptosystems over $GF(2^m)$,Division/Inversion is computed by performing the repetitive AB$^2$ multiplication. This paper presents two new $AB^2$algorithms and their systolic realizations in finite fields $GF(2^m)$.The proposed algorithms are based on the MSB-first scheme using standard basis representation and the proposed systolic architectures for $AB^2$ multiplication have a low hardware complexity and small latency compared to the conventional approaches. Additionally, since the proposed architectures incorporate simplicity, regularity, modularity, and pipelinability, they are well suited to VLSI implementation and can be easily applied to inversion architecture. Furthermore, these architectures will be utilized for the basic architecture of crypto-processor.

Bit-Parallel Systolic Divider in Finite Field GF(2m) (유한 필드 GF(2m)상의 비트-패러럴 시스톨릭 나눗셈기)

  • 김창훈;김종진;안병규;홍춘표
    • The KIPS Transactions:PartA
    • /
    • v.11A no.2
    • /
    • pp.109-114
    • /
    • 2004
  • This paper presents a high-speed bit-parallel systolic divider for computing modular division A($\chi$)/B($\chi$) mod G($\chi$) in finite fields GF$(2^m)$. The presented divider is based on the binary GCD algorithm and verified through FPGA implementation. The proposed architecture produces division results at a rate of one every 1 clock cycles after an initial delay of 5m-2. Analysis shows that the proposed divider provides a significant reduction in both chip area and computational delay time compared to previously proposed systolic dividers with the same I/O format. In addition, since the proposed architecture does not restrict the choice of irreducible polynomials and has regularity and modularity, it provides a high flexibility and Scalability with respect to the field size m. Therefore, the proposed divider is well suited to VLSI implementation.

Design of MSB-First Digit-Serial Multiplier for Finite Fields GF(2″) (유한 필드 $GF(2^m)$상에서의 MSB 우선 디지트 시리얼 곱셈기 설계)

  • 김창훈;한상덕;홍춘표
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.6C
    • /
    • pp.625-631
    • /
    • 2002
  • This paper presents a MSB-first digit-serial systolic array for computing modular multiplication of A(x)B(x) mod G(x) in finite fields $GF(2^m)$. From the MSB-first multiplication algorithm in $GF(2^m)$, we obtain a new data dependence graph and design an efficient digit-serial systolic multiplier. For circuit synthesis, we obtain VHDL code for multiplier, If input data come in continuously, the implemented multiplier can produce multiplication results at a rate of one every [m/L] clock cycles, where L is the selected digit size. The analysis results show that the proposed architecture leads to a reduction of computational delay time and it has much more simple structure than existing digit-serial systolic multiplier. Furthermore, since the propose architecture has the features of unidirectional data flow and regularity, it shows good extension characteristics with respect to m and L.

A Decoder Design for High-Speed RS code (RS 코드를 이용한 복호기 설계)

  • 박화세;김은원
    • Journal of the Korean Institute of Telematics and Electronics T
    • /
    • v.35T no.1
    • /
    • pp.59-66
    • /
    • 1998
  • In this paper, the high-speed decoder for RS(Reed-Solomon) code, one of the most popular error correcting code, is implemented using VHDL. This RS decoder is designed in transform domain instead of most time domain. Because of the simplicity in structure, transform decoder can be easily realized VLSI chip. Additionally the pipeline architecture, which is similar to a systolic array is applied for all design. Therefore, This transform RS decoder is suitable for high-rate data transfer. After synthesis with FPGA technology, the decoding rate is more 43 Mbytes/s and the area is 1853 LCs(Logic Cells). To compare with other product with pipeline architecture, this result is admirable. Error correcting ability and pipeline performance is certified by computer simulation.

  • PDF

High-Speed Reed-Solomon Decoder Using New Degree Computationless Modified Euclid´s Algorithm (새로운 DCME 알고리즘을 사용한 고속 Reed-Solomon 복호기)

  • 백재현;선우명훈
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.6
    • /
    • pp.459-468
    • /
    • 2003
  • This paper proposes a novel low-cost and high-speed Reed-Solomon (RS) decoder based on a new degree computationless modified Euclid´s (DCME) algorithm. This architecture has quite low hardware complexity compared with conventional modified Euclid´s (ME) architectures, since it can remove completely the degree computation and comparison circuits. The architecture employing a systolic away requires only the latency of 2t clock cycles to solve the key equation without initial latency. In addition, the DCME architecture using 3t+2 basic cells has regularity and scalability since it uses only one processing element. The RS decoder has been synthesized using the 0.25${\mu}{\textrm}{m}$. Faraday CMOS standard cell library and operates at 200MHz and its data rate suppots up to 1.6Gbps. For tile (255, 239, 8) RS code, the gate counts of the DCME architecture and the whole RS decoder excluding FIFO memory are only 21,760 and 42,213, respectively. The proposed RS decoder can reduce the total fate count at least 23% and the total latency at least 10% compared with conventional ME architectures.