• Title/Summary/Keyword: Systolic Array


Efficient Semi-systolic Montgomery Multiplier over GF($2^m$)

  • Keewon, Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.2
    • /
    • pp.69-75
    • /
    • 2023
  • Finite field arithmetic operations play an important role in a variety of applications, including modern cryptography and error correction codes. In this paper, we propose an efficient multiplication algorithm over finite fields based on the Montgomery multiplication algorithm. Existing multipliers can be implemented using AND and XOR gates; to reduce time and space complexity, we propose an algorithm that uses NAND and NOR gates instead. Based on the proposed algorithm, we also propose an efficient semi-systolic finite field multiplier with low area and low latency. The proposed multiplier reduces the area-time complexity by about 71%, 66%, and 33% compared to the multipliers of Chiou et al., Huang et al., and Kim-Jeon, respectively. As a result, our multiplier is well suited to VLSI implementation and can serve as an essential module in various applications.
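As a rough illustration of the Montgomery multiplication that this abstract builds on, the following sketch computes $A \cdot B \cdot x^{-m} \bmod p(x)$ bit-serially in software. It is the generic textbook form, not the NAND/NOR formulation or the semi-systolic structure proposed in the paper; the function names and the integer bit-mask encoding (bit i holds the coefficient of $x^i$) are assumptions made for this example.

```python
def gf2m_montgomery(a, b, p, m):
    """Return a * b * x^(-m) mod p over GF(2^m).

    a, b: field elements as integers with fewer than m+1 bits.
    p:    irreducible polynomial of degree m as an integer (bit m and bit 0 set).
    Hypothetical helper for illustration only.
    """
    c = 0
    for i in range(m):
        if (a >> i) & 1:      # add a_i * B
            c ^= b
        if c & 1:             # make c divisible by x by adding p (p has constant term 1)
            c ^= p
        c >>= 1               # exact division by x
    return c


def gf2m_multiply(a, b, p, m):
    """Plain a * b mod p (MSB-first shift-and-add), used here only as a reference."""
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1               # multiply the running result by x
        if (c >> m) & 1:      # reduce if the degree reached m
            c ^= p
        if (a >> i) & 1:
            c ^= b
    return c
```

Multiplying the Montgomery result by $x^m \bmod p(x)$ recovers the ordinary product, which is one way to check the sketch against the reference routine.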

Design of Systolic Multipliers in GF($2^m$) Using an Irreducible All One Polynomial (기약 All One Polynomial을 이용한 유한체 GF($2^m$)상의 시스톨릭 곱셈기 설계)

  • Gwon, Sun Hak;Kim, Chang Hun;Hong, Chun Pyo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.8C
    • /
    • pp.1047-1054
    • /
    • 2004
  • In this paper, we present two systolic arrays for computing multiplications in GF($2^m$) generated by an irreducible all one polynomial (AOP). Both proposed systolic arrays have a parallel-in parallel-out structure. The first systolic multiplier has an area complexity of $O(m^2)$ and a time complexity of $O(1)$. In other words, the multiplier consists of m(m+1)/2 identical cells and produces one multiplication result every clock cycle after an initial delay of m/2+1 cycles. Compared with the previously proposed related multiplier using an AOP, our design reduces hardware complexity by 12 percent and computation delay by 50 percent. The other systolic multiplier, designed for cryptographic applications, has an area complexity of $O(m)$ and a time complexity of $O(m)$; that is, it is composed of m+1 identical cells and produces one multiplication result every m/2+1 clock cycles. Compared with other linear systolic multipliers, our design reduces hardware complexity by at least 43 percent and computation delay by 83 percent, and doubles the throughput rate. Furthermore, since the two proposed architectures have high regularity and modularity, they are well suited to VLSI implementation. Therefore, when the proposed architectures are used for GF($2^m$) applications, one can achieve maximum throughput with minimal hardware requirements.
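The property that makes AOP-based multipliers regular is that, for $p(x) = x^m + x^{m-1} + \dots + x + 1$, we have $x^{m+1} \equiv 1 \pmod{p(x)}$, so the product can be formed as a cyclic convolution of length m+1 and then folded back to m coefficients. The sketch below shows only this arithmetic in software; it does not model the cell structure or timing of the systolic arrays in the abstract, and the function name and list encoding (index i holds the coefficient of $x^i$) are assumptions for illustration.

```python
def aop_multiply(a, b, m):
    """Multiply two GF(2^m) elements modulo the all one polynomial of degree m.

    a, b: lists of m bits, index i = coefficient of x^i.
    Assumes the AOP of degree m is irreducible (e.g. m = 2, 4, 10).
    Hypothetical helper for illustration only.
    """
    n = m + 1
    ext = [0] * n                       # product in the extended basis {1, x, ..., x^m}
    for i in range(m):
        if a[i]:
            for j in range(m):
                if b[j]:
                    ext[(i + j) % n] ^= 1   # x^(m+1) = 1, so exponents wrap mod m+1
    top = ext[m]
    # x^m = 1 + x + ... + x^(m-1), so a set top bit flips every lower coefficient
    return [ext[k] ^ top for k in range(m)]
```

For example, in GF($2^2$) with $p(x) = x^2 + x + 1$, `aop_multiply([0, 1], [0, 1], 2)` computes $x \cdot x = x + 1$.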

Remote speech recognition preprocessing system for intelligent robot in noisy environment (지능로봇에 적합한 잡음 환경에서의 원거리 음성인식 전처리 시스템)

  • Gwon, Se-Do;Jeong, Hong
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.365-366
    • /
    • 2006
  • This paper describes a pre-processing methodology that can be applied to the remote speech recognition system of a service robot in a noisy environment. By combining beamforming and blind source separation, we can overcome the weaknesses of beamforming (reverberation) and of blind source separation (distributed noise, permutation ambiguity). Because the method is designed for hardware implementation, real-time execution can be achieved on an FPGA by using a systolic array architecture.

VLSI Design of SOVA Decoder for Turbo Decoder (터보복호기를 위한 SOVA 복호기의 설계)

  • Kim, Ki-Bo;Kim, Jong-Tae
    • Proceedings of the KIEE Conference
    • /
    • 2000.07d
    • /
    • pp.3157-3159
    • /
    • 2000
  • The Soft Output Viterbi Algorithm (SOVA) is a modification of the Viterbi algorithm that delivers not only the decoded codewords but also the a posteriori probability of each bit. This paper presents a SOVA decoder that can be used as the component decoder of a turbo decoder. We used a two-step SMU architecture combined with a systolic array traceback method to reduce the complexity of the design. The SOVA decoder design follows the specification of the CDMA2000 system.

Design of Adaptive Filter for Muscle Response Suppression and FPGA Implementation (근 반응제거를 위한 적응필터 설계와 FPGA 구현)

  • 염호준;박영철;윤형로
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.52 no.12
    • /
    • pp.708-716
    • /
    • 2003
  • The surface EMG signal detected from voluntarily activated muscles can be used as a control signal for functional electrical stimulation. To use the voluntary EMG signal, it is necessary to eliminate the muscle response evoked by the electrical stimulation and to run the algorithm in real time. In this paper, we propose the Gram-Schmidt (GS) algorithm and implement it in an FPGA (field programmable gate array). The GS algorithm is efficient at eliminating periodic signals such as the muscle response and, owing to its systolic array structure, is more stable and better suited to FPGA implementation than the conventional least-squares approach.

A Full-Search Block-Matching Algorithm With Early Retirement of Processing Elements (단위 처리기를 조기 은퇴시키는 완전탐색 블록정합 알고리듬)

  • 남기철;채수익
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.11
    • /
    • pp.1417-1423
    • /
    • 1995
  • In this paper, we propose a full-search block-matching algorithm with early retirement, which can be applied to a 1-D systolic array of processing elements (PEs) for fast motion estimation. In the proposed algorithm, a PE is retired when its current accumulated sum is equal to or larger than the current minimum mean absolute difference (MAD). If all PEs are retired, the MAD calculation is stopped for the current array position and started for the next one in the search window. Simulation results show that the optimum motion vector is always found with less computation, that the total number of computation cycles for motion estimation decreases to about 60%, and that the power dissipation in the PEs is reduced to about 40-60%.
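The early-retirement rule can be sketched in sequential software as follows. This is a simplified illustration, not the 1-D systolic array from the abstract: each candidate position plays the role of one PE, the retirement test is applied once per block row, and the sum of absolute differences (SAD) is used in place of the MAD, which differs only by a constant factor for a fixed block size. The function name, argument layout, and the `ops` counter are assumptions for this example.

```python
def full_search_early_retirement(cur_block, ref_frame, top, left, search_range):
    """Full-search block matching with early termination of hopeless candidates.

    cur_block: n x n list of pixel values from the current frame.
    ref_frame: H x W list of pixel values from the reference frame.
    (top, left): position of the block in the current frame.
    Returns (best_motion_vector, best_sad, absolute_difference_count).
    """
    n = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best_sad = float("inf")
    best_mv = (0, 0)
    ops = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + n > height or x0 + n > width:
                continue                      # candidate falls outside the frame
            acc = 0
            retired = False
            for r in range(n):
                for c in range(n):
                    acc += abs(cur_block[r][c] - ref_frame[y0 + r][x0 + c])
                    ops += 1
                if acc >= best_sad:           # cannot beat the current minimum: retire
                    retired = True
                    break
            if not retired:                   # finished with acc < best_sad
                best_sad = acc
                best_mv = (dy, dx)
    return best_mv, best_sad, ops
```

Because a candidate is abandoned only once its partial sum already equals or exceeds the current minimum, the search still returns a minimum-SAD match, while `ops` shows how many absolute differences were actually evaluated.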

Design of an Area-Efficient Reed-Solomon Decoder using Pipelined Recursive Technique (파이프라인 재귀적인 기술을 이용한 면적 효율적인 Reed-Solomon 복호기의 설계)

  • Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.7 s.337
    • /
    • pp.27-36
    • /
    • 2005
  • This paper presents an area-efficient architecture for implementing a high-speed Reed-Solomon (RS) decoder, which is used in a variety of communication systems such as wireless and very high-speed optical communications. We present a new pipelined-recursive Modified Euclidean (PrME) architecture that achieves a high throughput rate and reduces hardware complexity by using a folding technique. The proposed pipelined recursive architecture reduces hardware complexity by about 80% compared to the conventional systolic-array and fully-parallel architectures. The proposed RS decoder has been designed and implemented in 0.13 um CMOS technology with a supply voltage of 1.2 V. The results show that the total gate count is 393 K and that the decoder has a data processing rate of 5 Gbits/s at a clock frequency of 625 MHz. The proposed area-efficient architecture can be readily applied to next-generation FEC devices for high-speed optical communications as well as wireless communications.

High-Performance Givens Rotation-based QR Decomposition Architecture Applicable for MIMO Receiver (MIMO 수신기에 적용 가능한 고성능 기븐스 회전 기반의 QR 분해 하드웨어 구조)

  • Yoon, Ji-Hwan;Lee, Min-Woo;Park, Jong-Sun
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.49 no.3
    • /
    • pp.31-37
    • /
    • 2012
  • This paper presents an efficient hardware architecture that enables high-speed Givens rotation-based QR decomposition. The proposed architecture achieves a highly parallel Givens rotation process by maximizing the number of pivots selected for parallel zero-insertions. Sign-select lookahead (SSL)-CORDIC is also used to speed up the Givens rotation. The performance of the QR decomposition hardware increases considerably compared to the conventional triangular systolic array (TSA) architecture. Moreover, the circuit area of the QR decomposition hardware is reduced by decreasing the number of flip-flops that hold the pre-computed results during the decomposition process. The proposed QR decomposition hardware was implemented using TSMC $0.25{\mu}m$ technology. The experimental results show that the proposed architecture achieves up to a 70% speed-up over the TACR/TSA-based architecture for $8{\times}8$ matrix decomposition.
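For readers unfamiliar with the underlying operation, a Givens rotation zeroes one sub-diagonal entry at a time by rotating two adjacent rows, and repeating this yields $A = QR$. The sketch below is a plain sequential, floating-point version that picks one pivot per step; it does not reproduce the parallel pivot selection or the SSL-CORDIC arithmetic described in the abstract. The function name and the list-of-lists matrix encoding are assumptions for illustration.

```python
import math


def givens_qr(a):
    """QR decomposition of an m x n matrix (list of lists) by Givens rotations.

    Returns (q, r) with q orthogonal (m x m) and r upper triangular (m x n).
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):         # zero column j from the bottom up
            if r[i][j] == 0.0:
                continue
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            c, s = x / h, y / h               # rotation that maps (x, y) to (h, 0)
            for k in range(n):                # rotate rows i-1 and i of R
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            for k in range(m):                # apply the transposed rotation to Q's columns
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r
```

Each rotation touches only two rows, which is why independent row pairs can be rotated at the same time in hardware.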

New Enhanced Degree Computationless Modified Euclid's Algorithm and its Architecture for Reed-Solomon decoders (Reed-Solomon 복호기를 위한 새로운 E-DCME 알고리즘 및 하드웨어 구조)

  • Baek, Jae-Hyun;SunWoo, Myung-Hoon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.8A
    • /
    • pp.820-826
    • /
    • 2007
  • This paper proposes an enhanced degree computationless modified Euclid's (E-DCME) algorithm and its architecture for Reed-Solomon decoders. Because it uses new initial conditions, the proposed E-DCME algorithm has a shorter critical path delay, $T_{mult}+T_{add}+T_{mux}$, than the existing modified Euclid's algorithm and the degree computationless modified Euclid's (DCME) algorithm. The proposed E-DCME architecture, which employs a systolic array, requires only 2t-1 clock cycles to solve the key equation, with no initial latency. In addition, the E-DCME architecture, which consists of 3t basic cells, is regular and scalable since it uses only one type of processing element. Implemented with the $0.18{\mu}m$ Samsung standard cell library, the E-DCME architecture consists of 18,000 gates.

A Systolic Array Structured Decision Feedback Equalizer based on Extended QR-RLS Algorithm (확장 QR-RLS 알고리즘을 이용한 시스토릭 어레이 구조의 결정 궤환 등화기)

  • Lee Won Cheol
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.11C
    • /
    • pp.1518-1526
    • /
    • 2004
  • In this paper, we propose an algorithm that uses the wavelet transform to detect cuts, which are abrupt scene transition points, and fades and dissolves, which are gradual scene transition points. Conventional methods that use the wavelet transform for this purpose rely on features in both the spatial and frequency domains. In the proposed algorithm, the color space of an input image is converted to YUV, and the luminance component Y is then transformed into the frequency domain using 2-level lifting. Next, the histogram of only the low-frequency subband, which may contain some spatial-domain features, is compared with the previous one. Edges obtained from the higher bands can be divided into global, semi-global, and local regions, and the histogram of each edge region is compared. The experimental results show a performance improvement of about 17% in recall and 18% in precision, as well as good performance in fade and dissolve detection.