• Title/Summary/Keyword: clock cycles

Search Result 148, Processing Time 0.024 seconds

High Performance Hardware Implementation of the 128-bit SEED Cryptography Algorithm (128비트 SEED 암호 알고리즘의 고속처리를 위한 하드웨어 구현)

  • 전신우;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.1
    • /
    • pp.13-23
    • /
    • 2001
  • This paper implemented into hardware SEED which is the KOREA standard 128-bit block cipher. First, at the respect of hardware implementation, we compared and analyzed SEED with AES finalist algorithms - MARS, RC6, RIJNDAEL, SERPENT, TWOFISH, which are secret key block encryption algorithms. The encryption of SEED is faster than MARS, RC6, TWOFISH, but is as five times slow as RIJNDAEL which is the fastest. We propose a SEED hardware architecture which improves the encryption speed. We divided one round into three parts, J1 function block, J2 function block J3 function block including key mixing block, because SEED repeatedly executes the same operation 16 times, then we pipelined one round into three parts, J1 function block, J2 function block, J3 function block including key mixing block, because SEED repeatedly executes the same operation 16 times, then we pipelined it to make it more faster. G-function is implemented more easily by xoring four extended 4 byte SS-boxes. We tested it using ALTERA FPGA with Verilog HDL. If the design is synthesized with 0.5 um Samsung standard cell library, encryption of ECB and decryption of ECB, CBC, CFB, which can be pipelined would take 50 clock cycles to encrypt 384-bit plaintext, and hence we have 745.6 Mbps assuming 97.1 MHz clock frequency. Encryption of CBC, OFB, CFB and decryption of OFB, which cannot be pipelined have 258.9 Mbps under same condition.

Design of Bit Manipulation Accelerator fo Communication DSP (통신용 DSP를 위한 비트 조작 연산 가속기의 설계)

  • Jeong Sug H.;Sunwoo Myung H.
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.42 no.8 s.338
    • /
    • pp.11-16
    • /
    • 2005
  • This paper proposes a bit manipulation accelerator (BMA) having application specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot effectively perform bit manipulation functions since かey have multiply accumulate (MAC) oriented data paths and word-based functions. However, the proposed accelerator can efficiently process bit manipulation functions using parallel shift and Exclusive-OR (XOR) operations and bit jnsertion/extraction operations on multiple data. The proposed BMA has been modeled by VHDL and synthesized using the SEC $0.18\mu m$ standard cell library and the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced about $40\%\sim80\%$ for scrambling, convolutional encoding and interleaving compared with existing DSPs.

New Division Circuit for GF(2m) Applications (유한체 GF(2m)의 응용을 위한 새로운 나눗셈 회로)

  • Kim Chang Hoon;Lee Nam Gon;Kwon Soonhak;Hong Chun Pyo
    • The KIPS Transactions:PartA
    • /
    • v.12A no.3 s.93
    • /
    • pp.235-242
    • /
    • 2005
  • In this paper, we propose a new division circuit for $GF(2^m)$ applications. The proposed division circuit is based on a modified the binary GCD algorithm and produce division results at a rate of one per 2m-1 clock cycles. Analysis shows that the proposed circuit gives $47\%$ and $20\%$ improvements in terms of speed and hardware respectively. In addition, since the proposed circuit does not restrict the choice of irreducible polynomials and has regularity and modularity, it provides a high flexibility and scalability with respect to the field size m. Thus, the proposed divider. is well suited to low-area $GF(2^m)$ applications.

A Digit Serial Multiplier Over GF(2m)Based on the MSD-first Algorithm (GF(2m)상의 MSD 우선 알고리즘 기반 디지트-시리얼 곱셈기)

  • Kim, Chang-Hoon;Kim, Soon-Cheol
    • The KIPS Transactions:PartA
    • /
    • v.15A no.3
    • /
    • pp.161-166
    • /
    • 2008
  • In this paper, an efficient digit-serial systolic array is proposed for multiplication in finite field GF($2^m$) using the polynomial basis representation. The proposed systolic array is based on the most significant digit first (MSD-first) multiplication algorithm and produces multiplication results at a rate of one every "m/D" clock cycles, where D is the selected digit size. Since the inner structure of the proposed multiplier is tree-type, critical path increases logarithmically proportional to D. Therefore, the computation delay of the proposed architecture is significantly less than previously proposed digit-serial systolic multipliers whose critical path increases proportional to D. Furthermore, since the new architecture has the features of a high regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementation.

Design and Verification of Pipelined Face Detection Hardware (파이프라인 구조의 얼굴 검출 하드웨어 설계 및 검증)

  • Kim, Shin-Ho;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.10
    • /
    • pp.1247-1256
    • /
    • 2012
  • There are many filter based image processing algorithms and they usually require a huge amount of computations and memory accesses making it hard to attain a real-time performance, expecially in embedded applications. In this paper, we propose a pipelined hardware structure of the filter based face detection algorithm to show that the real time performance can be achieved by hardware design. In our design, the whole computation is divided into three pipeline stages: resizing the image (Resize), Transforming the image (ICT), and finding candidate area (Find Candidate). Each stage is optimized by considering the parallelism of the computation to reduce the number of cycles and utilizing the line memory to minimize the memory accesses. The resulting hardware uses 507 KB internal SRAM and occupies 9,039 LUTs when synthesized and configured on Xilinx Virtex5LX330 FPGA. It can operate at maximum 165MHz clock, giving the performance of 108 frame/sec, while detecting up to 20 faces.

A 200-MHZ@2.5-V Dual-Mode Multiplier for Single / Double -Precision Multiplications (단정도/배정도 승산을 위한 200-MHZ@2.5-V 이중 모드 승산기)

  • 이종남;박종화;신경욱
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.4 no.5
    • /
    • pp.1143-1150
    • /
    • 2000
  • A dual-mode multiplier (DMM) that performs single- and double-precision multiplications has been designed using a $0.25-\mum$ 5-metal CMOS technology. An algorithm for efficiently implementing double-precision multiplication with a single-precision multiplier was proposed, which is based on partitioning double-precision multiplication into four single-precision sub-multiplications and computing them with sequential accumulations. When compared with conventional double-precision multipliers, our approach reduces the hardware complexity by about one third resulting in small silicon area and low-power dissipation at the expense of increased latency and throughput cycles. The DMM consists of a $28-b\times28-b$ single-precision multiplier designed using radix-4 Booth receding and redundant binary (RB) arithmetic, an accumulator and a simple control logic for mode selection. It contains about 25,000 transistors on the area of about $0.77\times0.40-m^2$. The HSPICE simulation results show that the DMM core can safely operate with 200-MHZ clock at 2.5-V, and its estimated power dissipation is about 130-㎽ at double-precision mode.

  • PDF

Efficient VLSI Architectures for the Two-Dimensional Discrete Wavelet Transform (2차원 이산 웨이브렛 변환을 위한 효율적인 VLSI 구조)

  • Pan, Sung-Bum;Park, Rae-Hong;Jee, Yong
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.37 no.1
    • /
    • pp.59-68
    • /
    • 2000
  • This paper proposes efficient VLSI architectures for computation of the 2- D discrete wavelet transform (DWT). The two proposed VLSI architectures for the 2- D DWT are constructed based on block-based computation Each $M{\times}N$ ($N{\times}M$) block DWT is performed along the row (column) direction simultaneously, where M and N denote the number of filter taps and the number of columns (rows), respectively The proposed architectures compute the lowpass and highpass output sequences of the 1 - DWT along the row and column directions using a single architecture In alternate clock cycles Therefore the extra processing units required for the proposed architectures are much smaller than those of the conventional architectures They are modeled In very high speed Integrated circuit hardware description language (HIDL) and Simulated to show their functional validity.

  • PDF

Hardware Design of Arccosine Function for Mobile Vector Graphics Processor (모바일 벡터 그래픽 프로세서용 역코사인 함수의 하드웨어 설계)

  • Choi, Byeong-Yoon;Lee, Jong-Hyoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.4
    • /
    • pp.727-736
    • /
    • 2009
  • In this paper, the $arccos(cos^{-1})$ arithmetic unit for mobile graphics accelerator is designed. The mobile vector graphics applications need tight area, execution time, power dissipation, and accuracy constraints compared to desktop PC applications. The designed processor adopts 2nd-order polynomial approximation scheme based on IEEE floating point data format to satisfy speed and accuracy conditions and reduces area via hardware sharing structure. The arccosine processor consists of 15,280 gates and its estimated operating frequency is about 125Mhz at operating condition of $0.35{\mu}m$ CMOS technology. Because the processor can execute arccosine function within 7 clock cycles, it has about 17 MOPS(million arccos operations per second) execution rate and can be applicable to mobile OpenVG processor. And because of its flexible architecture, it can be applicable to the various transcendental functions such as exponential, trigonometric and logarithmic functions via replacement of ROM and minor hardware modification.

Implementation of a LSB-First Digit-Serial Multiplier for Finite Fields GF(2m) (유한 필드 GF(2m)상에서의 LSB 우선 디지트 시리얼 곱셈기 구현)

  • Kim, Chang-Hun;Hong, Chun-Pyo;U, Jong-Jeong
    • The KIPS Transactions:PartA
    • /
    • v.9A no.3
    • /
    • pp.281-286
    • /
    • 2002
  • In this paper we, implement LSB-first digit-serial systolic multiplier for computing modular multiplication $A({\times})B$mod G ({\times})in finite fields GF $(2^m)$. If input data come in continuously, the implemented multiplier can produce multiplication results at a rate of one every [m/L] clock cycles, where L is the selected digit size. The analysis results show that the proposed architecture leads to a reduction of computational delay time and it has more simple structure than existing digit-serial systolic multiplier. Furthermore, since the propose architecture has the features of regularity, modularity, and unidirectional data flow, it shows good extension characteristics with respect to m and L.

Full-Search Block-Matching Motion Estimation Circuit with Hybrid Architecture for MPEG-4 Encoder (하이브리드 구조를 갖는 MPEG-4 인코더용 전역 탐색 블록 정합 움직임 추정 회로)

  • Shim, Jae-Oh;Lee, Seon-Young;Cho, Kyeong-Soon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.2
    • /
    • pp.85-92
    • /
    • 2009
  • This paper proposes a full-search block-matching motion estimation circuit with hybrid architecture combining systolic arrays and adder trees for an MPEG-4 encoder. The proposed circuit uses systolic arrays for motion estimation with a small number of clock cycles and adder trees to reduce required circuit resources. The interpolation circuit for 1/2 pixel motion estimation consists of six adders, four subtracters and ten registers. We improved the circuit performance by resource sharing and efficient scheduling techniques. We described the motion estimation circuit for integer and 1/2 pixels at RTL in Verilog HDL. The logic-level circuit synthesized by using 130nm standard cell library contains 218,257 gates and can process 94 D1($720{\times}480$) image frames per second.