• Title/Summary/Keyword: Standard cell library

Search Result 196, Processing Time 0.028 seconds

A Lightweight Hardware Accelerator for Public-Key Cryptography (공개키 암호 구현을 위한 경량 하드웨어 가속기)

  • Sung, Byung-Yoon;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1609-1617
    • /
    • 2019
  • Described in this paper is a design of hardware accelerator for implementing public-key cryptographic protocols (PKCPs) based on Elliptic Curve Cryptography (ECC) and RSA. It supports five elliptic curves (ECs) over GF(p) and three key lengths of RSA that are defined by NIST standard. It was designed to support four point operations over ECs and six modular arithmetic operations, making it suitable for hardware implementation of ECC- and RSA-based PKCPs. In order to achieve small-area implementation, a finite field arithmetic circuit was designed with 32-bit data-path, and it adopted word-based Montgomery multiplication algorithm, the Jacobian coordinate system for EC point operations, and the Fermat's little theorem for modular multiplicative inverse. The hardware operation was verified with FPGA device by implementing EC-DH key exchange protocol and RSA operations. It occupied 20,800 gate equivalents and 28 kbits of RAM at 50 MHz clock frequency with 180-nm CMOS cell library, and 1,503 slices and 2 BRAMs in Virtex-5 FPGA device.

Design of a High-Speed Data Packet Allocation Circuit for Network-on-Chip (NoC 용 고속 데이터 패킷 할당 회로 설계)

  • Kim, Jeonghyun;Lee, Jaesung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.459-461
    • /
    • 2022
  • One of the big differences between Network-on-Chip (NoC) and the existing parallel processing system based on an off-chip network is that data packet routing is performed using a centralized control scheme. In such an environment, the best-effort packet routing problem becomes a real-time assignment problem in which data packet arriving time and processing time is the cost. In this paper, the Hungarian algorithm, a representative computational complexity reduction algorithm for the linear algebraic equation of the allocation problem, is implemented in the form of a hardware accelerator. As a result of logic synthesis using the TSMC 0.18um standard cell library, the area of the circuit designed through case analysis for the cost distribution is reduced by about 16% and the propagation delay of it is reduced by about 52%, compared to the circuit implementing the original operation sequence of the Hungarian algorithm.

  • PDF

A Unified ARIA-AES Cryptographic Processor Supporting Four Modes of Operation and 128/256-bit Key Lengths (4가지 운영모드와 128/256-비트 키 길이를 지원하는 ARIA-AES 통합 암호 프로세서)

  • Kim, Ki-Bbeum;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.4
    • /
    • pp.795-803
    • /
    • 2017
  • This paper describes a dual-standard cryptographic processor that efficiently integrates two block ciphers ARIA and AES into a unified hardware. The ARIA-AES crypto-processor was designed to support 128-b and 256-b key sizes, as well as four modes of operation including ECB, CBC, OFB, and CTR. Based on the common characteristics of ARIA and AES algorithms, our design was optimized by sharing hardware resources in substitution layer and in diffusion layer. It has on-the-fly key scheduler to process consecutive blocks of plaintext/ciphertext without reloading key. The ARIA-AES crypto-processor that was implemented with a $0.18{\mu}m$ CMOS cell library occupies 54,658 gate equivalents (GEs), and it can operate up to 95 MHz clock frequency. The estimated throughputs at 80 MHz clock frequency are 787 Mbps, 602 Mbps for ARIA with key size of 128-b, 256-b, respectively. In AES mode, it has throughputs of 930 Mbps, 682 Mbps for key size of 128-b, 256-b, respectively. The dual-standard crypto-processor was verified by FPGA implementation using Virtex5 device.

LDPC Decoder for WiMAX/WLAN using Improved Normalized Min-Sum Algorithm (개선된 정규화 최소합 알고리듬을 적용한 WiMAX/WLAN용 LDPC 복호기)

  • Seo, Jin-Ho;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.4
    • /
    • pp.876-884
    • /
    • 2014
  • A hardware design of LDPC decoder which is based on the improved normalized min-sum(INMS) decoding algorithm is described in this paper. The designed LDPC decoder supports 19 block lengths(576~2304) and 6 code rates(1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6) of IEEE 802.16e mobile WiMAX standard and 3 block lengths(648, 1296, 1944) and 4 code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n WLAN standard. The decoding function unit(DFU) which is a main arithmetic block is implemented using sign-magnitude(SM) arithmetic and INMS decoding algorithm to optimize hardware complexity and decoding performance. The LDPC decoder synthesized using a 0.18-${\mu}m$ CMOS cell library with 100 MHz clock has 284,409 gates and RAM of 62,976 bits, and it is verified by FPGA implementation. The estimated performance depending on code rate and block length is about 82~218 Mbps at 100 MHz@1.8V.

High Performance Hardware Implementation of the 128-bit SEED Cryptography Algorithm (128비트 SEED 암호 알고리즘의 고속처리를 위한 하드웨어 구현)

  • 전신우;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.1
    • /
    • pp.13-23
    • /
    • 2001
  • This paper implemented into hardware SEED which is the KOREA standard 128-bit block cipher. First, at the respect of hardware implementation, we compared and analyzed SEED with AES finalist algorithms - MARS, RC6, RIJNDAEL, SERPENT, TWOFISH, which are secret key block encryption algorithms. The encryption of SEED is faster than MARS, RC6, TWOFISH, but is as five times slow as RIJNDAEL which is the fastest. We propose a SEED hardware architecture which improves the encryption speed. We divided one round into three parts, J1 function block, J2 function block J3 function block including key mixing block, because SEED repeatedly executes the same operation 16 times, then we pipelined one round into three parts, J1 function block, J2 function block, J3 function block including key mixing block, because SEED repeatedly executes the same operation 16 times, then we pipelined it to make it more faster. G-function is implemented more easily by xoring four extended 4 byte SS-boxes. We tested it using ALTERA FPGA with Verilog HDL. If the design is synthesized with 0.5 um Samsung standard cell library, encryption of ECB and decryption of ECB, CBC, CFB, which can be pipelined would take 50 clock cycles to encrypt 384-bit plaintext, and hence we have 745.6 Mbps assuming 97.1 MHz clock frequency. Encryption of CBC, OFB, CFB and decryption of OFB, which cannot be pipelined have 258.9 Mbps under same condition.

Implementation of WLAN Baseband Processor Based on Space-Frequency OFDM Transmit Diversity Scheme (공간-주파수 OFDM 전송 다이버시티 기법 기반 무선 LAN 기저대역 프로세서의 구현)

  • Jung Yunho;Noh Seungpyo;Yoon Hongil;Kim Jaeseok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.5 s.335
    • /
    • pp.55-62
    • /
    • 2005
  • In this paper, we propose an efficient symbol detection algorithm for space-frequency OFDM (SF-OFDM) transmit diversity scheme and present the implementation results of the SF-OFDM WLAN baseband processor with the proposed algorithm. When the number of sub-carriers in SF-OFDM scheme is small, the interference between adjacent sub-carriers may be generated. The proposed algorithm eliminates this interference in a parallel manner and obtains a considerable performance improvement over the conventional detection algorithm. The bit error rate (BER) performance of the proposed detection algorithm is evaluated by the simulation. In the case of 2 transmit and 2 receive antennas, at $BER=10^{-4}$ the proposed algorithm obtains about 3 dB gain over the conventional detection algorithm. The packet error rate (PER), link throughput, and coverage performance of the SF-OFDM WLAN with the proposed detection algorithm are also estimated. For the target throughput at $80\%$ of the peak data rate, the SF-OFDM WLAN achieves the average SNR gain of about 5.95 dB and the average coverage gain of 3.98 meter. The SF-OFDM WLAN baseband processor with the proposed algorithm was designed in a hardware description language and synthesized to gate-level circuits using 0.18um 1.8V CMOS standard cell library. With the division-free architecture, the total logic gate count for the processor is 945K. The real-time operation is verified and evaluated using a FPGA test system.

A Design of PRESENT Crypto-Processor Supporting ECB/CBC/OFB/CTR Modes of Operation and Key Lengths of 80/128-bit (ECB/CBC/OFB/CTR 운영모드와 80/128-비트 키 길이를 지원하는 PRESENT 암호 프로세서 설계)

  • Kim, Ki-Bbeum;Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.6
    • /
    • pp.1163-1170
    • /
    • 2016
  • A hardware implementation of ultra-lightweight block cipher algorithm PRESENT which was specified as a standard for lightweight cryptography ISO/IEC 29192-2 is described. The PRESENT crypto-processor supports two key lengths of 80 and 128 bits, as well as four modes of operation including ECB, CBC, OFB, and CTR. The PRESENT crypto-processor has on-the-fly key scheduler with master key register, and it can process consecutive blocks of plaintext/ciphertext without reloading master key. In order to achieve a lightweight implementation, the key scheduler was optimized to share circuits for key lengths of 80 bits and 128 bits. The round block was designed with a data-path of 64 bits, so that one round transformation for encryption/decryption is processed in a clock cycle. The PRESENT crypto-processor was verified using Virtex5 FPGA device. The crypto-processor that was synthesized using a $0.18{\mu}m$ CMOS cell library has 8,100 gate equivalents(GE), and the estimated throughput is about 908 Mbps with a maximum operating clock frequency of 454 MHz.

A Hardware Design of Ultra-Lightweight Block Cipher Algorithm PRESENT for IoT Applications (IoT 응용을 위한 초경량 블록 암호 알고리듬 PRESENT의 하드웨어 설계)

  • Cho, Wook-Lae;Kim, Ki-Bbeum;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.7
    • /
    • pp.1296-1302
    • /
    • 2016
  • A hardware implementation of ultra-lightweight block cipher algorithm PRESENT that was specified as a block cipher standard for lightweight cryptography ISO/IEC 29192-2 is described in this paper. Two types of crypto-core that support master key size of 80-bit are designed, one is for encryption-only function, and the other is for encryption and decryption functions. The designed PR80 crypto-cores implement the basic cipher mode of operation ECB (electronic code book), and it can process consecutive blocks of plaintext/ciphertext without reloading master key. The PR80 crypto-cores were designed in soft IP with Verilog HDL, and they were verified using Virtex5 FPGA device. The synthesis results using $0.18{\mu}m$ CMOS cell library show that the encryption-only core has 2,990 GE and the encryption/decryption core has 3,687 GE, so they are very suitable for IoT security applications requiring small gate count. The estimated maximum clock frequency is 500 MHz for the encryption-only core and 444 MHz for the encryption/decryption core.

A Design of Fractional Motion Estimation Engine with 4×4 Block Unit of Interpolator & SAD Tree for 8K UHD H.264/AVC Encoder (8K UHD(7680×4320) H.264/AVC 부호화기를 위한 4×4블럭단위 보간 필터 및 SAD트리 기반 부화소 움직임 추정 엔진 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.6
    • /
    • pp.145-155
    • /
    • 2013
  • In this paper, we proposed a $4{\times}4$ block parallel architecture of interpolation for high-performance H.264/AVC Fractional Motion Estimation in 8K UHD($7680{\times}4320$) video real time processing. To improve throughput, we design $4{\times}4$ block parallel interpolation. For supplying the $10{\times}10$ reference data for interpolation, we design 2D cache buffer which consists of the $10{\times}10$ memory arrays. We minimize redundant storage of the reference pixel by applying the Search Area Stripe Reuse scheme(SASR), and implement high-speed plane interpolator with 3-stage pipeline(Horizontal Vertical 1/2 interpolation, Diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated in 0.13um standard cell library. The gate count is 436.5Kgates. The proposed H.264/AVC Fractional Motion Estimation can support 8K UHD at 30 frames per second by running at 187MHz.

Adaptive Pipeline Architecture for an Asynchronous Embedded Processor (비동기식 임베디드 프로세서를 위한 적응형 파이프라인 구조)

  • Lee, Seung-Sook;Lee, Je-Hoon;Lim, Young-Il;Cho, Kyoung-Rok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.1
    • /
    • pp.51-58
    • /
    • 2007
  • This paper presented an adaptive pipeline architecture for a high-performance and low-power asynchronous processor. The proposed pipeline architecture employed a stage-skipping and a stage-combining scheme. The stage-skipping scheme can skip the operation of a bubble stage that is not used pipeline stage in an instruction execution. In the stage-combining scheme, two consecutive stages can be joined to form one stage if the latter stage is empty. The proposed pipeline architecture could reduce the processing time and power consumption. The proposed architecture supports multi-processing in the EX stage that executes parallel 4 instructions. We designed an asynchronous microprocessor to estimate the efficiency of the proposed pipeline architecture that was synthesized to a gate level design using a $0.35-{\mu}m$ CMOS standard cell library. We evaluated the performance of the target processor using SPEC2000 benchmark programs. The proposed architecture showed about 2.3 times higher speed than the asynchronous counterpart, AMULET3i. As a result, the proposed pipeline schemes and architecture can be used for asynchronous high-speed processor design