• Title/Summary/Keyword: clock cycles

A design of compact and high-performance AES processor using composite field based S-Box and hardware sharing (합성체 기반의 S-Box와 하드웨어 공유를 이용한 저면적/고성능 AES 프로세서 설계)

  • Yang, Hyun-Chang;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.8 / pp.67-74 / 2008
  • A compact and high-performance AES (Advanced Encryption Standard) encryption/decryption processor is designed by applying various hardware sharing and optimization techniques. To minimize hardware complexity, the S-Boxes are shared between the round transformation and the key scheduler, and the datapaths for encryption and decryption are merged and reused, which reduces the area of the S-Boxes by 25%. In addition, the S-Boxes, which occupy the largest hardware area in an AES processor, are designed using composite field arithmetic on $GF(((2^2)^2)^2)$, which further reduces their area compared with designs based on $GF(2^8)$ or $GF((2^4)^2)$. By optimizing the 64-bit round transformation and round key scheduling, each round transformation is processed in 3 clock cycles, and encryption of a 128-bit data block is performed in 31 clock cycles. The designed AES processor has about 15,870 gates, and its estimated throughput is 412.9 Mbps at a 100 MHz clock frequency.
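For reference, the function every S-Box must compute is inversion in $GF(2^8)$ followed by the AES affine transform. The sketch below is a plain software model of that function in the standard polynomial basis (modulus $x^8 + x^4 + x^3 + x + 1$); it does not implement the paper's composite-field $GF(((2^2)^2)^2)$ datapath, but a composite-field S-Box should produce exactly the same 256 outputs, so a table like this is a convenient golden model. The function names are illustrative.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B  # reduce by the low byte of 0x11B
        b >>= 1
    return product


def gf256_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254 (a^255 = 1); yields 0 for a = 0, as AES specifies."""
    result, power, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf256_mul(result, power)
        power = gf256_mul(power, power)
        exponent >>= 1
    return result


def rotl8(x: int, k: int) -> int:
    """Rotate a byte left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox(a: int) -> int:
    """AES S-Box: field inversion followed by the affine transform with constant 0x63."""
    b = gf256_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]
print(hex(SBOX[0x53]))  # 0xed
```

A hardware design verified against such a table only needs to match all 256 entries, regardless of which field representation it uses internally.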

A Public-key Cryptography Processor supporting P-224 ECC and 2048-bit RSA (P-224 ECC와 2048-비트 RSA를 지원하는 공개키 암호 프로세서)

  • Sung, Byung-Yoon;Lee, Sang-Hyun;Shin, Kyung-Wook
    • Journal of IKEEE / v.22 no.3 / pp.522-531 / 2018
  • A public-key cryptography processor, EC-RSA, was designed that integrates 224-bit prime-field elliptic curve cryptography (ECC), as defined in FIPS 186-2, and RSA with a 2048-bit key length into a single hardware structure. A finite field arithmetic core, used for both ECC scalar multiplication and RSA exponentiation, was designed with a 32-bit datapath. A lightweight implementation was achieved by efficiently sharing the finite field arithmetic core and internal memory between ECC and RSA operations. The EC-RSA processor was verified by FPGA implementation. Synthesized with a 180-nm CMOS cell library, it occupied 11,779 gate equivalents (GEs) and 14 kbit of RAM, and its estimated maximum clock frequency was 133 MHz. ECC scalar multiplication takes 867,746 clock cycles, giving an estimated throughput of 34.3 kbps, and RSA decryption takes 26,149,013 clock cycles, giving an estimated throughput of 10.4 kbps.
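The quoted throughput figures appear consistent with throughput = (bits per operation) × f_clk / (cycles per operation), taking 224 bits for ECC, 2,048 bits for RSA, and the 133 MHz estimated clock. That definition is inferred from the numbers rather than stated in the abstract, and the helper name below is hypothetical.

```python
def throughput_bps(bits_per_op: int, f_clk_hz: float, cycles_per_op: int) -> float:
    """Estimated throughput: bits processed per operation divided by the operation latency."""
    latency_s = cycles_per_op / f_clk_hz
    return bits_per_op / latency_s


F_CLK = 133e6  # estimated maximum clock frequency quoted in the abstract
ecc = throughput_bps(224, F_CLK, 867_746)      # P-224 scalar multiplication
rsa = throughput_bps(2048, F_CLK, 26_149_013)  # 2048-bit RSA decryption
print(f"ECC: {ecc / 1e3:.1f} kbps, RSA: {rsa / 1e3:.1f} kbps")  # ECC: 34.3 kbps, RSA: 10.4 kbps
```

The same relation also reproduces the AES entry's figure: 128 bits × 100 MHz / 31 cycles ≈ 412.9 Mbps.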

2,048 bits RSA public-key cryptography processor based on 32-bit Montgomery modular multiplier (32-비트 몽고메리 모듈러 곱셈기 기반의 2,048 비트 RSA 공개키 암호 프로세서)

  • Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.8 / pp.1471-1479 / 2017
  • This paper describes the design of an RSA public-key cryptography processor supporting a key length of 2,048 bits. The modular multiplier, the core arithmetic function in RSA cryptography, was designed using a word-based Montgomery multiplication algorithm, and modular exponentiation was implemented using the left-to-right (LR) binary exponentiation algorithm. One modular multiplication takes 8,386 clock cycles, and RSA encryption and decryption require 185,724 and 25,561,076 clock cycles, respectively. The RSA processor was verified by FPGA implementation on a Virtex5 device. Synthesized at a 100 MHz clock frequency with a 0.18 µm CMOS cell library, the RSA cryptographic processor occupies 12,540 gate equivalents (GEs) and 12 kbits of memory. The RSA processor is estimated to operate at up to 165 MHz, with estimated RSA encryption and decryption times of 1.12 ms and 154.91 ms, respectively.
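The two algorithms named above fit together as follows: every multiplication inside the left-to-right exponentiation loop is a Montgomery product, so operands are converted into the Montgomery domain once at the start and back once at the end. The sketch assumes a radix-2^32 variant that consumes one 32-bit word of the first operand per iteration while keeping the second operand full-width for brevity. This is one common word-based formulation, not necessarily the paper's datapath, and it does not model the reported cycle counts. Function names are illustrative; `pow(n, -1, m)` needs Python 3.8 or later.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def mont_mul(a: int, b: int, n: int, num_words: int, n_prime: int) -> int:
    """Return a * b * R^-1 mod n with R = 2^(32 * num_words), scanning a one 32-bit word at a time."""
    t = 0
    for i in range(num_words):
        a_i = (a >> (WORD_BITS * i)) & WORD_MASK
        t += a_i * b
        # Pick m so that the low word of t + m*n is zero, then drop that word.
        m = ((t & WORD_MASK) * n_prime) & WORD_MASK
        t = (t + m * n) >> WORD_BITS
    return t - n if t >= n else t  # t < 2n here, so one subtraction suffices


def mod_exp_lr(base: int, exponent: int, n: int) -> int:
    """Left-to-right binary exponentiation with every multiplication done in the Montgomery domain."""
    if n % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    num_words = (n.bit_length() + WORD_BITS - 1) // WORD_BITS
    r = 1 << (WORD_BITS * num_words)
    n_prime = (-pow(n, -1, 1 << WORD_BITS)) & WORD_MASK  # -n^-1 mod 2^32
    r2 = (r * r) % n
    x_bar = mont_mul(base % n, r2, n, num_words, n_prime)  # base * R mod n
    acc = r % n                                            # 1 in the Montgomery domain
    for bit in bin(exponent)[2:]:                          # scan exponent bits MSB first
        acc = mont_mul(acc, acc, n, num_words, n_prime)    # square
        if bit == "1":
            acc = mont_mul(acc, x_bar, n, num_words, n_prime)  # multiply
    return mont_mul(acc, 1, n, num_words, n_prime)         # leave the domain: acc * R^-1 mod n


print(mod_exp_lr(65, 17, 3233))  # 2790 (textbook RSA example with n = 61 * 53)
```

For a 2,048-bit modulus, `num_words` is 64, so each Montgomery product iterates 64 times; a hardware version with a 32-bit multiplier would additionally serialize the full-width `a_i * b` and `m * n` terms word by word.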

Photoreception for Photoperiodism and Circadian Rhythms in the Blow Fly

  • Shiga, Sakiko;Numata, Hideharu
    • Journal of Photoscience / v.9 no.2 / pp.13-16 / 2002
  • A comparison of the functional components underlying photoperiodism and circadian rhythmicity in the same species is an interesting issue in the context of unravelling clock mechanisms. In the present study, the compound eyes were covered or surgically removed to localize the photoreceptors for photoperiodism, which controls reproductive diapause, and for entrainment of circadian locomotor rhythms in the blow fly Protophormia terraenovae. Intact flies showed a long-day photoperiodic response. When the compound eyes were covered with silver paint, diapause incidence increased under diapause-averting conditions (a long-day photoperiod and constant light), as if the flies were kept under constant darkness. Covering a medial region of the head capsule, or painting the compound eyes with solvent only, had no significant effect. When the compound eyes were removed, the flies did not distinguish the photoperiod, whereas removal of the antennal lobes or ocelli did not affect photoperiodism. Intact flies showed a free-running rhythm under constant darkness, and the rhythm entrained to light-dark (LD) cycles with both high- and low-intensity light. When the compound eyes and ocelli were surgically removed, the rhythm entrained to LD cycles with high-intensity light but free-ran under LD cycles with low-intensity light. The results suggest that the retinal pathways are involved in photoperiodism and that flies use both retinal and extraretinal pathways for rhythm entrainment. Under dim-light LD cycles, the retinal pathways mainly mediate rhythm entrainment. Retinal photoreceptors thus seem to be used both for photoperiodism and for entrainment of the rhythm.

Design of inversion and division circuit over GF($2^{m}$) (유한체 $GF(2^{m})$상의 역원계산 회로 및 나눗셈 회로 설계)

  • 조용석;박상규
    • The Journal of Korean Institute of Communications and Information Sciences / v.23 no.5 / pp.1160-1164 / 1998
  • In this paper, we propose a new algorithm for computing multiplicative inverses in $GF(2^{m})$ and use it to design an inversion circuit and a division circuit. The algorithm is based on Fermat's theorem, and an inversion takes approximately m/2 clock cycles. The hardware requirements of the inversion and division circuits built on this algorithm are the same as those of traditional circuits, apart from additional multiplexers.
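Fermat-based inversion rests on $a^{2^m-1} = 1$ for nonzero $a$, so $a^{-1} = a^{2^m-2} = a^{2} \cdot a^{4} \cdots a^{2^{m-1}}$, and division is $a/b = a \cdot b^{-1}$. The sketch below computes that product sequentially with $m-1$ squarings and $m-1$ multiplications; it illustrates the underlying identity only and does not reproduce the paper's m/2-cycle schedule. The AES polynomial is used merely as a convenient $m = 8$ test field, and all names are illustrative.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Shift-and-add multiplication in GF(2^m); poly is the irreducible polynomial including x^m."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly  # reduce as soon as the x^m term appears
    return product


def gf2m_inv(a: int, m: int, poly: int) -> int:
    """Fermat inversion: a^-1 = a^(2^m - 2) = a^2 * a^4 * ... * a^(2^(m-1))."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    square, result = a, 1
    for _ in range(m - 1):
        square = gf2m_mul(square, square, m, poly)  # a^(2^i)
        result = gf2m_mul(result, square, m, poly)
    return result


def gf2m_div(a: int, b: int, m: int, poly: int) -> int:
    """Division a / b as multiplication by the Fermat inverse of b."""
    return gf2m_mul(a, gf2m_inv(b, m, poly), m, poly)


AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
print(hex(gf2m_inv(0x53, 8, AES_POLY)))  # 0xca
```

Because squaring in $GF(2^m)$ is linear, hardware can implement it far more cheaply than a general multiplication, which is why Fermat-based inverters are built around repeated squaring.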

Performance Analysis of Multibuffered Multistage Interconnection Networks using Small Clock Cycle Scheme (작은 클럭 주기를 이용한 복수버퍼를 가지는 다단 상호연결 네트워크의 해석적 성능분석)

  • Mun, Young-Song
    • Journal of Internet Computing and Services / v.6 no.4 / pp.141-147 / 2005
  • Ding and Bhuyan have shown that the performance of multistage interconnection networks (MINs) can be significantly improved if packet movements are confined within each pair of adjacent stages using small clock cycles. In this paper, an effective model is proposed for estimating the performance of multibuffered MINs that employ this approach. The relative effectiveness of the proposed model compared with the traditional design is examined.

Verification of System using Master-Slave Structure (Master-Slave 기법을 적용한 System Operation의 동작 검증)

  • Kim, In-Soo;Min, Hyoung-Bok
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.1 / pp.199-202 / 2009
  • Scan design is currently the most widely used structured design-for-testability (DFT) approach. In scan design, all storage elements are replaced with scan cells, which are then configured as one or more shift registers (also called scan chains) during the shift operation. As a result, all inputs to the combinational logic, including those driven by scan cells, can be controlled, and all outputs from the combinational logic, including those driving scan cells, can be observed. A scan-inserted design operates in three modes: normal mode, shift mode, and capture mode. The circuit operations, with their associated clock cycles, performed in these three modes are referred to as the normal, shift, and capture operations, respectively. Despite these advantages, the scan design methodology has two drawbacks: power dissipation and test time during test application. We propose a new methodology for the scan shift clock operation that yields a low-power scan design with a short test time.
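The shift and capture operations described above can be modeled with a single scan chain: the pattern is shifted in one bit per clock, one capture clock loads the combinational outputs into the scan cells, and the response is shifted out. The sketch below is a generic full-scan model, not the paper's master-slave clocking scheme; `toy_logic` is a hypothetical 3-bit combinational block. Without overlapping unload and the next load, each pattern costs 2n + 1 clocks for an n-cell chain, which shows why shift clocks dominate test time.

```python
from typing import Callable, List, Tuple

Bits = List[int]


def shift_clock(chain: Bits, scan_in: int) -> Tuple[Bits, int]:
    """One shift-mode clock: scan_in enters cell 0, each cell passes its bit on, the last cell drives scan-out."""
    return [scan_in] + chain[:-1], chain[-1]


def apply_scan_pattern(pattern: Bits, comb_logic: Callable[[Bits], Bits]) -> Tuple[Bits, int]:
    """Load a pattern, capture the combinational response, then unload it; returns (response, clocks)."""
    n = len(pattern)
    chain = [0] * n
    clocks = 0
    for bit in reversed(pattern):      # shift mode: the last bit enters first so it ends in the last cell
        chain, _ = shift_clock(chain, bit)
        clocks += 1
    chain = comb_logic(chain)          # capture mode: one clock loads the combinational outputs
    clocks += 1
    unloaded = []
    for _ in range(n):                 # shift mode again: unload the captured response
        chain, out_bit = shift_clock(chain, 0)
        unloaded.append(out_bit)
        clocks += 1
    return unloaded[::-1], clocks      # the last cell comes out first, so reverse into cell order


def toy_logic(v: Bits) -> Bits:
    # Hypothetical combinational block feeding the scan cells' functional inputs.
    return [v[0] ^ v[1], v[1] & v[2], 1 - v[2]]


response, clocks = apply_scan_pattern([1, 0, 1], toy_logic)
print(response, clocks)  # [1, 0, 0] 7
```

In practice the next pattern is shifted in while the previous response is shifted out, which reduces the cost to about n + 1 clocks per pattern but keeps the shift operation as the main contributor to test time.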

Implementation of an ASIP for accelerating SAD operations (SAD 연산의 가속을 위한 멀티미디어 코프로세서 구현)

  • Jo, Jung-Hyun;Jeong, Ha-Young
    • Proceedings of the IEEK Conference / 2006.06a / pp.809-810 / 2006
  • H.264 is commonly used in video compression applications. Its encoding algorithm requires a large number of data computations, for example, the sum of absolute differences (SAD) operation. We analyzed H.264 reference encoding workloads and found that SAD operations account for 8.78% of the H.264 encoding program. The SAD operation sums the 16 absolute difference values of an H.264 $4{\times}4$ sub-block. To accelerate SAD operations, we implemented an application-specific instruction-set processor (ASIP) that can execute SAD and data transfer instructions. The proposed coprocessor has an absolute value generator and a carry save adder (CSA) unit that sums 8 difference values per clock cycle, so a SAD operation completes in 2 clock cycles. Experimental results show that performance improves by 34% in terms of total execution time.
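The SAD of a $4{\times}4$ sub-block is $\sum_{i,j} |c_{ij} - r_{ij}|$ over 16 pixel pairs. The sketch below models only the accumulation schedule described above: with 8 absolute differences summed per clock, 16 terms take 2 cycles. The function name and the `values_per_cycle` parameter are hypothetical, and the software simply adds the terms where the hardware would compress them with a CSA tree before a final adder.

```python
from typing import List, Tuple

Block = List[List[int]]


def sad_4x4(cur: Block, ref: Block, values_per_cycle: int = 8) -> Tuple[int, int]:
    """Sum of absolute differences over a 4x4 block, accumulated values_per_cycle terms per cycle."""
    diffs = [abs(c - r) for cur_row, ref_row in zip(cur, ref) for c, r in zip(cur_row, ref_row)]
    if len(diffs) != 16:
        raise ValueError("expected two 4x4 blocks")
    total, cycles = 0, 0
    for start in range(0, len(diffs), values_per_cycle):
        total += sum(diffs[start:start + values_per_cycle])  # one adder-tree pass per cycle
        cycles += 1
    return total, cycles


cur = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
ref = [[12, 18, 30, 45], [50, 61, 69, 80], [88, 100, 115, 120], [130, 139, 150, 166]]
print(sad_4x4(cur, ref))  # (25, 2)
```

Doubling `values_per_cycle` to 16 would finish in one cycle at the cost of a wider adder tree, which is the area/latency trade-off the 8-input CSA unit sits on.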

Register Controlled Delay-locked Loop using Delay Monitor Scheme (Delay Monitor Scheme을 사용한 Register Controlled Delay-locked Loop)

  • 이광희;노주영;손상희
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.17 no.2 / pp.144-149 / 2004
  • A register-controlled DLL with fast locking and low power consumption is described in this paper. A delay monitor scheme is proposed to achieve fast locking, and an inverter is inserted in front of the delay line to reduce power consumption. The proposed DLL was fabricated in a 0.6 µm 1-poly 3-metal CMOS technology. The proposed delay monitor scheme enables the DLL to lock to the external clock within 4 cycles. The power consumption is 36 mW with a 3 V supply voltage at a 34 MHz clock frequency.

Virtual Prototyping of Area-Based Fast Image Stitching Algorithm

  • Mudragada, Lakshmi Kalyani;Lee, Kye-Shin;Kim, Byung-Gyu
    • Journal of Multimedia Information System / v.6 no.1 / pp.7-14 / 2019
  • This work presents a virtual prototyping design approach for area-based image stitching hardware. The virtual hardware obtained from virtual prototyping is equivalent to the conceptual algorithm, yet its conceptual blocks are linked to actual circuit components, including memory, logic gates, and arithmetic units. With the proposed method, the overall structure, size, and computation speed of the actual hardware can be estimated at an early design stage. As a result, the optimized virtual hardware facilitates hardware implementation by eliminating the trial designs and redundant simulation steps otherwise needed to optimize hardware performance. To verify the feasibility of the proposed method, the virtual hardware of an image stitching platform was realized; it required 10,522,368 clock cycles to stitch two $1280{\times}1024$ images. Furthermore, at a 250 MHz clock frequency, the estimated computation time of the proposed virtual hardware is 0.877 s, which is 10 times faster than a software-based image stitching platform using MATLAB.