• Title/Summary/Keyword: clock scheme

Search Result 213, Processing Time 0.028 seconds

A Scalable Word-based RSA Cryptoprocessor with PCI Interface Using Pseudo Carry Look-ahead Adder (가상 캐리 예측 덧셈기와 PCI 인터페이스를 갖는 분할형 워드 기반 RSA 암호 칩의 설계)

  • Gwon, Taek-Won;Choe, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.8
    • /
    • pp.34-41
    • /
    • 2002
  • This paper describes a scalable implementation method of a word-based RSA cryptoprocessor using pseudo carry look-ahead adder The basic organization of the modular multiplier consists of two layers of carry-save adders (CSA) and a reduced carry generation and Propagation scheme called the pseudo carry look-ahead adder for the high-speed final addition. The proposed modular multiplier does not need complicated shift and alignment blocks to generate the next word at each clock cycle. Therefore, the proposed architecture reduces the hardware resources and speeds up the modular computation. We implemented a single-chip 1024-bit RSA cryptoprocessor based on the word-based modular multiplier with 256 datapaths in 0.5${\mu}{\textrm}{m}$ SOG technology after verifying the proposed architectures using FPGA with PCI bus.

A New Complex-Number Multiplication Algorithm using Radix-4 Booth Recoding and RB Arithmetic, and a 10-bit CMAC Core Design (Radix-4 Booth Recoding과 RB 연산을 이용한 새로운 복소수 승산 알고리듬 및 10-bit CMAC코어 설계)

  • 김호하;신경욱
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.9
    • /
    • pp.11-20
    • /
    • 1998
  • High-speed complex-number arithmetic units are essential to baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. In this paper, a new complex-number multiplication algorithm is proposed, which is based on redundant binary (RB) arithmetic combined with radix-4 Booth recoding scheme. The proposed algorithm reduces the number of partial product by one-half as compared with the conventional direct method using real-number multipliers and adders. It also leads to a highly parallel architecture and simplified circuit, resulting in high-speed operation and low power dissipation. To demonstrate the proposed algorithm, a prototype complex-number multiplier-accumulator (CMAC) core with 10-bit operands has been designed using 0.8-$\mu\textrm{m}$ N-Well CMOS technology. The designed CMAC core contains about 18,000 transistors on the area of about 1.60 ${\times}$ 1.93 $\textrm{mm}^2$. The functional and speed test results show that it can operate with 120-MHz clock at V$\sub$DD/=3.3-V, and its power consumption is given to about 63-mW.

  • PDF

Low Power Clock Generator Based on An Area-Reduced Interleaved Synchronous Mirror Delay Scheme (면적을 감소시킨 중첩된 싱크러너스 미러 지연 소자를 이용한 저전력 클럭 발생기)

  • Seong, Gi-Hyeok;Park, Hyeong-Jun;Yang, Byeong-Do;Kim, Lee-Seop
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.8
    • /
    • pp.46-51
    • /
    • 2002
  • A new interleaved synchronous mirror delay(SMD) is proposed in order to reduce the circuit size and the power. The conventional interleaved SMD has multiple pairs of forward delay array(FDA) and backward delay away(BDA) in order to reduce the jitter. The proposed interleaved SMD. requires one FDA and one BDA by changing the position of multiplexer. Moreover, the proposed interleaved SMD solves the polarity problem with just one extra inverter. Simulation results show that about 30% power reduction and 40% area reduction are achieved in the proposed interleaved SMD. All circuit simulations and implementations are based on a 0.25um two-metal CMOS technology.

A Cryptoprocessor for AES-128/192/256 Rijndael Block Cipher Algorithm (AES-128/192/256 Rijndael 블록암호 알고리듬용 암호 프로세서)

  • 안하기;박광호;신경욱
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2002.05a
    • /
    • pp.257-260
    • /
    • 2002
  • This paper describes a design of cryptographic processor that implements the AES (Advanced Encryption Standard) block cipher algorithm“Rijndael”. To achieve high throughput rate, a sub-pipeline stage is inserted into the round transformation block, resulting that the second half of current round function and the first half of next round function are being simultaneously operated. For area-efficient and low-power implementation the round transformation block is designed to share the hardware resources in encryption and decryption. An efficient scheme for on-the-fly key scheduling, which supports the three master-key lengths of 128-b/192-b/256-b, is devised to generate round keys in the first sub-pipeline stage of each round processing. The cryptoprocessor designed in Verilog-HDL was verified using Xilinx FPGA board and test system. The core synthesized using 0.35-${\mu}{\textrm}{m}$ CMOS cell library consists of about 25,000 gates. Simulation results show that it has a throughput of about 520-Mbits/sec with 220-MHz clock frequency at 2.5-V supply.

  • PDF

A 1.8V 2-Gb/s SLVS Transmitter with 4-lane (4-lane을 가지는 1.8V 2-Gb/s SLVS 송신단)

  • Baek, Seung-Wuk;Jang, Young-Chan
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.10a
    • /
    • pp.357-360
    • /
    • 2013
  • A 1.8V 2-Gb/s scalable low voltage signaling (SLVS) transmitter (TX) is designed for mobile applications requiring high speed and low power consumption. It consists of 4-lane TX for data transmission, 1-lane TX for a source synchronous clocking, and a 8-phase clock generator. The proposed SLVS TX has the scaling voltage swing from 50 mV to 650 mV and supports a high speed (HS) mode and a low power (LP) mode. An output impedance calibration scheme for the SVLS TX is proposed to improve the signal integrity. The proposed SLVS TX is implemented by using a $0.18-{\mu}m$ 1-poly 6-metal CMOS with a 1.8V supply. The simulated data jitter of the implemented SLVS TX is about 8.04 ps at the data rate of 2-Gbps. The area and power consumption of the 1-lane of the proposed SLVS TX are $422{\times}474{\mu}m^2$ and 5.35 mW/Gb/s, respectively.

  • PDF

Early Start Branch Prediction to Resolve Prediction Delay (분기 명령어의 조기 예측을 통한 예측지연시간 문제 해결)

  • Kwak, Jong-Wook;Kim, Ju-Hwan
    • The KIPS Transactions:PartA
    • /
    • v.16A no.5
    • /
    • pp.347-356
    • /
    • 2009
  • Precise branch prediction is a critical factor in the IPC Improvement of modern microprocessor architectures. In addition to the branch prediction accuracy, branch prediction delay have a profound impact on overall system performance as well. However, it tends to be overlooked when the architects design the branch predictor. To tolerate branch prediction delay, this paper proposes Early Start Prediction (ESP) technique. The proposed solution dynamically identifies the start instruction of basic block, called as Basic Block Start Address (BB_SA), and the solution uses BB_SA when predicting the branch direction, instead of branch instruction address itself. The performance of the proposed scheme can be further improved by combining short interval hiding technique between BB_SA and branch instruction. The simulation result shows that the proposed solution hides prediction latency, with providing same level of prediction accuracy compared to the conventional predictors. Furthermore, the combination with short interval hiding technique provides a substantial IPC improvement of up to 10.1%, and the IPC is actually same with ideal branch predictor, regardless of branch predictor configurations, such as clock frequency, delay model, and PHT size.

Design of Iterative Divider in GF(2163) Based on Improved Binary Extended GCD Algorithm (개선된 이진 확장 GCD 알고리듬 기반 GF(2163)상에서 Iterative 나눗셈기 설계)

  • Kang, Min-Sup;Jeon, Byong-Chan
    • The KIPS Transactions:PartC
    • /
    • v.17C no.2
    • /
    • pp.145-152
    • /
    • 2010
  • In this paper, we first propose a fast division algorithm in GF($2^{163}$) using standard basis representation, and then it is mapped into divider for GF($2^{163}$) with iterative hardware structure. The proposed algorithm is based on the binary ExtendedGCD algorithm, and the arithmetic operations for modular reduction are performed within only one "while-statement" unlike conventional approach which uses two "while-statement". In this paper, we use reduction polynomial $f(x)=x^{163}+x^7+x^6+x^3+1$ that is recommended in SEC2(Standards for Efficient Cryptography) using standard basis representation, where degree m = 163. We also have implemented the proposed iterative architecture in FPGA using Verilog HDL, and it operates at a clock frequency of 85 MHz on Xilinx-VirtexII XC2V8000 FPGA device. From implementation results, we will show that computation speed of the proposed scheme is significantly improved than the existing two approaches.

A Design of Multimedia Application SoC based with Processor using BTB (BTB를 이용한 프로세서 기반 멀티미디어 응용 SoC 설계)

  • Jung, Younjin;Lee, Byungyup;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.397-400
    • /
    • 2009
  • This paper describes ASIC design of Multimedia application SoC platform based RISC processor with BTB(Branch Target Buffer). For performance enhancement of platform, we use a simple branch prediction scheme, BTB structure, that stores a target address for branch instruction to remove pipeline harzard. Also, the platform includes a number of peripheral such as VGA controller, AC97 controller, UART controller, SRAM interface and Debug interface. The platform is designed and verified on a Xilinx VERTEX-4 FPGA using a number of test programs for functional tests and timing constraints. Finally, the platform is implemented into a single ASIC chip which can be operated at 100MHz clock frequency using the Chartered 0.18um process. As a result of performance estimation, the proposed platform shows about 5~9% performance improvement in comparison with the previous SoC Platform.

  • PDF

Implementation of RSA Exponentiator Based on Radix-$2^k$ Modular Multiplication Algorithm (Radix-$2^k$ 모듈라 곱셈 알고리즘 기반의 RSA 지수승 연산기 설계)

  • 권택원;최준림
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.35-44
    • /
    • 2002
  • In this paper, an implementation method of RSA exponentiator based on Radix-$2^k$ modular multiplication algorithm is presented and verified. We use Booth receding algorithm to implement Radix-$2^k$ modular multiplication and implement radix-16 modular multiplier using 2K-byte memory and CSA(carry-save adder) array - with two full adder and three half adder delays. For high speed final addition we use a reduced carry generation and propagation scheme called pseudo carry look-ahead adder. Furthermore, the optimum value of the radix is presented through the trade-off between the operating frequency and the throughput for given Silicon technology. We have verified 1,024-bit RSA processor using Altera FPGA EP2K1500E device and Samsung 0.3$\mu\textrm{m}$ technology. In case of the radix-16 modular multiplication algorithm, (n+4+1)/4 clock cycles are needed and the 1,024-bit modular exponentiation is performed in 5.38ms at 50MHz.

FPGA Implementation and Performance Analysis of High Speed Architecture for RC4 Stream Cipher Algorithm (RC4 스트림 암호 알고리즘을 위한 고속 연산 구조의 FPGA 구현 및 성능 분석)

  • 최병윤;이종형;조현숙
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.14 no.4
    • /
    • pp.123-134
    • /
    • 2004
  • In this paper a high speed architecture of the RC4 stream cipher is proposed and its FPGA implementation is presented. Compared to the conventional RC4 designs which have long initialization operation or use double or triple S-arrays to reduce latency delay due to S-array initialization phase, the proposed architecture for RC4 stream cipher eliminates the S-array initialization operation using 256-bit valid entry scheme and supports 40/128-bit key lengths with efficient modular arithmetic hardware. The proposed RC4 stream cipher is implemented using Xilinx XCV1000E-6H240C FPGA device. The designed RC4 stream cipher has about a throughput of 106 Mbits/sec at 40 MHz clock and thus can be applicable to WEP processor and RC4 key search processor.