• Title/Summary/Keyword: Booth Algorithm

Search Result 45, Processing Time 0.024 seconds

A High-Performance ECC Processor Supporting NIST P-521 Elliptic Curve (NIST P-521 타원곡선을 지원하는 고성능 ECC 프로세서)

  • Yang, Hyeon-Jun;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.4
    • /
    • pp.548-555
    • /
    • 2022
  • This paper describes the hardware implementation of elliptic curve cryptography (ECC) used as a core operation in elliptic curve digital signature algorithm (ECDSA). The ECC processor supports eight operation modes (four point operations, four modular operations) on the NIST P-521 curve. In order to minimize computation complexity required for point scalar multiplication (PSM), the radix-4 Booth encoding scheme and modified Jacobian coordinate system were adopted, which was based on the complexity analysis for five PSM algorithms and four different coordinate systems. Modular multiplication was implemented using a modified 3-Way Toom-Cook multiplication and a modified fast reduction algorithm. The ECC processor was implemented on xczu7ev FPGA device to verify hardware operation. Hardware resources of 101,921 LUTs, 18,357 flip-flops and 101 DSP blocks were used, and it was evaluated that about 370 PSM operations per second were achieved at a maximum operation clock frequency of 45 MHz.

New VQB divide/square root operator that uses Booth algorithm (Booth 알고리즘을 이용한 새로운 VQB 제산/제곱근 연산기의 설계)

  • 이성연;이태영;이용석
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.380-383
    • /
    • 1999
  • 본 논문은 Booth 알고리즘을 사용하는 새로운 VQB제산기를 제안한다. 본 논문은 Macsorley의 제산 알고리즘에 기본 원리가 같은 제곱근 알고리즘을 추가하였으며, 이를 VQB 알고리즘이라고 명명하였다. 본 논문은 VQB 제산기의 두 가지 설계를 구현하였다. 하나는 계수를 사용하지 않는 설계 (A) 이며, 둘은 [1/2, 2]의 계수군을 사용하는 설계 (B)이다. 설계 (A)는 순환할때마다 2.54 비트의 부분 몫을 결정하며 설계 (B)는 2.74 비트를 결정한다. 본 논문은 VQB 제산기의 성능지표를 좌우하는 제곱근을 위주로 하여 SRT 제산기와의 비교를 시도하였다. VQB 는 처리량과 설계 노력 면에서 SRT를 앞서며, 면적과 임계지연 면에서는 SRT와 서로 견줄만한 수준이다. 표준셀 0.35㎛ CMOS 공정으로 구현될 때, 설계 (A)의 임계지연은 9.69㎱ 이며, 설계 (B)는 11.05㎱이다.

  • PDF

Design of a 323${\times}$2-Bit Modified Booth Multiplier Using Current-Mode CMOS Multiple-Valued Logic Circuits (전류모드 CMOS 다치 논리회로를 이용한 32${\times}$32-Bit Modified Booth 곱셈기 설계)

  • 이은실;김정범
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.12
    • /
    • pp.72-79
    • /
    • 2003
  • This paper proposes a 32${\times}$32 Modified Booth multiplier using CMOS multiple-valued logic circuits. The multiplier based on the radix-4 algorithm is designed with current mode CMOS quaternary logic circuits. Designed multiplier is reduced the transistor count by 67.1% and 37.3%, compared with that of the voltage mode binary multiplier and the previous multiple-valued logic multiplier, respectively. The multiplier is designed with a 0.35${\mu}{\textrm}{m}$ standard CMOS technology at a 3.3V supply voltage and unit current 10$mutextrm{A}$, and verified by HSPICE. The multiplier has 5.9㎱ of propagation delay time and 16.9mW of power dissipation. The performance is comparable to that of the fastest binary multiplier reported.

Design of a Truncated Floating-Point Multiplier for Graphic Accelerator of Mobile Devices (모바일 그래픽 가속기용 부동소수점 절사 승산기 설계)

  • Cho, Young-Sung;Lee, Yong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.3
    • /
    • pp.563-569
    • /
    • 2007
  • As the mobile communication and the semiconductor technology is improved continuously, mobile contents such as the multimedia service and the 2D/3D graphics which require high level graphics are serviced recently. Mobile chips should consume small die area and low power. In this paper, we design a truncated floating-point multiplier that is useful for the 2D/3D vector graphics in mobile devices. The truncated multiplier is based on the radix-4 Booth's encoding algorithm and a truncation algorithm is used to achieve small area and low power. The average percent error of the multiplier is as small as 0.00003% and neglectable for mobile applications. The synthesis result using 0.35um CMOS cell library shows that the number of gates for the truncated multiplier is only 33.8 percent of the conventional radix-4 Booth's multiplier.

Design of a Multi-Valued Arithmetic Processor with Encoder and Decoder (인코더, 디코오더를 가지는 다치 연산기 설계)

  • 박진우;양대영;송홍복
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.2 no.1
    • /
    • pp.147-156
    • /
    • 1998
  • In this paper, an arithmetic processor using multi-valued logic is designed. For implementing of multi-valued logic circuits, we use current-mode CMOS circuits and design encoder which change binary voltage-mode signals to multi-valued current-mode signals and decoder which change results of arithmetic to binary voltage-mode signals. To reduce the number of partial product we use 4-radix SD number partial product generation algorithm that is an extension of the modified Booth's algorithm. We demonstrate the effectiveness of the proposed arithmetic circuits through SPICE simulation and Hardware emulation using FPGA chip.

  • PDF

Design of a 64×64-Bit Modified Booth Multiplier Using Current-Mode CMOS Quarternary Logic Circuits (전류모드 CMOS 4치 논리회로를 이용한 64×64-비트 변형된 Booth 곱셈기 설계)

  • Kim, Jeong-Beom
    • The KIPS Transactions:PartA
    • /
    • v.14A no.4
    • /
    • pp.203-208
    • /
    • 2007
  • This paper proposes a $64{\times}64$ Modified Booth multiplier using CMOS multi-valued logic circuits. The multiplier based on the radix-4 algorithm is designed with current mode CMOS quaternary logic circuits. Designed multiplier is reduced the transistor count by 64.4% compared with the voltage mode binary multiplier. The multiplier is designed with Samsung $0.35{\mu}m$ standard CMOS process at a 3.3V supply voltage and unit current $5{\mu}m$. The validity and effectiveness are verified through the HSPICE simulation. The voltage mode binary multiplier is achieved the occupied area of $7.5{\times}9.4mm^2$, the maximum propagation delay time of 9.8ns and the average power consumption of 45.2mW. This multiplier is achieved the maximum propagation delay time of 11.9ns and the average power consumption of 49.7mW. The designed multiplier is reduced the occupied area by 42.5% compared with the voltage mode binary multiplier.

Architectural Design for Hardware Implementations of Parallelized Floating-point Rounding Algorithm (부동소수점 라운딩 병렬화 알고리즘의 하드웨어 구현을 위한 구조 설계)

  • 이원희;강준우
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1025-1028
    • /
    • 1998
  • Hardware to implement the parallelized Floating-point rounding algorithm is described. For parallelized additions, we propose an addition module which has carry selection logic to generate two results accoring to the input valuse. A multiplication module for parallelized multiplications is also proposed to generate Sum and Carry bits as intermediate results. Since these modules process data in IEEE standard Floatingpoint double precision format, they are designed for 53-bit significands including hidden bits. Multiplication module is designed with a Booth multiplier and an array multiplier.

  • PDF

A Design of the High-Speed Cipher VLSI Using IDEA Algorithm (IDEA 알고리즘을 이용한 고속 암호 VLSI 설계)

  • 이행우;최광진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.1
    • /
    • pp.64-72
    • /
    • 2001
  • This paper is on a design of the high-speed cipher IC using IDEA algorithm. The chip is consists of six functional blocks. The principal blocks are encryption and decryption key generator, input data circuit, encryption processor, output data circuit, operation mode controller. In subkey generator, the design goal is rather decrease of its area than increase of its computation speed. On the other hand, the design of encryption processor is focused on rather increase of its computation speed than decrease of its area. Therefore, the pipeline architecture for repeated processing and the modular multiplier for improving computation speed are adopted. Specially, there are used the carry select adder and modified Booth algorithm to increase its computation speed at modular multiplier. To input the data by 8-bit, 16-bit, 32-bit according to the operation mode, it is designed so that buffer shifts by 8-bit, 16-bit, 32-bit. As a result of simulation by 0.25 $\mu\textrm{m}$ process, this IC has achieved the throughput of 1Gbps in addition to its small area, and used 12,000gates in implementing the algorithm.

Variable Radix-Two Multibit Coding and Its VLSI Implementation of DCT/IDCT (가변길이 다중비트 코딩을 이용한 DCT/IDCT의 설계)

  • 김대원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.12
    • /
    • pp.1062-1070
    • /
    • 2002
  • In this paper, variable radix-two multibit coding algorithm is presented and applied in the implementation of discrete cosine transform(DCT) and inverse discrete cosine transform(IDCT). Variable radix-two multibit coding means the 2k SD (signed digit) representation of overlapped multibit scanning with variable shift method. SD represented by 2k generates partial products, which can be easily implemented with shifters and adders. This algorithm is most powerful for the hardware implementation of DCT/IDCT with constant coefficient matrix multiplication. This paper introduces the suggested algorithm, it's proof and the implementation of DCT/IDCT The implemented IDCT chip with 8 PEs(Processing Elements) and one transpose memory runs at a tate of 400 Mpixels/sec at 54MHz frequency for high speed parallel signal processing, and it's verified in HDTV and MPEG decoder.

Implementation of RSA Exponentiator Based on Radix-$2^k$ Modular Multiplication Algorithm (Radix-$2^k$ 모듈라 곱셈 알고리즘 기반의 RSA 지수승 연산기 설계)

  • 권택원;최준림
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.35-44
    • /
    • 2002
  • In this paper, an implementation method of RSA exponentiator based on Radix-$2^k$ modular multiplication algorithm is presented and verified. We use Booth receding algorithm to implement Radix-$2^k$ modular multiplication and implement radix-16 modular multiplier using 2K-byte memory and CSA(carry-save adder) array - with two full adder and three half adder delays. For high speed final addition we use a reduced carry generation and propagation scheme called pseudo carry look-ahead adder. Furthermore, the optimum value of the radix is presented through the trade-off between the operating frequency and the throughput for given Silicon technology. We have verified 1,024-bit RSA processor using Altera FPGA EP2K1500E device and Samsung 0.3$\mu\textrm{m}$ technology. In case of the radix-16 modular multiplication algorithm, (n+4+1)/4 clock cycles are needed and the 1,024-bit modular exponentiation is performed in 5.38ms at 50MHz.