• Title/Summary/Keyword: Montgomery Multiplication

Search Result 78, Processing Time 0.019 seconds

A Scalable ECC Processor for Elliptic Curve based Public-Key Cryptosystem (타원곡선 기반 공개키 암호 시스템 구현을 위한 Scalable ECC 프로세서)

  • Choi, Jun-Baek;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.8
    • /
    • pp.1095-1102
    • /
    • 2021
  • A scalable ECC architecture with high scalability and flexibility between performance and hardware complexity is proposed. For architectural scalability, a modular arithmetic unit based on a one-dimensional array of processing element (PE) that performs finite field operations on 32-bit words in parallel was implemented, and the number of PEs used can be determined in the range of 1 to 8 for circuit synthesis. A scalable algorithms for word-based Montgomery multiplication and Montgomery inversion were adopted. As a result of implementing scalable ECC processor (sECCP) using 180-nm CMOS technology, it was implemented with 100 kGEs and 8.8 kbits of RAM when NPE=1, and with 203 kGEs and 12.8 kbits of RAM when NPE=8. The performance of sECCP with NPE=1 and NPE=8 was analyzed to be 110 PSMs/sec and 610 PSMs/sec, respectively, on P256R elliptic curve when operating at 100 MHz clock.

An Area-efficient Design of ECC Processor Supporting Multiple Elliptic Curves over GF(p) and GF(2m) (GF(p)와 GF(2m) 상의 다중 타원곡선을 지원하는 면적 효율적인 ECC 프로세서 설계)

  • Lee, Sang-Hyun;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.254-256
    • /
    • 2019
  • 소수체 GF(p)와 이진체 $GF(2^m)$ 상의 다중 타원곡선을 지원하는 듀얼 필드 ECC (DF-ECC) 프로세서를 설계하였다. DF-ECC 프로세서의 저면적 설와 다양한 타원곡선의 지원이 가능하도록 워드 기반 몽고메리 곱셈 알고리듬을 적용한 유한체 곱셈기를 저면적으로 설계하였으며, 페르마의 소정리(Fermat's little theorem)를 유한체 곱셈기에 적용하여 유한체 나눗셈을 구현하였다. 설계된 DF-ECC 프로세서는 스칼라 곱셈과 점 연산, 그리고 모듈러 연산 기능을 가져 다양한 공개키 암호 프로토콜에 응용이 가능하며, 유한체 및 모듈러 연산에 적용되는 파라미터를 내부 연산으로 생성하여 다양한 표준의 타원곡선을 지원하도록 하였다. 설계된 DF-ECC는 FPGA 구현을 하드웨어 동작을 검증하였으며, 0.18-um CMOS 셀 라이브러리로 합성한 결과 22,262 GEs (gate equivalences)와 11 kbit RAM으로 구현되었으며, 최대 100 MHz의 동작 주파수를 갖는다. 설계된 DF-ECC 프로세서의 연산성능은 B-163 Koblitz 타원곡선의 경우 스칼라 곱셈 연산에 885,044 클록 사이클이 소요되며, B-571 슈도랜덤 타원곡선의 스칼라 곱셈에는 25,040,625 사이클이 소요된다.

  • PDF

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

VLSI Architecture for High Speed Implementation of Elliptic Curve Cryptographic Systems (타원곡선 암호 시스템의 고속 구현을 위한 VLSI 구조)

  • Kim, Chang-Hoon
    • The KIPS Transactions:PartC
    • /
    • v.15C no.2
    • /
    • pp.133-140
    • /
    • 2008
  • In this paper, we propose a high performance elliptic curve cryptographic processor over $GF(2^{163})$. The proposed architecture is based on a modified Lopez-Dahab elliptic curve point multiplication algorithm and uses Gaussian normal basis for $GF(2^{163})$ field arithmetic. To achieve a high throughput rates, we design two new word-level arithmetic units over $GF(2^{163})$ and derive a parallelized elliptic curve point doubling and point addition algorithm with uniform addressing based on the Lopez-Dahab method. We implement our design using Xilinx XC4VLX80 FPGA device which uses 24,263 slices and has a maximum frequency of 143MHz. Our design is roughly 4.8 times faster with 2 times increased hardware complexity compared with the previous hardware implementation proposed by Shu. et. al. Therefore, the proposed elliptic curve cryptographic processor is well suited to elliptic curve cryptosystems requiring high throughput rates such as network processors and web servers.

Design of high-speed RSA processor based on radix-4 Montgomery multiplier (래딕스-4 몽고메리 곱셈기 기반의 고속 RSA 연산기 설계)

  • Koo, Bon-Seok;Ryu, Gwon-Ho;Chang, Tae-Joo;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.17 no.6
    • /
    • pp.29-39
    • /
    • 2007
  • RSA is one of the most popular public-key crypto-system in various applications. This paper addresses a high-speed RSA crypto-processor with modified radix-4 modular multiplication algorithm and Chinese Remainder Theorem(CRT) using Carry Save Adder(CSA). Our design takes 0.84M clock cycles for a 1024-bit modular exponentiation and 0.25M cycles for a 512-bit exponentiations. With 0.18um standard cell library, the processor achieves 365Kbps for a 1024-bit exponentiation and 1,233Kbps for two 512-bit exponentiations at a 300MHz clock rate.

A Lightweight Hardware Accelerator for Public-Key Cryptography (공개키 암호 구현을 위한 경량 하드웨어 가속기)

  • Sung, Byung-Yoon;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1609-1617
    • /
    • 2019
  • Described in this paper is a design of hardware accelerator for implementing public-key cryptographic protocols (PKCPs) based on Elliptic Curve Cryptography (ECC) and RSA. It supports five elliptic curves (ECs) over GF(p) and three key lengths of RSA that are defined by NIST standard. It was designed to support four point operations over ECs and six modular arithmetic operations, making it suitable for hardware implementation of ECC- and RSA-based PKCPs. In order to achieve small-area implementation, a finite field arithmetic circuit was designed with 32-bit data-path, and it adopted word-based Montgomery multiplication algorithm, the Jacobian coordinate system for EC point operations, and the Fermat's little theorem for modular multiplicative inverse. The hardware operation was verified with FPGA device by implementing EC-DH key exchange protocol and RSA operations. It occupied 20,800 gate equivalents and 28 kbits of RAM at 50 MHz clock frequency with 180-nm CMOS cell library, and 1,503 slices and 2 BRAMs in Virtex-5 FPGA device.

Design and Hardware Implementation of High-Speed Variable-Length RSA Cryptosystem (가변길이 고속 RSA 암호시스템의 설계 및 하드웨어 구현)

  • 박진영;서영호;김동욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.9C
    • /
    • pp.861-870
    • /
    • 2002
  • In this paper, with targeting on the drawback of RSA of operation speed, a new 1024-bit RSA cryptosystem has been proposed and implemented in hardware to increase the operational speed and perform the variable-length encryption. The proposed cryptosystem mainly consists of the modular exponentiation part and the modular multiplication part. For the modular exponentiation, the RL-binary method, which performs squaring and modular multiplying in parallel, was improved, and then applied. And 4-stage CSA structure and radix-4 booth algorithm were applied to enhance the variable-length operation and reduce the number of partial product in modular multiplication arithmetic. The proposed RSA cryptosystem which can calculate at most 1024 bits at a tittle was mapped into the integrated circuit using the Hynix Phantom Cell Library for Hynix 0.35㎛ 2-Poly 4-Metal CMOS process. Also, the result of software implementation, which had been programmed prior to the hardware research, has been used to verify the operation of the hardware system. The size of the result from the hardware implementation was about 190k gate count and the operational clock frequency was 150㎒. By considering a variable-length of modulus number, the baud rate of the proposed scheme is one and half times faster than the previous works. Therefore, the proposed high speed variable-length RSA cryptosystem should be able to be used in various information security system which requires high speed operation.

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

  • 김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.21-34
    • /
    • 2002
  • In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.