• Title/Summary/Keyword: 소수 곱셈

Search Result 83, Processing Time 0.034 seconds

Design of an Operator Architecture for Finite Fields in Constrained Environments (제약적인 환경에 적합한 유한체 연산기 구조 설계)

  • Jung, Seok-Won
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.18 no.3
    • /
    • pp.45-50
    • /
    • 2008
  • The choice of an irreducible polynomial and the representation of elements have influence on the efficiency of operators for finite fields. This paper suggests two serial multiplier for the extention field GF$(p^n)$ where p is odd prime. A serial multiplier using an irreducible binomial consists of (2n+5) resisters, 2 MUXs, 2 multipliers of GF(p), and 1 adder of GF(p). It obtains the mulitplication result after $n^2+n$ clock cycles. A serial multiplier using an AOP consists of (2n+5) resisters, 1 MUX, 1 multiplier of CF(p), and 1 adder of GF(p). It obtains the mulitplication result after $n^2$+3n+2 clock cycles.

A new efficient format of dynamic fixed-point number for texture mapping in mobile 3D graphics (모바일 3차원 그래픽 텍스처 매핑에 효율적인 새로운 유동형 고정 소수점 수 포맷)

  • Kim, Nam-Seok;Han, Jung-Hyun
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10a
    • /
    • pp.135-138
    • /
    • 2006
  • 본 논문에서는 텍스처 매핑을 처리하기 위한 텍스처 유닛 하드웨어 설계에 효율적인 새로운 유동형 소수점 포맷을 제안한다. 기존 고정 소수점 포맷은 하드웨어가 간단한 반면 고품질 텍스처 처리를 수행할 경우 오버플로우/언더플로우가 발생하며 부동 소수점 포맷은 이를 해결할 수 있으나 하드웨어가 복잡하다. 제안한 방식은 오버플로우/언더플로우를 해결하면서 부동소수점보다 하드웨어 크기를 줄여서 본 포맷을 적용한 가산기는 부동소수점보다 26% 작으며 곱셈기는 고정/부동 소수점보다 절반 이상으로 작다. 따라서 제안한 포맷은 100Mhz 이상의 빠른 동작이 가능하며 모바일 3차원 그래픽 가속기의 텍스처 유닛 설계에 효과적이다.

  • PDF

A fast DCT algorithm with reduced propagation error in the fixed-point compuitation (고정 소수점 연산시 오차의 전파를 줄이는 고속 이산 여현 변환 알고리즘)

  • 정연식;이임건;최영호;박규태
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.23 no.9A
    • /
    • pp.2365-2371
    • /
    • 1998
  • Discrete cosine transform (DCT) has wide applications in speech and image coding. In this paper, we propose a novel fast dCT scheme with the property of reduced multiplication stages and the smaller number of additions and multiplications. This exploits the symmetry property of the DCT kernel to decompose the N-point dCT to N/2 point, and can be generally applied recursively to $2^{m}$-point. The proposed algorithm has a structure that most of multiplications tend to be performed at final stage, and this reduces propagation of truncation error which could occur in the fixed-point computation. Also the minimization of the multiplication stages further decreases the error.

  • PDF

A Variable Latency Newton-Raphson's Floating Point Number Reciprocal Square Root Computation (가변 시간 뉴톤-랍손 부동소수점 역수 제곱근 계산기)

  • Kim Sung-Gi;Cho Gyeong-Yeon
    • The KIPS Transactions:PartA
    • /
    • v.12A no.5 s.95
    • /
    • pp.413-420
    • /
    • 2005
  • The Newton-Raphson iterative algorithm for finding a floating point reciprocal square mot calculates it by performing a fixed number of multiplications. In this paper, a variable latency Newton-Raphson's reciprocal square root algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the rediprocal square root of a floating point number F, the algorithm repeats the following operations: '$X_{i+1}=\frac{{X_i}(3-e_r-{FX_i}^2)}{2}$, $i\in{0,1,2,{\ldots}n-1}$' with the initial value is '$X_0=\frac{1}{\sqrt{F}}{\pm}e_0$'. The bits to the right of p fractional bits in intermediate multiplication results are truncated and this truncation error is less than '$e_r=2^{-p}$'. The value of p is 28 for the single precision floating point, and 58 for the double precision floating point. Let '$X_i=\frac{1}{\sqrt{F}}{\pm}e_i$, there is '$X_{i+1}=\frac{1}{\sqrt{F}}-e_{i+1}$, where '$e_{i+1}{<}\frac{3{\sqrt{F}}{{e_i}^2}}{2}{\mp}\frac{{Fe_i}^3}{2}+2e_r$'. If '$|\frac{\sqrt{3-e_r-{FX_i}^2}}{2}-1|<2^{\frac{\sqrt{-p}{2}}}$' is true, '$e_{i+1}<8e_r$' is less than the smallest number which is representable by floating point number. So, $X_{i+1}$ is approximate to '$\frac{1}{\sqrt{F}}$. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications Per an operation is derived from many reciprocal square root tables ($X_0=\frac{1}{\sqrt{F}}{\pm}e_0$) with varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a reciprocal square root unit. Also, it can be used to construct optimized approximate reciprocal square root tables. The results of this paper can be applied to many areas that utilize floating point numbers, such as digital signal processing, computer graphics, multimedia, scientific computing, etc.

HW Matrix Multiplier Implementation & Performance Measurement for Low Earth Orbit Satellite (저궤도 위성을 위한 HW 행렬 곱셈기의 구현과 성능 측정)

  • Lee, Yunki;Kim, Jihoon
    • Journal of Satellite, Information and Communications
    • /
    • v.10 no.2
    • /
    • pp.115-120
    • /
    • 2015
  • Until now, AOCS SW has used FPU which is one of CPU resources for satellite attitude control. And most of the SW Throughput was consumed to calculate Matrix Multiply. As SW throughput margin is decreasing seriously with shorter control period and more computational burden at next satellite programs, a dedicated HW matrix multiplier is absolutely required. This paper represents results of HW implementation & performance measurement and mentions several techniques for performance improvement, further works.

224-bit ECC Processor supporting the NIST P-224 elliptic curve (NIST P-224 타원곡선을 지원하는 224-비트 ECC 프로세서)

  • Park, Byung-Gwan;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.188-190
    • /
    • 2017
  • 투영(projective) 좌표계를 이용한 스칼라 곱셈(scalar multiplication) 연산을 지원하는 224-비트 타원곡선 암호(Elliptic Curve Cryptography; ECC) 프로세서의 설계에 대해 기술한다. 소수체 GF(p)상의 덧셈, 뺄셈, 곱셈 등의 유한체 연산을 지원하며, 연산량과 하드웨어 자원소모가 큰 나눗셈 연산을 제거함으로써 하드웨어 복잡도를 감소시켰다. 수정된 Montgomery ladder 알고리듬을 이용하여 스칼라 곱셈 연산을 제어하였으며, 단순 전력분석에 보다 안전하다. 스칼라 곱셈 연산은 최대 2,615,201 클록 사이클이 소요된다. 설계된 ECC-P224 프로세서는 Xilinx ISim을 이용한 기능검증을 하였다. Xilinx Virtex5 FPGA 디바이스 합성결과 7,078 슬라이스로 구현되었으며, 최대 79 MHz에서 동작하였다.

  • PDF

Scalable ECC Processor supporting multiple elliptic curves over prime field (소수체 상의 다중 타원곡선을 지원하는 Scalable ECC 프로세서)

  • Park, Byung-Gwan;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.247-249
    • /
    • 2017
  • NIST에서 표준으로 정의된 P-192, P-224, P-256, P-384 타원곡선 상의 스칼라 곱셈(scalar multiplication) 연산을 지원하는 Scalable 타원곡선 암호(Elliptic Curve Cryptography; ECC) 프로세서의 설계에 대해 기술한다. 투영(projective) 좌표계를 이용하여 하드웨어 자원 소모가 큰 나눗셈 연산을 제거하였으며, GF(p) 상의 덧셈, 뺄셈, 곱셈 등의 유한체 연산을 지원한다. 워드 기반 몽고메리 곱셈기를 이용하여 다양한 크기의 필드(field)에서 고정된 하드웨어 자원을 통하여 곱셈 연산을 수행하도록 하였으며, 필드의 크기에 따라 연산 사이클이 증가하거나 감소한다. 설계된 Scalable ECC 프로세서는 Verilog HDL로 모델링 되었으며, Modelsim을 이용한 기능검증을 하였다. Xilinx Virtex5 FPGA 디바이스 합성결과 5,376-비트 RAM과 970 슬라이스로 구현되었으며, 최대 55 MHz의 동작 주파수를 갖는다.

  • PDF

Montgomery Multiplier Supporting Dual-Field Modular Multiplication (듀얼 필드 모듈러 곱셈을 지원하는 몽고메리 곱셈기)

  • Kim, Dong-Seong;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.6
    • /
    • pp.736-743
    • /
    • 2020
  • Modular multiplication is one of the most important arithmetic operations in public-key cryptography such as elliptic curve cryptography (ECC) and RSA, and the performance of modular multiplier is a key factor influencing the performance of public-key cryptographic hardware. An efficient hardware implementation of word-based Montgomery modular multiplication algorithm is described in this paper. Our modular multiplier was designed to support eleven field sizes for prime field GF(p) and binary field GF(2k) as defined by SEC2 standard for ECC, making it suitable for lightweight hardware implementations of ECC processors. The proposed architecture employs pipeline scheme between the partial product generation and addition operation and the modular reduction operation to reduce the clock cycles required to compute modular multiplication by 50%. The hardware operation of our modular multiplier was demonstrated by FPGA verification. When synthesized with a 65-nm CMOS cell library, it was realized with 33,635 gate equivalents, and the maximum operating clock frequency was estimated at 147 MHz.

A High Performance Modular Multiplier for ECC (타원곡선 암호를 위한 고성능 모듈러 곱셈기)

  • Choe, Jun-Yeong;Shin, Kyung-Wook
    • Journal of IKEEE
    • /
    • v.24 no.4
    • /
    • pp.961-968
    • /
    • 2020
  • This paper describes a design of high performance modular multiplier that is essentially used for elliptic curve cryptography. Our modular multiplier supports modular multiplications for five field sizes over GF(p), including 192, 224, 256, 384 and 521 bits as defined in NIST FIPS 186-2, and it calculates modular multiplication in two steps with integer multiplication and reduction. The Karatsuba-Ofman multiplication algorithm was used for fast integer multiplication, and the Lazy reduction algorithm was adopted for reduction operation. In addition, the Nikhilam division algorithm was used for the division operation included in the Lazy reduction. The division operation is performed only once for a given modulo value, and it was designed to skip division operation when continuous modular multiplications with the same modulo value are calculated. It was estimated that our modular multiplier can perform 6.4 million modular multiplications per second when operating at a clock frequency of 32 MHz. It occupied 456,400 gate equivalents (GEs), and the estimated clock frequency was 67 MHz when synthesized with a 180-nm CMOS cell library.

Efficient Multiplication of Boolean Matrices and Algorithm for D-Class Computation (D-클래스 계산을 위한 불리언 행렬의 효율적 곱셈 및 알고리즘)

  • Han, Jae-Il;Shin, Bum-Joo
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.12 no.2
    • /
    • pp.68-78
    • /
    • 2007
  • D-class is defined as a set of equivalent $n{\times}n$ boolean matrices according to a given equivalence relation. The D-class computation requires the multiplication of three boolean matrices for each of all possible triples of $n{\times}n$ boolean matrices. However, almost all the researches on boolean matrices focused on the efficient multiplication of only two boolean matrices and a few researches have recently been shown to deal with the multiplication of all boolean matrices. The paper suggests a mathematical theory that enables the efficient multiplication for all possible boolean matrix triples and the efficient computation of all D-classes, and discusses algorithms designed with the theory and their execution results.

  • PDF