• Title/Summary/Keyword: 곱셈 알고리즘

Search Result 330, Processing Time 0.02 seconds

Design and Implementation of Lok-up Table for Pre-scaling in Very-High Radix Divider (높은 자릿수 나눗셈 연산기에서의 영역변환상수를 위한 검색테이블 설계 및 구현)

  • 이병석;송문식;이정아
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10c
    • /
    • pp.3-5
    • /
    • 1999
  • 나눗셈 알고리즘은 다른 덧셈이나 곱셈 알고리즘에 비해 복잡하고, 수행 빈도수가 적다는 이유로 그동안 고속 나눗셈의 하드웨어 연구는 활발하지 않았다. 그러나 멀티미디어의 발전 및 고성능의 그래픽 랜더링을 위한 보다 빠른 부동소수점연산기(FPU)가 필요하게 되었으며, 이에 따라서 고속의 나눗셈 연산기의 필요성이 증가하게 되었다. 특히, 전체의 수행 시간 향상을 위해서라도 고속 나눗셈 연산기의 중용성은 더욱 부각되고 있다. 그러나 고속 나눗셈 연산기는 연산 속도와 크기라는 서로 상반되는 요소를 가지고 있다. 즉, 연산 속도가 빠르면 크기는 늘어나고, 크기를 줄이면 연산 속도는 늦어지게 된다. 본 논문은 높은 자릿수(Very-High Radix) 나눗셈 알고리즘에서 영역변환상수를 구하는 방법으로 연산이 아닌 검색테이블(Look-up Table)을 이용한다. 그리고 검색테이블의 크기를 줄이는 방법으로 영역변환상수의 범위 분석 및 캐리 저장형을 이용한 검색테이블 분할 방법을 이용하였다. 전체적으로는 영역변환상수를 구하는 연산주기가 필요없게 되므로 나눗셈 연산기의 영역 크기의 변화가 적으면서 연산 속도는 빨라졌음을 알 수 있다.

  • PDF

Design of Inverse Square Root Unit Using 2-Stage Pipeline Architecture (2-Stage Pipeline 구조를 이용한 역제곱근 연산기의 설계)

  • Kim, Jung-Hoon;Kim, Ki-Chul
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.10b
    • /
    • pp.198-201
    • /
    • 2007
  • 본 논문에서는 변형된 Newton-Raphson 알고리즘과 LUT(Look Up Table)를 사용하는 역제곱근 연산기를 제안한다. Newton-Raphson 부동소수점 역수 알고리즘은 일정한 횟수의 곱셈을 반복하여 역수 제곱근을 계산하는 방식이다. 변형된 Newton-Raphson 알고리즘은 하드웨어 구현에 적합하도록 변환되었으며, LUT는 오차를 줄이기 위해 개선되었다. 제안된 연산기는 LUT의 크기를 최소화하고, 순환적인 구조가 아닌 2-stage pipeline 구조를 가진다. 또한 IEEE-754 부동소수점 표준을 기초로 하는 24-bit 데이터 형식을 사용해 면적과 속도 향상에 유리하여 휴대용 기기의 멀티미디어 분야의 응용에 적합하다. 본 역제곱근 연산기는 소수점 이하 8-bit의 정확도를 가지며 VHDL을 이용하여 설계되었다. 그 크기는 $0.18{\mu}m$ CMOS 공정에서 약 4,000 gate의 크기를 보였으며 150MHz에서 동작이 가능하다.

  • PDF

An Efficient Architecture for Modified Karatsuba-Ofman Algorithm (불필요한 연산이 없는 카라슈바 알고리즘과 하드웨어 구조)

  • Chang Nam-Su;Kim Chang-Han
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.3 s.345
    • /
    • pp.33-39
    • /
    • 2006
  • In this paper we propose the Modified Karatsuba-Ofman algorithm for polynomial multiplication to polynomials of arbitrary degree. Leone proposed optimal stop condition for iteration of Karatsuba-Ofman algorithm(KO). In this paper, we propose a Non-Redundant Karatsuba-Ofman algorithm (NRKOA) with removing redundancy operations, and design a parallel hardware architecture based on the proposed algorithm. Comparing with existing related Karatsuba architectures with the same time complexity, the proposed architecture reduces the area complexity. Furthermore, the space complexity of the proposed multiplier is reduced by 43% in the best case.

New Simplified Sum-Product Algorithm for Low Complexity LDPC Decoding (복잡도를 줄인 LDPC 복호를 위한 새로운 Simplified Sum-Product 알고리즘)

  • Han, Jae-Hee;SunWoo, Myung-Hoon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.3C
    • /
    • pp.322-328
    • /
    • 2009
  • This paper proposes new simplified sum-product (SSP) decoding algorithm to improve BER performance for low-density parity-check codes. The proposed SSP algorithm can replace multiplications and divisions with additions and subtractions without extra computations. In addition, the proposed SSP algorithm can simplify both the In[tanh(x)] and tanh-1 [exp(x)] by using two quantization tables which can reduce tremendous computational complexity. Moreover, the simulation results show that the proposed SSP algorithm can improve about $0.3\;{\sim}\;0.8\;dB$ of BER performance compared with the existing modified sum-product algorithms.

Scalable RSA public-key cryptography processor based on CIOS Montgomery modular multiplication Algorithm (CIOS 몽고메리 모듈러 곱셈 알고리즘 기반 Scalable RSA 공개키 암호 프로세서)

  • Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.1
    • /
    • pp.100-108
    • /
    • 2018
  • This paper describes a design of scalable RSA public-key cryptography processor supporting four key lengths of 512/1,024/2,048/3,072 bits. The modular multiplier that is a core arithmetic block for RSA crypto-system was designed with 32-bit datapath, which is based on the CIOS (Coarsely Integrated Operand Scanning) Montgomery modular multiplication algorithm. The modular exponentiation was implemented by using L-R binary exponentiation algorithm. The scalable RSA crypto-processor was verified by FPGA implementation using Virtex-5 device, and it takes 456,051/3,496347/26,011,947/88,112,770 clock cycles for RSA computation for the key lengths of 512/1,024/2,048/3,072 bits. The RSA crypto-processor synthesized with a $0.18{\mu}m$ CMOS cell library occupies 10,672 gate equivalent (GE) and a memory bank of $6{\times}3,072$ bits. The estimated maximum clock frequency is 147 MHz, and the RSA decryption takes 3.1/23.8/177/599.4 msec for key lengths of 512/1,024/2,048/3,072 bits.

The Integer Number Divider Using Improved Reciprocal Algorithm (개선된 역수 알고리즘을 사용한 정수 나눗셈기)

  • Song, Hong-Bok;Park, Chang-Soo;Cho, Gyeong-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.7
    • /
    • pp.1218-1226
    • /
    • 2008
  • With the development of semiconductor integrated technology and with the increasing use of multimedia functions in computer, more functions have been implemented as hardware. Nowadays, most microprocessors beyond 32 bits generally implement an integer multiplier as hardware. However, as for a divider, only specific microprocessor implements traditional SRT algorithm as hardware due to complexity of implementation and slow speed. This paper suggested an algorithm that uses a multiplier, 'w bit $\times$ w bit = 2w bit', to process $\frac{N}{D}$ integer division. That is, the reciprocal number D is first calculated, and then multiply dividend N to process integer division. In this paper, when the divisor D is '$D=0.d{\times}2^L$, 0.5 < 0.d < 1.0', approximate value of ' $\frac{1}{D}$', '$1.g{\times}2^{-L}$', which satisfies ' $0.d{\times}1.g=1+e$, $e<2^{-w}$', is defined as over reciprocal number and then an algorithm for over reciprocal number is suggested. This algorithm multiplies over reciprocal number '$01.g{\times}2^{-L}$' by dividend N to process $\frac{N}{D}$ integer division. The algorithm suggested in this paper doesn't require additional revision, because it can calculate correct reciprocal number. In addition, this algorithm uses only multiplier, so additional hardware for division is not required to implement microprocessor. Also, it shows faster speed than the conventional SRT algorithm and performs operation by word unit, accordingly it is more suitable to make compiler than the existing division algorithm. In conclusion, results from this study could be used widely for implementation SOC(System on Chip) and etc. which has been restricted to microprocessor and size of the hardware.

Study on Parallelized Rounding Algorithm in Floating-point Addition and Multiplication (부동소수점 덧셈과 곱셈에서의 라운딩 병렬화 알고리즘 연구)

  • 이원희;강준우
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1017-1020
    • /
    • 1998
  • We propose an algorithm which processes the floating-point $n_{addition}$traction and rounding in parallel. It also processes multiplication and rounding in the same way. The hardware model is presented that minimizes the delay time to get results for all the rounding modes defined in the IEEE Standards. An unified method to get the three bits(L, G, S)for the rounding is described. We also propose an unified guide line to determine the 1-bit shift for the post-normalization in the Floating-point $n_{addition}$traction and multiplication.

  • PDF

Image Processing Using Multiplierless Binomial QMF-Wavelet Filters (곱셈기가 없는 이진수 QMF-웨이브렛 필터를 사용한 영상처리)

  • 신종홍;지인호
    • Journal of Broadcast Engineering
    • /
    • v.4 no.2
    • /
    • pp.144-154
    • /
    • 1999
  • The binomial sequences are family of orthogonal sequences that can be generated with remarkable simplicity-no multiplications are necessary. This paper introduces a class of non-recursive multidimensional filters for frequency-selective image processing without multiplication operations. The magnitude responses are narrow-band. approximately gaussian-shaped with center frequencies which can be positioned to yield low-pass. band-pass. or high-pass filtering. Algorithms for the efficient implementation of these filters in software or in hardware are described. Also. we show that the binomial QMFs are the maximally flat magnitude square Perfect Reconstruction paraunitary filters with good compression capability and these are shown to be wavelet filters as well. In wavelet transform the original image is decomposed at different scales using a pyramidal algorithm architecture. The decomposition is along the vertical and horizontal direction and maintains constant the number of pixels required to describe the images. An efficient perfect reconstruction binomial QMF-Wavelet signal decomposition structure is proposed. The technique provides a set of filter solutions with very good amplitude responses and band split. The proposed binomial QMF-filter structure is efficient, simple to implement on VLSl. and suitable for multi-resolution signal decomposition and coding applications.

  • PDF

Design of Modified MDS Block for Performance Improvement of Twofish Cryptographic Algorithm (Twofish 암호알고리즘의 성능향상을 위한개선 된 MDS 블록 설계)

  • Jeong Woo-Yeol;Lee Seon-Heun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.5 s.37
    • /
    • pp.109-114
    • /
    • 2005
  • Twofish cryptographic algorithm is concise algorithm itself than Rijndael cryptographic algorithm as AES, and easy of implementation is good, but the processing speed has slow shortcoming. Therefore this paper designed improved MDS block to improve Twofish cryptographic algorithm's speed. Problem of speed decline by a bottle-neck Phenomenon of the Processing speed existed as block that existing MDS block occupies Twofish cryptosystem's critical path. To reduce multiplication that is used by operator in MDS block this Paper removed a bottle-neck phenomenon and low-speed about MDS itself using LUT operation and modulo-2 operation. Twofish cryptosystem including modified MDS block designed by these result confirmed that bring elevation of the processing speed about 10$\%$ than existing Twofish cryptosystem.

  • PDF

Approximated Soft-Decision Demapping Algorithm for Coded 4+12+16 APSK (부호화된 4+12+16 APSK를 위한 근사화된 연판정 디매핑 알고리즘)

  • Lee, Jaeyoon;Jang, Yeonsoo;Yoon, Dongweon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37A no.9
    • /
    • pp.738-745
    • /
    • 2012
  • This paper proposes an approximated soft decision demapping algorithm with low computational complexity for coded 4+12+16 amplitude phase shift keying (APSK) in an additive white Gaussian noise (AWGN) channel. To derive the proposed algorithm, we approximate the decision boundaries for 4+12+16 APSK symbols, and then decide the log likelihood ratio (LLR) value for each bit from the approximated decision boundaries. Although the proposed algorithm shows about 0.6~1.1dB degradation on the error performance compared with the conventional max-log algorithm, it gives a significant result in terms of the computational complexity.