• Title/Summary/Keyword: Booth's algorithm

Search Result 22, Processing Time 0.022 seconds

Design of a Booth's Multiplier Suitable for Embedded Systems (임베디드 시스템에 적용이 용이한 Booth 알고리즘 방식의 곱셈기 설계)

  • Moon, San-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.838-841
    • /
    • 2007
  • In this study, we implemented a $17^*17b$ binary digital multiplier using radix-4 Booth's algorithm. Two stage pipeline architecture was applied to achieve higher throughput and 4:2 adders were used for regular layout structure in the Wallace tree partition. To evaluate the circuit, several MPW chips were fabricated using Hynix 0.6-um 3M N-well CMOS technology. Also we proposed an efficient test methodology and did fault simulations. The chip contains 9115 transistors and the core area occupies about $1135^*1545$ mm2. The functional tests using ATS-2 tester showed that it can operate with 24 MHz clock at 5.0 V at room temperature.

  • PDF

Elliptic Curve Scalar Point Multiplication Using Radix-4 Modified Booth's Algorithm (Radix-4 Modified Booth's 알고리즘을 응용한 타원곡선 스칼라 곱셈)

  • 문상국
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.6
    • /
    • pp.1212-1217
    • /
    • 2004
  • The main back-bone operation in elliptic curve cryptosystems is scalar point multiplication. The most frequently used method implementing the scalar point multiplication, which is performed in the upper level of GF multiplication and GF division, has been the double-and-add algorithm, which is recently challenged by NAF(Non-Adjacent Format) algorithm. In this paper, we propose a more efficient and novel scalar multiplication method than existing double-and-add by applying redundant receding which originates from radix-4 Booth's algorithm. After deriving the novel quad-and-add algorithm, we created a new operation, named point quadruple, and verified with real application calculation to utilize it. Derived numerical expressions were verified using both C programs and HDL (Hardware Description Language) in real applications. Proposed method of elliptic curve scalar point multiplication can be utilized in many elliptic curve security applications for handling efficient and fast calculations.

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.7
    • /
    • pp.53-61
    • /
    • 2004
  • In this paper, we proposed a new architecture of MAC(Multiplier-Accumulator) to operate high-speed multiplication-accumulation. We used the MBA(Modified radix-4 Booth Algorithm) which is based on the 1's complement number system, and CSA(Carry Save Adder) for addition of the partial products. During the addition of the partial product, the signed numbers with the 1's complement type after Booth encoding are converted in the 2's complement signed number in the CSA tree. Since 2-bit CLA(Carry Look-ahead Adder) was used in adding the lower bits of the partial product, the input bit width of the final adder and whole delay of the critical path were reduced. The proposed MAC was applied into the DWT(Discrete Wavelet Transform) filtering operation for JPEG2000, and it showed the possibility for the practical application. Finally we identified the improved performance according to the comparison with the previous architecture in the aspect of hardware resource and delay.

Low-power Horizontal DA Filter Structure Using Radix-16 Modified Booth Algorithm (Radix-16 Modified Booth 알고리즘을 이용한 저전력 Horizontal DA 필터 구조)

  • Shin, Ji-Hye;Jang, Young-Beom
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.47 no.12
    • /
    • pp.31-38
    • /
    • 2010
  • In tins paper, a new DA(Distributed Arithmetic) tilter implementation technique has been proposed. Contrary to vertical directional calculation of input sample bit format in the conventional DA implementation technique, proposed implementation technique utilizes horizontal directional calculation of input sample bit format. Since proposed technique calculates in horizontal direction, it does not need ROM and utilizes the Modified Booth algorithm. Furthermore proposed technique can be applied to implement the variable coefficients filters in addition to the fixed coefficients filters. Using conventional and proposed techniques, a 20 tap filter is implemented by Verilog-HDL coding. Through Synopsis synthesis tool, it has been shown that 41.6% area reduction can be achieved.

Design of a Truncated Floating-Point Multiplier for Graphic Accelerator of Mobile Devices (모바일 그래픽 가속기용 부동소수점 절사 승산기 설계)

  • Cho, Young-Sung;Lee, Yong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.3
    • /
    • pp.563-569
    • /
    • 2007
  • As the mobile communication and the semiconductor technology is improved continuously, mobile contents such as the multimedia service and the 2D/3D graphics which require high level graphics are serviced recently. Mobile chips should consume small die area and low power. In this paper, we design a truncated floating-point multiplier that is useful for the 2D/3D vector graphics in mobile devices. The truncated multiplier is based on the radix-4 Booth's encoding algorithm and a truncation algorithm is used to achieve small area and low power. The average percent error of the multiplier is as small as 0.00003% and neglectable for mobile applications. The synthesis result using 0.35um CMOS cell library shows that the number of gates for the truncated multiplier is only 33.8 percent of the conventional radix-4 Booth's multiplier.

Design of a Multi-Valued Arithmetic Processor with Encoder and Decoder (인코더, 디코오더를 가지는 다치 연산기 설계)

  • 박진우;양대영;송홍복
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.2 no.1
    • /
    • pp.147-156
    • /
    • 1998
  • In this paper, an arithmetic processor using multi-valued logic is designed. For implementing of multi-valued logic circuits, we use current-mode CMOS circuits and design encoder which change binary voltage-mode signals to multi-valued current-mode signals and decoder which change results of arithmetic to binary voltage-mode signals. To reduce the number of partial product we use 4-radix SD number partial product generation algorithm that is an extension of the modified Booth's algorithm. We demonstrate the effectiveness of the proposed arithmetic circuits through SPICE simulation and Hardware emulation using FPGA chip.

  • PDF

Variable Radix-Two Multibit Coding and Its VLSI Implementation of DCT/IDCT (가변길이 다중비트 코딩을 이용한 DCT/IDCT의 설계)

  • 김대원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.12
    • /
    • pp.1062-1070
    • /
    • 2002
  • In this paper, variable radix-two multibit coding algorithm is presented and applied in the implementation of discrete cosine transform(DCT) and inverse discrete cosine transform(IDCT). Variable radix-two multibit coding means the 2k SD (signed digit) representation of overlapped multibit scanning with variable shift method. SD represented by 2k generates partial products, which can be easily implemented with shifters and adders. This algorithm is most powerful for the hardware implementation of DCT/IDCT with constant coefficient matrix multiplication. This paper introduces the suggested algorithm, it's proof and the implementation of DCT/IDCT The implemented IDCT chip with 8 PEs(Processing Elements) and one transpose memory runs at a tate of 400 Mpixels/sec at 54MHz frequency for high speed parallel signal processing, and it's verified in HDTV and MPEG decoder.

A 8192-Point FFT Processor Based on the CORDIC Algorithm for OFDM System (CORDIC 알고리듬에 기반 한 OFDM 시스템용 8192-Point FFT 프로세서)

  • Park, Sang-Yoon;Cho, Nam-Ik
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.8B
    • /
    • pp.787-795
    • /
    • 2002
  • This paper presents the architecture and the implementation of a 2K/4K/8K-point complex Fast Fourier Transform(FFT) processor for Orthogonal Frequency-Division Multiplexing (OFDM) system. The architecture is based on the Cooley-Tukey algorithm for decomposing the long DFT into short length multi-dimensional DFTs. The transposition memory, shuffle memory, and memory mergence method are used for the efficient manipulation of data for multi-dimensional transforms. Booth algorithm and the COordinate Rotation DIgital Computer(CORDIC) processor are employed for the twiddle factor multiplications in each dimension. Also, for the CORDIC processor, a new twiddle factor generation method is proposed to obviate the ROM required for storing the twiddle factors. The overall 2K/4K/8K-FFT processor requires 600,000 gates, and it is implemented in 1.8 V, 0.18 ${\mu}m$ CMOS. The processor can perform 8K-point FFT in every 273 ${\mu}s$, 2K-point every 68.26 ${\mu}s$ at 30MHz, and the SNR is over 48dB, which are enough performances for the OFDM in DVB-T.

A Design of the High-Speed Cipher VLSI Using IDEA Algorithm (IDEA 알고리즘을 이용한 고속 암호 VLSI 설계)

  • 이행우;최광진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.1
    • /
    • pp.64-72
    • /
    • 2001
  • This paper is on a design of the high-speed cipher IC using IDEA algorithm. The chip is consists of six functional blocks. The principal blocks are encryption and decryption key generator, input data circuit, encryption processor, output data circuit, operation mode controller. In subkey generator, the design goal is rather decrease of its area than increase of its computation speed. On the other hand, the design of encryption processor is focused on rather increase of its computation speed than decrease of its area. Therefore, the pipeline architecture for repeated processing and the modular multiplier for improving computation speed are adopted. Specially, there are used the carry select adder and modified Booth algorithm to increase its computation speed at modular multiplier. To input the data by 8-bit, 16-bit, 32-bit according to the operation mode, it is designed so that buffer shifts by 8-bit, 16-bit, 32-bit. As a result of simulation by 0.25 $\mu\textrm{m}$ process, this IC has achieved the throughput of 1Gbps in addition to its small area, and used 12,000gates in implementing the algorithm.

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

  • 구대성;김필중;김종빈
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.39 no.4
    • /
    • pp.355-355
    • /
    • 2002
  • In this paper, we designed a FPU(floating point unit) that it is very important and requires of high density when digital audio is designed. Almost audio system must support the multi-channel and required for high quality. A floating point arithmetic function in MPEG-2 AAC that implemented by hardware is able to realtime decoding when DSP realization. The reason is that MPEG-2 AAC is compatible to the Audio field of MPEG-4 and afterwards. We designed a FPU by hardware to increase the speed of a floating point unit with much calculation part in the MPEG-2 AAC Decoder. A FPU is composed of a multiplier and an adder. A multiplier used the Radix-4 Booth algorithm and an adder adopted 1's complement method for speed up. A form of a floating point unit has 8bit of exponent part and 24bit of mantissa. It's compatible with the IEEE single precision format and adopted a pipeline architecture to increase the speed of a processor. All of sub blocks are based on ISO/IEC 13818-7 standard. The algorithm is tested by C language and the design does by use of VHDL(VHSIC Hardware Description Language). The maximum operation speed is 23.2MHz and the stable operation speed is 19MHz.