• Title/Summary/Keyword: Booth Multiplier

A Design of the High-Speed Cipher VLSI Using IDEA Algorithm (IDEA 알고리즘을 이용한 고속 암호 VLSI 설계)

  • 이행우;최광진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.1
    • /
    • pp.64-72
    • /
    • 2001
  • This paper presents the design of a high-speed cipher IC based on the IDEA algorithm. The chip consists of six functional blocks: the encryption and decryption key generators, the input data circuit, the encryption processor, the output data circuit, and the operation mode controller. In the subkey generator, the design goal is to reduce area rather than to increase computation speed. The encryption processor, in contrast, is designed to increase computation speed rather than to reduce area. It therefore adopts a pipeline architecture for the repeated processing and a modular multiplier for faster computation. In particular, the modular multiplier uses a carry-select adder and the modified Booth algorithm to increase its speed. So that data can be input in 8-bit, 16-bit, or 32-bit units depending on the operation mode, the buffer is designed to shift by 8, 16, or 32 bits. In simulation with a 0.25 $\mu\textrm{m}$ process, the IC achieved a throughput of 1 Gbps with a small area, using 12,000 gates to implement the algorithm.
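
For context on the modular multiplier mentioned in the abstract above: IDEA multiplies 16-bit words modulo 2^16 + 1, with the all-zero word standing for 2^16. The Python sketch below only illustrates that arithmetic and the common "low minus high" reduction; it is not the paper's circuit, and the function names are made up for this example.

```python
MODULUS = 0x10001  # 2**16 + 1, the modulus used by IDEA multiplication


def idea_mul_reference(a: int, b: int) -> int:
    """Reference version: 16-bit words, where 0 stands for 2**16."""
    a = a or 0x10000
    b = b or 0x10000
    return (a * b) % MODULUS & 0xFFFF  # a result of 2**16 is encoded as 0


def idea_mul_low_high(a: int, b: int) -> int:
    """Same result without a division, using 2**16 == -1 (mod 2**16 + 1)."""
    a = a or 0x10000
    b = b or 0x10000
    product = a * b
    low = product & 0xFFFF
    high = product >> 16
    result = low - high
    if result < 0:  # wrap around the modulus once
        result += MODULUS
    return result & 0xFFFF
```

The second form shows why a plain integer multiplier (for example a Booth multiplier) followed by one subtraction and a conditional correction is enough for this operation.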

Implementation of RSA Exponentiator Based on Radix-$2^k$ Modular Multiplication Algorithm (Radix-$2^k$ 모듈라 곱셈 알고리즘 기반의 RSA 지수승 연산기 설계)

  • 권택원;최준림
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.35-44
    • /
    • 2002
  • In this paper, an implementation method for an RSA exponentiator based on the Radix-$2^k$ modular multiplication algorithm is presented and verified. We use the Booth recoding algorithm to implement Radix-$2^k$ modular multiplication, and we implement a radix-16 modular multiplier using a 2K-byte memory and a CSA (carry-save adder) array with a delay of two full adders and three half adders. For high-speed final addition, we use a reduced carry generation and propagation scheme called the pseudo carry look-ahead adder. Furthermore, the optimum value of the radix is presented through the trade-off between operating frequency and throughput for a given silicon technology. We have verified a 1,024-bit RSA processor using an Altera FPGA EP2K1500E device and Samsung 0.3 $\mu\textrm{m}$ technology. With the radix-16 modular multiplication algorithm, (n+4+1)/4 clock cycles are needed, and a 1,024-bit modular exponentiation is performed in 5.38 ms at 50 MHz.

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

  • 구대성;김필중;김종빈
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.39 no.4
    • /
    • pp.355-355
    • /
    • 2002
  • In this paper, we design an FPU (floating-point unit), an important block that requires high density in digital audio design. Most audio systems must support multiple channels and high quality. Implementing the floating-point arithmetic of MPEG-2 AAC in hardware enables real-time decoding in a DSP realization. This matters because MPEG-2 AAC is compatible with the audio part of MPEG-4 and later standards. We implemented the FPU in hardware to speed up the floating-point operations, which account for much of the computation in the MPEG-2 AAC decoder. The FPU is composed of a multiplier and an adder. The multiplier uses the Radix-4 Booth algorithm, and the adder adopts the 1's complement method for speed. The floating-point format has an 8-bit exponent and a 24-bit mantissa. It is compatible with the IEEE single-precision format, and a pipeline architecture is adopted to increase processing speed. All sub-blocks are based on the ISO/IEC 13818-7 standard. The algorithm was tested in C, and the design was written in VHDL (VHSIC Hardware Description Language). The maximum operating speed is 23.2 MHz and the stable operating speed is 19 MHz.
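
As a brief illustration of the Radix-4 Booth algorithm named in the abstract above, the Python sketch below recodes a two's-complement multiplier into digits from {-2, -1, 0, 1, 2} and sums the corresponding partial products. It is a generic textbook formulation, not the paper's design; the function names and the `bits` parameter are chosen for this example.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Recode a `bits`-wide two's-complement value into radix-4 Booth digits.

    Digits are returned least significant first; each is in {-2, -1, 0, 1, 2}.
    """
    if bits <= 0 or bits % 2:
        raise ValueError("bits must be a positive even number")
    y &= (1 << bits) - 1          # keep the two's-complement bit pattern
    digits = []
    prev = 0                      # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # digit = y[i] + y[i-1] - 2*y[i+1]
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by a `bits`-wide signed y using bits/2 partial products."""
    return sum(d * x * 4**i for i, d in enumerate(booth_radix4_digits(y, bits)))
```

For example, `booth_radix4_digits(6, 4)` gives `[-2, 2]`, since 6 = -2 + 2·4. An n-bit multiplier produces n/2 partial products instead of n, and each one is 0, ±x, or ±2x, so it needs only a shift and an optional negation.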

Parameterized Soft IP Design of Complex-number Multiplier Core (복소수 승산기 코어의 파라미터화된 소프트 IP 설계)

  • 양대성;이승기;신경욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.10B
    • /
    • pp.1482-1490
    • /
    • 2001
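  • A parameterized soft IP (intellectual property) of a complex-number multiplier core, which can be used as a key arithmetic block in digital communication systems and signal processing circuits, was designed. Because the bit width required of a multiplier varies widely with the application, a multiplier core IP needs to be designed with a parameterized bit width. In this paper, the bit width of the complex-number multiplier is parameterized so that users can select the multiplier and multiplicand widths in the range of 8 to 24 bits in 2-bit steps as needed, and the GUI-based core generator PCMUL_GEN generates a VHDL model of a complex-number multiplier with the specified bit widths. The complex-number multiplier core IP is designed using the redundant binary (RB) number system and a new radix-4 Booth encoding/decoding circuit proposed in this paper, which gives it a simpler internal structure and higher-speed, lower-power characteristics than conventional approaches. The designed IP was implemented on a Xilinx FPGA to verify its functionality.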

A Low-Complexity 128-Point Mixed-Radix FFT Processor for MB-OFDM UWB Systems

  • Cho, Sang-In;Kang, Kyu-Min
    • ETRI Journal
    • /
    • v.32 no.1
    • /
    • pp.1-10
    • /
    • 2010
  • In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency-division multiplexing ultra-wideband systems. The proposed 128-point FFT processor employs both a modified radix-$2^4$ algorithm and a radix-$2^3$ algorithm to significantly reduce the numbers of complex constant multipliers and complex Booth multipliers. It also employs substructure-sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 ${\mu}m$ CMOS technology with a supply voltage of 1.8 V. The hardware-efficient 128-point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128-point mixed-radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128-point FFT architectures.
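
To illustrate what "multiplication with only addition and shift operations" means in the abstract above, the Python sketch below multiplies by a fixed integer constant using one shift per set bit of that constant. It does not reproduce the paper's substructure-sharing units; the function names and the constant 181 (roughly cos(π/4) scaled by 256) are assumptions made for this example.

```python
def shift_positions(constant: int) -> list[int]:
    """Return the bit positions that are set in a non-negative constant."""
    if constant < 0:
        raise ValueError("constant must be non-negative")
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(x: int, constant: int) -> int:
    """Compute x * constant as a sum of shifted copies of x (no multiplier)."""
    return sum(x << s for s in shift_positions(constant))


# 181 = 0b10110101, so x * 181 = (x<<7) + (x<<5) + (x<<4) + (x<<2) + x
scaled = multiply_by_constant(7, 181)
```

Because the constant is known at design time, the shifts become fixed wiring and only the adders cost hardware, which is why a constant multiplier can be far cheaper than a general Booth multiplier.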

Design of fast 16-bit multiplier with $0.35\mu m $ CMOS technology (fullcustom $0.35\mu m $ CMOS 공정을 이용한 16*16 bit 고속 승산기의 설계)

  • 박현규;신현철;김종진
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.229-232
    • /
    • 2000
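  • A high-speed multiplier architecture was designed and simulated for 16-bit integer two's-complement multiplication, which plays an important role in general-purpose computers and digital signal processing. Because summing the partial products generally accounts for about half of the total multiplier delay, the design of this part directly affects the achievable speed of the multiplier. A Booth encoder is used to reduce the number of partial products. To reduce the partial-product addition time, a CSA tree is built from 4:2 CSAs (carry-save adders) and 3:2 CSAs, and the final result is obtained with a carry look-ahead tree. The design was laid out in a Hyundai 0.35 $\mu\textrm{m}$ 1-poly 4-metal CMOS process, and the multiplication time was measured at 2.7 ns or less (typical case).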
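
The 3:2 CSA (carry-save adder) named in the abstract above can be sketched in a few lines of Python: it turns three operands into a sum word and a carry word without propagating carries, and only the last two words need a real carry-propagating addition. This is a behavioral illustration for non-negative integers, not the paper's tree; it reduces operands sequentially rather than modeling tree depth, and the function names are invented for this example.

```python
def csa_3to2(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three operands into (sum, carry) with a + b + c == sum + carry."""
    partial_sum = a ^ b ^ c                       # per-bit sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, shifted up
    return partial_sum, carry


def sum_partial_products(terms: list[int]) -> int:
    """Reduce operands three at a time, then do one carry-propagate addition."""
    terms = list(terms)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms.extend(csa_3to2(a, b, c))
    return sum(terms)  # stands in for the final carry look-ahead adder
```

Each `csa_3to2` step has the delay of a single full adder regardless of word width, which is why partial-product reduction is done in carry-save form and the slow carry propagation is paid only once at the end.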

Implementation of a Verification Environment using Layered Testbench (계층화된 테스트벤치를 이용한 검증 환경 구현)

  • Oh, Young-Jin;Song, Gi-Yong
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.12 no.2
    • /
    • pp.145-149
    • /
    • 2011
  • As system designs grow larger and more complex, system-level functional verification becomes more important. Verification of a functional block mainly uses a BFM (bus functional model). As the burden of functional verification grows, configuring a proper verification environment becomes increasingly important. SystemVerilog unifies hardware design languages and verification languages in the form of extensions to the Verilog HDL. Using the same language for design description, functional simulation, and verification has many advantages in system development. In this paper, we design a DUT composed of an AMBA bus and function blocks using SystemVerilog and verify the function of the DUT in a verification environment using a layered testbench. An adaptive FIR filter and Booth's multiplier are chosen as the function blocks. We confirm that the verification environment can be reused to verify the functions of other DUTs with a minor adaptation of the interface.

Design of AMBA AXI Slave Unit for Pipelined Arithmetic Unit (파이프라인 구조 연산회로를 위한 AMBA AXI Slave 설계)

  • Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.712-713
    • /
    • 2011
  • In this paper, an AMBA AXI slave unit that can verify a pipelined arithmetic unit is proposed, and a 2-stage 16-bit pipelined multiplier is introduced as a design example. The proposed AXI slave unit consists of an input buffer block memory, control registers, a pipelined arithmetic unit, a control unit, an output buffer block memory, and an AXI slave interface unit. The main operational procedure is divided into the following steps: burst-mode loading of input data into the input buffer memory, programming of the control registers, arithmetic operations on the block data in the input buffer memory, and burst-mode unloading of output data from the output buffer memory to the host processor. Because the proposed AXI slave unit has a general structure, it can be applied efficiently to AMBA AXI and AHB slave units with a pipelined arithmetic unit.

JPEG2000 IP Design and Implementation for SoC Design (SoC를 위한 JPEG2000 IP 설계 및 구현)

  • 정재형;한상균;홍성훈;김영철
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2002.11a
    • /
    • pp.63-68
    • /
    • 2002
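  • JPEG2000 is a new still-image compression standard that offers better rate-distortion characteristics and improved subjective image quality than earlier still-image coding methods, and it can be applied in fields such as the Internet, digital cameras, mobile terminals, and medical imaging. In this paper, we propose a JPEG2000 encoder architecture aimed at SoC (System on a Chip) design, and we design and verify the corresponding IP (Intellectual Property). The implemented JPEG2000 IP consists of a DWT (Discrete Wavelet Transform) block, a scalar quantization block, and an EBCOT (Embedded Block Coding with Optimized Truncation) block. The validity of the implemented architecture was verified through simulation, and the HDL was written according to the 'RTL Coding Guideline' issued by the Semiconductor Design Asset Research Center (반도체설계자산연구센터). In particular, because the DWT block requires many operations and a large memory capacity, an external memory is used to store the image, and a 3-stage pipelined Booth multiplier and a carry lookahead adder are used for fast multiplication and addition. The designed JPEG2000 IPs were synthesized with the Synopsys Design Analyzer tool using a Samsung 0.35 $\mu\textrm{m}$ library and implemented on a Xilinx 1-million-gate FPGA to verify their operation. In addition, for hard IP design, layout was performed with Avanti's Apollo tool.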
