• Title/Summary/Keyword: Wallace 트리 곱셈기

Search Result 7, Processing Time 0.023 seconds

Design of 32-bit Floating Point Multiplier for FPGA (FPGA를 위한 32비트 부동소수점 곱셈기 설계)

  • Xuhao Zhang;Dae-Ik Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.2
    • /
    • pp.409-416
    • /
    • 2024
  • With the expansion of floating-point operation requirements for fast high-speed data signal processing and logic operations, the speed of the floating-point operation unit is the key to affect system operation. This paper studies the performance characteristics of different floating-point multiplier schemes, completes partial product compression in the form of carry and sum, and then uses a carry look-ahead adder to obtain the result. Intel Quartus II CAD tool is used for describing Verilog HDL and evaluating performance results of the floating point multipliers. Floating point multipliers are analyzed and compared based on area, speed, and power consumption. The FMAX of modified Booth encoding with Wallace tree is 33.96 Mhz, which is 2.04 times faster than the booth encoding, 1.62 times faster than the modified booth encoding, 1.04 times faster than the booth encoding with wallace tree. Furthermore, compared to modified booth encoding, the area of modified booth encoding with wallace tree is reduced by 24.88%, and power consumption of that is reduced by 2.5%.

A Bit-revel Arithmetic Optimization for Low-Power Circuits (저전력 회로를 위한 비트 단위의 연산 최 적화)

  • 엄준형
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.04a
    • /
    • pp.16-18
    • /
    • 2002
  • 고속 회로 합성에 있어서, Wallace 트리 스타일은 연산을 위한 가장 효율적인 수행 방식의 하나로 인식 되어졌다. 그러나, 이러한 방법은 빠른 곱셈기의 수행이나 여러가지 연산수행 에 있어, 입력 시그널을 고려하지 않은 일반적인 구조로 수행되어졌다. 본 논문은 연산기에 있어서 이러한 제한점을 극복하는 문제를 다룬다. 우리는 캐리-세이브 방법을 덧셈, 뺄셈, 곱셈 이 혼합되어 있는 일반적인 연산 회로에 적용한다. 그 결과 효율적인 회로를 생성하며, 시그널 들의 임의의 시그널 스위칭 변화에 대해 회로의 전력 소모를 최적화 한다. 우리는 이러한 최적화 방법을 여러 디지털 필터에 적용시켜 보았고 이는 기존의 비트 단위가 아닌 캐리-세이브 수행방법보다 상당한 양의 전력 소모의 향상을 보였다.

  • PDF

Implementation of 2,048-bit RSA Based on RNS(Residue Number Systems) (RNS(Residue Number Systems) 기반의 2,048 비트 RSA 설계)

  • 권택원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.4
    • /
    • pp.57-66
    • /
    • 2004
  • This paper proposes the design of a 2,048-bit RSA based on RNS(residue number systems) Montgomery modular multiplier As the systems that RNS processes a fast parallel modular multiplication for a large word partitioned into small words, we introduce Montgomery reduction method(MRM)[1]based on Wallace tree modular multiplier and 33 RNS bases with 64-bit size for RNS Montgomery modular multiplication in this paper. Also, for fast RNS modular multiplication, a modified method based on Chinese remainder theorem(CRT)[2] is presented. We have verified 2,048-bit RSA based on RNS using Samsung 0.35${\mu}{\textrm}{m}$ technology and the 2,048-bit RSA is performed in 2.54㎳ at 100MHz.

High-Performance Multiplier Using Modified m-GDI(: modified Gate-Diffusion Input) Compressor (m-GDI 압축 회로를 이용한 고성능 곱셈기)

  • Si-Eun Lee;Jeong-Beom Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.2
    • /
    • pp.285-290
    • /
    • 2023
  • Compressors are widely used in high-speed electronic systems and are used to reduce the number of operands in multiplier. The proposed compressor is constructed based on the m-GDI(: modified gate diffusion input) to reduce the propagation delay time. This paper is compared the performance of compressors by applying 4-2, 5-2 and 6-2 m-GDI compressors to the multiplier, respectively. As a simulation results, compared to the 8-bit Dadda multiplier using the 4-2 and 6-2 compressor, the multiplier using the 5-2 compressor is reduced propagation delay time 13.99% and 16.26%, respectively. Also, the multiplier using the 5-2 compressor is reduced PDP(: Power Delay Product) 4.99%, 28.95% compared to 4-2 and 6-2 compressor, respectively. However, the multiplier using the 5-2 compression circuit is increased power consumption by 10.46% compared to the multiplier using the 4-2 compression circuit. In conclusion, the 8-bit Dadda multiplier using the 5-2 compressor is superior to the multipliers using the 4-2 and 6-2 compressors. The proposed circuit is implemented using TSMC 65nm CMOS process and its feasibility is verified through SPECTRE simulation.

Optimal Bit-level Arithmetic Optimization for High-Speed Circuits (고속 회로를 위한 비트 단위의 연산 최적화)

  • 엄준형;김영태;김태환;여준기;홍성백
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.04a
    • /
    • pp.21-23
    • /
    • 2000
  • 고속 회로 합성에 있어서, Wallace 트리 스타일은 연산을 위한 가장 효율적인 수행방식의 하나로 인식되어 졌다. 그러나, 이러한 방법은 빠른 곱셈기의 수행이나 여러 가지 연산수행에 있어, 입력 시그널을 고려하지 않은 일반적인 구조로 수행되어졌다. 본 논문은 연산기에 있어서 이러한 제한점을 극복하는 문제를 다룬다. 우리는 캐리-세이브 방법을 덧셈, 뺄셈, 곱셈이 혼합되어 일T는 일반적인 연산 회로에 적용한다. 그 결과 효율적인 회로를 생성하며, 시그널들이 임의의 도달시간에 대해 회로의 도달시간을 최적화 한다. 또한, 우리는 최적 지연시간의 캐리-세이브 가산회로를 생성하는 효율적인 알고리즘을 제안하였다. 우리는 이러한 최적화 방법을 여러 고속 디지털 필터에 적용시켜 보았고 이는 기존의 비트 단위가 아닌 캐리-세이브 수행방법보다 5%에서 30%사이의 수행시간 향상을 가져왔다.

  • PDF

A 32${\times}$32-b Multiplier Using a New Method to Reduce a Compression Level of Partial Products (부분곱 압축단을 줄인 32${\times}$32 비트 곱셈기)

  • 홍상민;김병민;정인호;조태원
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.6
    • /
    • pp.447-458
    • /
    • 2003
  • A high speed multiplier is essential basic building block for digital signal processors today. Typically iterative algorithms in Signal processing applications are realized which need a large number of multiply, add and accumulate operations. This paper describes a macro block of a parallel structured multiplier which has adopted a 32$\times$32-b regularly structured tree (RST). To improve the speed of the tree part, modified partial product generation method has been devised at architecture level. This reduces the 4 levels of compression stage to 3 levels, and propagation delay in Wallace tree structure by utilizing 4-2 compressor as well. Furthermore, this enables tree part to be combined with four modular block to construct a CSA tree (carry save adder tree). Therefore, combined with four modular block to construct a CSA tree (carry save adder tree). Therefore, multiplier architecture can be regularly laid out with same modules composed of Booth selectors, compressors and Modified Partial Product Generators (MPPG). At the circuit level new Booth selector with less transistors and encoder are proposed. The reduction in the number of transistors in Booth selector has a greater impact on the total transistor count. The transistor count of designed selector is 9 using PTL(Pass Transistor Logic). This reduces the transistor count by 50% as compared with that of the conventional one. The designed multiplier in 0.25${\mu}{\textrm}{m}$ technology, 2.5V, 1-poly and 5-metal CMOS process is simulated by Hspice and Epic. Delay is 4.2㎱ and average power consumes 1.81㎽/MHz. This result is far better than conventional multiplier with equal or better than the best one published.

An Implementation of Low Power MAC using Improvement of Multiply/Subtract Operation Method and PTL Circuit Design Methodology (승/감산 연산방법의 개선 및 PTL회로설계 기법을 이용한 저전력 MAC의 구현)

  • Sim, Gi-Hak;O, Ik-Gyun;Hong, Sang-Min;Yu, Beom-Seon;Lee, Gi-Yeong;Jo, Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.4
    • /
    • pp.60-70
    • /
    • 2000
  • An 8$\times$8+20-bit MAC is designed with low power design methodologies at each of the system design levels. At algorithm level, a new method for multipl $y_tract operation is proposed, and it saves the transistor counts over conventional methods in hardware realization. A new Booth selector circuit using NMOS pass-transistor logic is also proposed at circuit level. It is superior to other circuits designed by CMOS in power-delay-product. And at architecture level, we adopted an ELM adder that is known to be the most efficient in power consumption, operating frequency, area and design regularity as the final adder. For registers, dynamic CMOS single-edge triggered flip-flops are used because they need less transistors per bit. To increase the operating frequency 2-stage pipeline architecture is adopted, and fast 4:2 compressors are applied in Wallace tree block. As a simulation result, the designed MAC in 0.6${\mu}{\textrm}{m}$ 1-poly 3-metal CMOS process is operated at 200MHz, 3.3V and consumed 35㎽ of power in multiply operation, and operated at 100MHz consuming 29㎽ in MAC operations, respectively.ly.

  • PDF