Search | Korea Science

Design of 32-bit Floating Point Multiplier for FPGA

Xuhao Zhang;Dae-Ik Kim
- The Journal of the Korea institute of electronic communication sciences, v.19 no.2, pp.409-416, 2024
With the growing demand for floating-point operations in high-speed signal processing and logic operations, the speed of the floating-point unit is a key factor in system performance. This paper studies the performance characteristics of different floating-point multiplier schemes: partial products are compressed into carry and sum vectors, and a carry look-ahead adder then produces the result. The multipliers were described in Verilog HDL and evaluated with the Intel Quartus II CAD tool, and they are compared on area, speed, and power consumption. The FMAX of modified Booth encoding with a Wallace tree is 33.96 MHz, which is 2.04 times that of Booth encoding, 1.62 times that of modified Booth encoding, and 1.04 times that of Booth encoding with a Wallace tree. Compared with modified Booth encoding, modified Booth encoding with a Wallace tree also reduces area by 24.88% and power consumption by 2.5%.
https://doi.org/10.13067/JKIECS.2024.19.2.409
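The carry/sum compression followed by a carry look-ahead adder can be sketched as a behavioral model. This is not the authors' Verilog: it assumes unsigned 24-bit significands (hidden bit included), uses plain 3:2 carry-save adders as the Wallace tree, and substitutes a Kogge-Stone prefix adder for the carry look-ahead stage. All function names are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit unsigned multiplier into radix-4 Booth digits in {-2..2}.

    Digit j is -2*y[2j+1] + y[2j] + y[2j-1] with y[-1] = 0; the zero bits above
    n-1 act as the sign, so sum(d_j * 4**j) == y.
    """
    digits = []
    prev = 0  # y[2j-1], starting from the implicit y[-1] = 0
    for i in range(0, n + 2, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save adder on whole rows: a + b + c == sum_row + carry_row."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def wallace_reduce(rows: list[int]) -> tuple[list[int], int]:
    """Apply layers of 3:2 CSAs until two rows remain; also return the layer count."""
    levels = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        nxt.extend(rows[full:])  # leftover rows pass through to the next layer
        rows = nxt
        levels += 1
    return rows, levels


def cla_add(a: int, b: int, width: int) -> int:
    """width-bit adder whose carries come from a Kogge-Stone parallel prefix."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g, p = a & b, a ^ b  # bitwise generate and propagate
    big_g, big_p = g, p
    shift = 1
    while shift < width:
        big_g |= big_p & (big_g << shift)  # uses big_p from the previous step
        big_p &= big_p << shift
        shift <<= 1
    carries = (big_g << 1) & mask  # carry into bit i = group generate of bits [0, i-1]
    return (p ^ carries) & mask


def booth_wallace_multiply(x: int, y: int, n: int = 24) -> int:
    """Multiply two n-bit unsigned significands: Booth rows -> CSA tree -> final adder."""
    rows = [(d * x) << (2 * j) for j, d in enumerate(booth_radix4_digits(y, n))]
    (s, c), _ = wallace_reduce(rows)
    return cla_add(s, c, 2 * n)  # the true product fits in 2n bits
```

For 24-bit significands the recoding gives 13 partial-product rows, which this 3:2 tree reduces to two rows in five layers before the single carry-propagate addition.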

Design of a Booth's Multiplier Suitable for Embedded Systems

Moon, San-Gook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2007.10a, pp.838-841, 2007
In this study, we implemented a 17×17-bit binary digital multiplier using the radix-4 Booth algorithm. A two-stage pipeline architecture was applied to achieve higher throughput, and 4:2 adders were used in the Wallace tree partition for a regular layout structure. To evaluate the circuit, several MPW chips were fabricated in Hynix 0.6 µm 3-metal N-well CMOS technology. We also proposed an efficient test methodology and performed fault simulations. The chip contains 9,115 transistors, and the core area is about 1135 × 1545 µm². Functional tests on an ATS-2 tester showed that the chip operates with a 24 MHz clock at 5.0 V at room temperature.
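The 4:2 adders mentioned above are commonly built from two cascaded full adders. The sketch below is a behavioral model of that textbook construction, not the fabricated full-custom cell (the abstract does not describe its circuit); the function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder (3:2 counter): a + b + c == s + 2*carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """4:2 compressor from two cascaded full adders.

    x1 + x2 + x3 + x4 + cin == s + 2*(carry + cout). Because cout depends only on
    x1..x3, a row of compressors has no ripple path through cin/cout.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def reduce_4_rows(a: int, b: int, c: int, d: int, width: int) -> tuple[int, int]:
    """Reduce four width-bit unsigned rows to a sum row and a carry row."""
    sum_row = carry_row = 0
    cin = 0
    for i in range(width + 2):
        bits = [(v >> i) & 1 for v in (a, b, c, d)]
        s, carry, cout = compressor_4_2(*bits, cin)
        sum_row |= s << i
        carry_row |= carry << (i + 1)
        cin = cout  # cout of column i feeds cin of column i + 1
    return sum_row, carry_row
```

A 4:2 level halves the row count, so each level does the work of two 3:2 levels while giving the regular, repeatable cell structure the abstract cites as the reason for using it.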

Multiple-Valued Logic Multiplier for System-On-Panel

Hong, Moon-Pyo;Jeong, Ju-Young
- Journal of the Institute of Electronics Engineers of Korea SD, v.44 no.2, pp.104-112, 2007
We developed a 7×7 parallel multiplier using LTPS-TFTs. The proposed multiplier consists of a multiple-valued logic 7-3 compressor with folding, a 3-2 compressor, and a final carry-propagation adder. The architecture minimizes carry propagation, and power consumption is reduced by switching the current source to the circuit that operates in current mode. Compared with a Wallace tree multiplier, the proposed multiplier improves PDP by 23%, EDP by 59%, and propagation delay by 47%.
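The input/output behavior of the 7-3 and 3-2 compressors can be checked with an ordinary binary-logic model. The paper implements them in multiple-valued, current-mode LTPS-TFT logic with folding, which this sketch does not attempt to model; building the 7-3 counter from four full adders is an assumed binary construction, and the names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3-2 compressor (full adder): a + b + c == sum + 2*carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_7_3(bits: list[int]) -> tuple[int, int, int]:
    """7-3 compressor: count the ones among seven equal-weight bits.

    Returns (q0, q1, q2) with sum(bits) == q0 + 2*q1 + 4*q2.
    """
    a, b, c, d, e, f, g = bits
    s1, c1 = full_adder(a, b, c)
    s2, c2 = full_adder(d, e, f)
    q0, c3 = full_adder(s1, s2, g)   # weight-1 output
    q1, q2 = full_adder(c1, c2, c3)  # weight-2 and weight-4 outputs
    return q0, q1, q2
```

A 7-3 compressor turns a seven-high column into three bits in one step, which is why it shortens the reduction before the final carry-propagation adder.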

A Fast 64×64-bit Multiplier for Crypto-Processor

서정욱;이상흥
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference, 1998.12a, pp.471-481, 1998
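Multiplying a multiplicand by a multiplier is an inherently slow operation because many partial products of the multiplier must be added. Crypto-processors, which operate on large numbers, in particular require very fast multipliers. To speed up this operation, prior work has reduced the number of partial products with radix-4, radix-8, or radix-16 modified Booth algorithms, and has accelerated the summation of the partial products with Wallace trees or parallel counters. In implementing a 64×64-bit multiplier for a crypto-processor, this paper proposes a fast parallel counter: the partial products are generated with the radix-4 modified Booth algorithm and summed with the proposed counter. For the 64×64-bit multiplier, the proposed counter improves speed by about 31% over implementations using the Wallace or Dadda scheme, and by about 21% over an implementation using Mehta's counter.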
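The trade-off described above can be quantified with a short calculation: higher-radix Booth recoding cuts the number of partial-product rows, and fewer rows means fewer carry-save reduction stages. The sketch below assumes unsigned operands and uses Dadda's 3:2-counter height sequence as the baseline; it does not model the paper's proposed counter, and radix-8/16 recoding additionally needs precomputed hard multiples such as 3X, which are not counted here.

```python
import math


def booth_rows(n: int, radix_bits: int) -> int:
    """Partial-product rows for an n-bit unsigned multiplier in radix 2**radix_bits.

    A zero sign bit is appended and the n+1 bits are split into groups of
    radix_bits, one row per group.
    """
    return math.ceil((n + 1) / radix_bits)


def dadda_stages(height: int) -> int:
    """3:2-counter stages needed to reduce `height` rows to two (Dadda's sequence)."""
    d, stages = 2, 0
    while d < height:
        d = d * 3 // 2  # 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94, ...
        stages += 1
    return stages


cases = [("no recoding", 64)] + [
    (f"radix-{2 ** k} Booth", booth_rows(64, k)) for k in (2, 3, 4)
]
for label, rows in cases:
    print(f"{label:>16}: {rows:2d} rows, {dadda_stages(rows)} CSA stages")
```

For 64×64 bits, radix-4 recoding takes the tree from 64 rows and 10 stages down to 33 rows and 8 stages; the paper's counter then targets the delay of those remaining stages.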

Implementation of 2,048-bit RSA Based on RNS (Residue Number Systems)

권택원;최준림
- Journal of the Institute of Electronics Engineers of Korea SD, v.41 no.4, pp.57-66, 2004
This paper proposes the design of a 2,048-bit RSA processor based on an RNS (residue number system) Montgomery modular multiplier. Because an RNS performs fast parallel modular multiplication by partitioning a large word into small words, we introduce a Montgomery reduction method (MRM) [1] based on a Wallace tree modular multiplier, with 33 RNS bases of 64 bits each, for RNS Montgomery modular multiplication. For fast RNS modular multiplication, a modified method based on the Chinese remainder theorem (CRT) [2] is also presented. The 2,048-bit RSA design was verified in Samsung 0.35 µm technology and performs a 2,048-bit RSA operation in 2.54 ms at 100 MHz.
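The two ingredients named above can be illustrated separately: RNS arithmetic with CRT reconstruction, and single-modulus Montgomery multiplication. This is a minimal sketch, not the paper's RNS Montgomery algorithm (which also needs base extension between RNS bases) or its modified CRT; the three Mersenne-prime moduli are illustrative stand-ins for the paper's 33 64-bit bases, and all names are hypothetical.

```python
from math import prod


def to_rns(x: int, moduli: list[int]) -> list[int]:
    """Represent x by its residues; each channel is a small, independent word."""
    return [x % m for m in moduli]


def rns_mul(a: list[int], b: list[int], moduli: list[int]) -> list[int]:
    """Channel-wise multiplication: no carries cross channels, so all run in parallel."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]


def from_rns(residues: list[int], moduli: list[int]) -> int:
    """CRT reconstruction; the moduli must be pairwise coprime."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        total += r * mi * pow(mi, -1, m)
    return total % big_m


def montgomery_mul(a: int, b: int, n: int, r_bits: int) -> int:
    """Return a * b * R^-1 mod n with R = 2**r_bits (n odd, n < R, a, b < n)."""
    r = 1 << r_bits
    n_prime = -pow(n, -1, r) % r  # n * n_prime == -1 (mod R)
    t = a * b
    m = (t * n_prime) % r         # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> r_bits     # exact division by R; u < 2n
    return u - n if u >= n else u
```

The RNS product is correct only while the true result stays below the product of the moduli, which is why a 2,048-bit design needs many bases.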

An Efficient Test Method for a Full-Custom Design of a High-Speed Binary Multiplier

Moon, San-Gook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2007.10a, pp.830-833, 2007
In this paper, we implemented a 17×17-bit binary digital multiplier using the radix-4 Booth algorithm and proposed an efficient testing methodology for the full-custom design. A two-stage pipeline architecture was applied to achieve higher throughput, and 4:2 adders were used in the Wallace tree partition for a regular layout structure. Several chips were fabricated in LG Semicon 0.6 µm 3-metal N-well CMOS technology. Fault simulations performed with the proposed test method reduced the number of faulty nodes by 88%. The chip contains 9,115 transistors, and the core area is 1135 × 1545 µm². Functional tests on an ATS-2 tester showed that the chip operates with a 24 MHz clock at 5.0 V at room temperature.
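Fault simulation of the kind mentioned above rests on the single stuck-at fault model. As a minimal illustration, and not the authors' proposed method (which the abstract does not detail), the sketch injects stuck-at-0 and stuck-at-1 faults at every node of a hypothetical gate-level full adder and measures the fraction of faults a set of test vectors detects.

```python
from itertools import product

# Nodes of a hypothetical gate-level full adder netlist.
NODES = ("a", "b", "cin", "n1", "n2", "n3", "s", "cout")


def simulate(a: int, b: int, cin: int, fault=None) -> tuple[int, int]:
    """Evaluate the full adder; fault = (node, stuck_value) forces that node."""
    v = {}

    def drive(node: str, value: int) -> None:
        v[node] = fault[1] if fault is not None and fault[0] == node else value

    drive("a", a)
    drive("b", b)
    drive("cin", cin)
    drive("n1", v["a"] ^ v["b"])
    drive("n2", v["a"] & v["b"])
    drive("n3", v["n1"] & v["cin"])
    drive("s", v["n1"] ^ v["cin"])
    drive("cout", v["n2"] | v["n3"])
    return v["s"], v["cout"]


def fault_coverage(vectors) -> float:
    """Fraction of single stuck-at faults detected by at least one test vector."""
    faults = [(node, sv) for node in NODES for sv in (0, 1)]
    detected = sum(
        any(simulate(*vec, fault=f) != simulate(*vec) for vec in vectors)
        for f in faults
    )
    return detected / len(faults)


all_vectors = list(product((0, 1), repeat=3))
print(fault_coverage(all_vectors))
```

A realistic flow would apply the same loop to the extracted multiplier netlist; reducing the list of fault sites, as the proposed method does, directly shrinks the outer loop.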

Design of a Correlator and an Access-code Generator for Bluetooth Baseband

Hwang Sun-Won;Lee Sang-Hoon;Shin Wee-Jae
- Journal of the Institute of Convergence Signal Processing, v.6 no.4, pp.206-211, 2005
We describe the design of a correlator and an access-code generator for a Bluetooth system. These are used for connection setup, packet decision, and clock synchronization between Bluetooth units. The correlator consists of two blocks: a carry-save adder based on a Wallace tree and a threshold-decision block. Using sliding-window correlation, it identifies valid packets and performs clock synchronization for a 1.0 Mbps input signal. The access-code generator also consists of two blocks: a BCH (Bose-Chaudhuri-Hocquenghem) cyclic encoder and a control block. It generates access codes by the four-step generation process defined in the Bluetooth standard. To solve the synchronization problem, a memory is used to supply the pseudo-random sequence. The proposed correlator and access-code generator were coded in VHDL, implemented on a Xilinx FPGA, and verified by simulation. Synthesis results show a critical delay of 4.689 ns, and the correlator tolerates correlation errors of up to 7 bits.
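The sliding-window correlator with a threshold decision can be modeled in a few lines. This sketch assumes a 64-bit sync word received MSB first and uses the abstract's 7-bit error allowance as the threshold; the test pattern is arbitrary, not a real Bluetooth sync word, and the names are hypothetical. The popcount stands in for the Wallace-tree carry-save adder.

```python
def correlate(bits: list[int], sync_word: int, n: int = 64, max_errors: int = 7) -> list[int]:
    """Report each time step t at which the last n received bits differ
    from sync_word in at most max_errors positions."""
    mask = (1 << n) - 1
    reg = 0
    hits = []
    for t, bit in enumerate(bits):
        reg = ((reg << 1) | bit) & mask  # shift register; oldest bit ends up as the MSB
        if t >= n - 1:
            errors = bin(reg ^ sync_word).count("1")  # popcount (a CSA tree in hardware)
            if errors <= max_errors:  # threshold decision
                hits.append(t)
    return hits


def to_bits(word: int, n: int = 64) -> list[int]:
    """MSB-first bit list, the receive order this sketch assumes."""
    return [(word >> (n - 1 - i)) & 1 for i in range(n)]
```

A hit at step t marks where the sync word ended, which is the event the receiver uses to align its clock to the incoming packet.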

A 32×32-b Multiplier Using a New Method to Reduce a Compression Level of Partial Products

홍상민;김병민;정인호;조태원
- Journal of the Institute of Electronics Engineers of Korea SD, v.40 no.6, pp.447-458, 2003
A high-speed multiplier is an essential building block of today's digital signal processors. Signal-processing applications typically implement iterative algorithms that require large numbers of multiply, add, and accumulate operations. This paper describes a macro block of a parallel-structured multiplier that adopts a 32×32-b regularly structured tree (RST). To speed up the tree part, a modified partial-product generation method was devised at the architecture level. Together with 4-2 compressors, it reduces the compression stage from 4 levels to 3 and lowers the propagation delay of the Wallace tree structure. It also allows the tree part to be built by combining four modular blocks into a CSA (carry-save adder) tree, so the multiplier can be laid out regularly from identical modules composed of Booth selectors, compressors, and Modified Partial Product Generators (MPPGs). At the circuit level, a new Booth selector with fewer transistors and a new encoder are proposed. Reducing the number of transistors in the Booth selector has a large impact on the total transistor count. The designed selector uses 9 transistors in pass-transistor logic (PTL), 50% fewer than the conventional one. The multiplier, designed in a 0.25 µm, 2.5 V, 1-poly 5-metal CMOS process, was simulated with HSPICE and Epic. Its delay is 4.2 ns and its average power consumption is 1.81 mW/MHz. This result is far better than that of a conventional multiplier and equal to or better than the best published one.
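The 4-to-3 level reduction can be checked by counting compressor levels. This sketch rests on an assumption the abstract does not state: radix-4 Booth recoding of a 32-bit multiplier gives 16 partial-product rows, and if the negation (+1) correction bits occupy a 17th row, the tree needs four levels; folding those bits into existing rows (assumed here to be the MPPG's role) leaves 16 rows and three levels. The level model is also simplified: leftover rows pass through, and a final 3-2 level handles exactly three rows.

```python
def compressor_levels(rows: int) -> int:
    """Count 4-2 compressor levels needed to reduce `rows` partial products to two.

    Each level maps every group of four rows to two; up to three leftover rows
    pass through unchanged, and a final 3-2 level handles exactly three rows.
    """
    levels = 0
    while rows > 2:
        rows = 2 if rows == 3 else 2 * (rows // 4) + rows % 4
        levels += 1
    return levels


for rows in (17, 16):
    print(f"{rows} rows -> {compressor_levels(rows)} levels")
```

Under this model, 17 rows go 17 → 9 → 5 → 3 → 2, while 16 rows go 16 → 8 → 4 → 2, so removing a single row saves a whole compressor level.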

A Design of 16×16-bit Redundant Binary MAC Using 0.25 µm CMOS Technology

Kim, Tae-Min;Shin, Gun-Soon
- Journal of the Korea Institute of Information and Communication Engineering, v.7 no.1, pp.122-128, 2003
In this paper, a 16×16-bit Multiplier and Accumulator (MAC) is designed using a Redundant Binary Adder (RBA) circuit, so that the Redundant Binary Partial Products (RB_PPs) can be added quickly in a Wallace tree structure. Because an RBA adds two RB numbers, it acts as a 4-2 compressor, reducing four input signals to two outputs. We propose a method to convert the Redundant Binary (RB) representation into two's complement binary. With the new conversion method, a more efficient RB-to-binary converter can be designed without the conventional full adders.
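An RB number with digits in {-1, 0, 1} is stored as two binary vectors (X+ for the +1 digits and X- for the -1 digits), which is why adding two RB numbers amounts to reducing four binary vectors to two. The sketch below shows the conventional subtraction-based conversion X+ - X-, the baseline the paper's new method aims to improve on; the paper's own conversion is not described in the abstract, and the names are hypothetical.

```python
def rb_value(digits: list[int]) -> int:
    """Value of a redundant binary number; digits in {-1, 0, 1}, least significant first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def rb_to_twos_complement(digits: list[int], width: int) -> int:
    """Convert an RB number to a width-bit two's complement bit pattern.

    Each digit is a (plus, minus) bit pair, so the value is X+ - X-, computed
    here as X+ + ~X- + 1 modulo 2**width (a full carry-propagate subtraction).
    """
    plus = sum(1 << i for i, d in enumerate(digits) if d == 1)
    minus = sum(1 << i for i, d in enumerate(digits) if d == -1)
    mask = (1 << width) - 1
    return (plus + (~minus & mask) + 1) & mask
```

For example, the digits [1, -1] (least significant first) represent 1 - 2 = -1, which converts to 0xFF at 8 bits.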

Design of a Floating Point Multiplier for IEEE 754 Single-Precision Operations

Lee, Ju-Hun;Chung, Tae-Sang
- Proceedings of the KIEE Conference, 1999.11c, pp.778-780, 1999
The speed of an arithmetic unit depends strongly on the algorithms used to realize the basic arithmetic operations (add, subtract, multiply, and divide) and on the logic design. Recent advances in VLSI have made hardware implementation of floating-point arithmetic units more feasible, and microprocessors now require a powerful floating-point unit as a standard option. This paper describes the design of a floating-point multiplier for IEEE 754-1985 single-precision operations. Booth encoding is used to reduce the number of partial products, and a Wallace tree of 4-2 CSAs is adopted in the fraction-multiplication part to generate the 32×32 single-precision product. A new rounding and sticky-bit generation scheme is adopted to reduce area and timing, and the design also includes a true sign generator. The multiplier has been implemented in an ALTERA FLEX EPF10K70RC240-4.
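The datapath pieces named above (true sign, fraction multiplication, normalization, and rounding with a sticky bit) can be sketched at the bit level. This is a minimal model under stated assumptions: normal finite operands and a normal result (no zeros, subnormals, infinities, NaNs, overflow, or underflow), round to nearest with ties to even, and a plain 24×24 integer product of the significands (hidden bit included) in place of the Booth/Wallace array. The function names are hypothetical.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Single-precision value of a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fp32_mul(a_bits: int, b_bits: int) -> int:
    """Multiply two normal float32 bit patterns, assuming a normal result."""
    sign = ((a_bits ^ b_bits) >> 31) & 1  # true sign: XOR of the input signs
    ea = (a_bits >> 23) & 0xFF
    eb = (b_bits >> 23) & 0xFF
    ma = (a_bits & 0x7FFFFF) | 0x800000   # restore the hidden 1
    mb = (b_bits & 0x7FFFFF) | 0x800000
    exp = ea + eb - 127                   # remove one of the two biases
    prod = ma * mb                        # 48-bit product in [2**46, 2**48)
    if prod >> 47:                        # significand product in [2, 4): normalize
        shift, exp = 24, exp + 1
    else:
        shift = 23
    mant = prod >> shift                                   # 24 bits incl. hidden 1
    guard = (prod >> (shift - 1)) & 1                      # first discarded bit
    sticky = (prod & ((1 << (shift - 1)) - 1)) != 0        # OR of the rest
    if guard and (sticky or (mant & 1)):  # round to nearest, ties to even
        mant += 1
        if mant >> 24:                    # rounding carried out to 2.0
            mant >>= 1
            exp += 1
    return (sign << 31) | (exp << 23) | (mant & 0x7FFFFF)
```

Because a 24×24-bit product fits exactly in a Python float (53-bit significand), the result can be cross-checked against `struct`, which rounds the exact product to single precision.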
