• Title/Summary/Keyword: Booth's algorithm

Search Result 22, Processing Time 0.022 seconds

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.4
    • /
    • pp.94-104
    • /
    • 2008
  • In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator which has the largest delay in MAC was removed and its function was included into CSA, the overall performance becomes to be elevated. The proposed CSA tree uses 1's complement-based radix-2 modified booth algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance for decreasing the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits not the output of the final adder for improving the performance by optimizing the efficiency of pipeline scheme. The proposed architecture was synthesized with $250{\mu}m,\;180{\mu}m,\;130{\mu}m$ and 90nm standard CMOS library after designing it. We analyzed the results such as hardware resource, delay, and pipeline which are based on the theoretical and experimental estimation. We used Sakurai's alpha power low for the delay modeling. The proposed MAC has the superior properties to the standard design in many ways and its performance is twice as much than the previous research in the similar clock frequency.

A Low-Complexity 128-Point Mixed-Radix FFT Processor for MB-OFDM UWB Systems

  • Cho, Sang-In;Kang, Kyu-Min
    • ETRI Journal
    • /
    • v.32 no.1
    • /
    • pp.1-10
    • /
    • 2010
  • In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency-division multiplexing ultra-wideband systems. The proposed 128-point FFT processor employs both a modified radix-$2^4$ algorithm and a radix-$2^3$ algorithm to significantly reduce the numbers of complex constant multipliers and complex booth multipliers. It also employs substructure-sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 ${\mu}m$ CMOS technology with a supply voltage of 1.8 V. The hardware- efficient 128-point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128-point mixed-radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128-point FFT architectures.

An Efficient Test Method for a Full-Custom Design of a High-Speed Binary Multiplier (풀커스텀 (full-custom) 고속 곱셈기 회로의 효율적인 테스트 방안)

  • Moon, San-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.830-833
    • /
    • 2007
  • In this paper, we implemented a $17{\times}17b$ binary digital multiplier using radix-4 Booth;s algorithmand proposed an efficient testing methodology for the full-custom design. A two-stage pipeline architecture was applied to achieve higher throughput and 4:2 adders were used for regular layout structure in the Wallace tree partition. Several chips were fabricated using LG Semicon 0.6-um 3-Metal N-well CMOS technology. We did fault simulations efficiently using the proposed test method resulting in the reduction of the number of faulty nodes by 88%. The chip contains 9115 transistors and the core area occupies $1135^*1545$ mm2. The functional tests using ATS-2 tester showed that it can operate with 24 MHz clock at 5.0 V at room temperature.

  • PDF

A Study on Multiplier Architectures Optimized for 32-bit RISC Processor with 3-Stage Pipeline (32비트 3단 파이프라인을 가진 RISC 프로세서에 최적화된 Multiplier 구조에 관한 연구)

  • 정근영;박주성;김석찬
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.11
    • /
    • pp.123-130
    • /
    • 2004
  • This paper describes a multiplier architecture optimized for 32 bit RISC processor with 3-stage pipeline. The multiplier of ARM7, the target processor, is variably carried out on the execution stage of pipeline within 7 cycles. The included multiplier employs a modified Booth's algerian to produce 64 bit multiplication and addition product and it has 6 separate instructions. We analyzed several multiplication algorithm such as radix4-32${\times}$8, radix4-32${\times}$16 and radix8-32${\times}$32 to decide which multiplication architecture is most fit for a typical architecture of ARM7. VLSI area, cycle delay time and execution cycle number is the index of an efficient design and the final multiplier was designed on these indexes. To verify the operation of embedded multiplier, it was simulated with various audio algorithms.

A Study on the Probabilistic Production Cost Simulation by the Mixture of Cumulants Approximation (Mixture of Cumulants Approximaton 법에 의한 발전 시물레이션에 관한 연구)

  • 송길영;김용하
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.40 no.1
    • /
    • pp.1-9
    • /
    • 1991
  • This paper describes a new method of calculating expected energy generation and loss of load probability (L.O.L.P) for electric power system operation and expansion planning. The method represents an equivalent load duration curve (E.L.D.C) as a mixture of cumulants approximation (M.O.N.A). By regarding a load distribution as many normal distributions-rather than one normal distribution-and representing each of them in terms of Gram-Charlier expansion, we could improve the accuracy of results. We developed an algorithm which automatically determines the number of distribution and demarcation points. In modeling of a supply system, we made subsets of generators according to the number of generator outage: since the calculation of each subset's moment needs to be processed rapidly, we further developed specific recursive formulae. The method is applied to the test systems and the results are compared with those of cumulant, M.O.N.A. and Booth-Baleriaux method. It is verified that the M.O.C.A. method is faster and more accure than any other method.

  • PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진18진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.173-176
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs the radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of the radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

17$\times$17-b Multiplier for 32-bit RISC/DSP Processors (32 비트 RISC/DSP 프로세서를 위한 17 비트 $\times$ 17 비트 곱셈기의 설계)

  • 박종환;문상국;홍종욱;문병인;이용석
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.914-917
    • /
    • 1999
  • The paper describes a 17 $\times$ 17-b multiplier using the Radix-4 Booth’s algorithm. which is suitable for 32-bit RISC/DSP microprocessors. To minimize design area and achieve improved speed, a 2-stage pipeline structure is adopted to achieve high clock frequency. Each part of circuit is modeled and optimized at the transistor level, verification of functionality and timing is performed using HSPICE simulations. After modeling and validating the circuit at transistor level, we lay it out in a 0.35 ${\mu}{\textrm}{m}$ 1-poly 4-metal CMOS technology and perform LVS test to compare the layout with the schematic. The simulation results show that maximum frequency is 330MHz under worst operating conditions at 55$^{\circ}C$ , 3V, The post simulation after layout results shows 187MHz under worst case conditions. It contains 9, 115 transistors and the area of layout is 0.72mm by 0.97mm.

  • PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진/8진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06a
    • /
    • pp.125-128
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs tile radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of tile radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.3
    • /
    • pp.75-81
    • /
    • 2015
  • This paper proposed the new eight-parallel MDC FFT processor using the eight-parallel MDC architecture and the efficient scheduling scheme. The proposed FFT processor supports the 256-point FFT based on the modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme can reduce the number of complex multipliers from eight to six without increasing delay buffers and computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems required high throughput and low hardware complexity. The proposed FFT processor has been designed and implemented with a 90nm CMOS technology. The experimental result shows that the area of the proposed FFT processor is $0.27mm^2$. Furthermore, the proposed eight-parallel MDC FFT processor can achieve the throughput rate up to 2.7 GSample/s at 388MHz.

A STUDY ON THE PROBABILISTIC PRODUCTION COST SIMULATION BY THE MIXTURE OF CUMULANTS APPROXIMATION (MIXTURE OF CUMULANTS APPROXIMATION 법에 의한 발전시뮬레이션에 관한 연구)

  • Song, K.Y.;Kim, Y.H.;Cha, J.M.
    • Proceedings of the KIEE Conference
    • /
    • 1990.07a
    • /
    • pp.154-157
    • /
    • 1990
  • This paper describes a new method of calculating expected energy generation and loss of load probability (L.O.L.P) for electric power system operation and expansion planning. The method represents an equivalent load duration curve (E.L.D.C) as a mixture of cumulants approximation (M.O.C.A), which is the general case of mixture of normals approximation (M.O.N.A). By regarding a load distribution as many normal distributions-rather than one normal distribution-and representing each of them in terms of Gram-Charller expansion, we could improve the accuracy of results. We developed an algorithm which automatically determines the number of distribution and demarcation points. In modelling of a supply system, we made subsets of generators according to the number of generator outage: since the calculation of each subset's moment needs to be processed rapidly, we futher developed specific recursive formulae. The method is applied to the test systems and the results are compared with those of cumulant, M.O.N.A and Booth-Baleriaux method. It is verified that the M.O.C.A method is faster and more accurate than any other methods.

  • PDF