• Title/Summary/Keyword: 부분곱 감소 (partial product reduction)

Search Result 34, Processing Time 0.026 seconds

Design of combined unsigned and signed parallel squarer (Unsigned와 signed 겸용 병렬 제곱기의 설계)

  • Cho, Kyung-Ju
    • Smart Media Journal / v.3 no.1 / pp.39-45 / 2014
  • The partial product matrix of a parallel squarer is symmetric about the diagonal. To reduce the number of partial product bits and the depth of the partial product matrix, it is typically folded, shifted and bit-rearranged. In this paper, an efficient design approach is presented for a combined squarer that operates on either unsigned or signed numbers according to a mode selection signal. Simulations show that the proposed combined squarers achieve up to 18% reduction in area, 11% reduction in propagation delay and 9% reduction in power consumption compared with previous combined squarers.
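The folding step mentioned in this abstract can be illustrated with a minimal Python sketch. It covers the unsigned case only and is not the paper's design; the function names `folded_square` and `partial_product_bits` are hypothetical. Because the terms x_i·x_j and x_j·x_i are equal, each off-diagonal pair is kept once and moved one column to the left, while each diagonal term x_i·x_i reduces to x_i.

```python
def folded_square(x: int, n: int) -> int:
    """Square an n-bit unsigned integer using the folded partial product matrix."""
    bits = [(x >> i) & 1 for i in range(n)]
    total = 0
    # Diagonal terms: x_i AND x_i equals x_i, with weight 2^(2i).
    for i in range(n):
        total += bits[i] << (2 * i)
    # Off-diagonal terms occur twice in the full matrix, so each pair (i < j)
    # is kept once and shifted left by one position: weight 2^(i+j+1).
    for i in range(n):
        for j in range(i + 1, n):
            total += (bits[i] & bits[j]) << (i + j + 1)
    return total


def partial_product_bits(n: int) -> tuple[int, int]:
    """Return (bits in the full matrix, bits after folding) for an n-bit squarer."""
    return n * n, n + n * (n - 1) // 2
```

For an 8-bit operand, folding reduces the matrix from 64 partial product bits to 36 before any further shifting or rearrangement.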

Design of Partial Product Accumulator using Multi-Operand Decimal CSA and Improved Decimal CLA (다중 피연산자 십진 CSA와 개선된 십진 CLA를 이용한 부분곱 누산기 설계)

  • Lee, Yang;Park, TaeShin;Kim, Kanghee;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.11 / pp.56-65 / 2016
  • In this paper, to reduce the delay and area of the partial product accumulation (PPA) of a parallel decimal multiplier, a tree architecture composed of multi-operand decimal CSAs and an improved decimal CLA is proposed. The proposed tree of multi-operand CSAs reduces the partial products quickly. Since the input range of the CSA recoder is limited, the CSA logic can be kept as simple as possible. In addition, using multi-operand decimal CSAs to add decimal numbers of limited range at specific locations of the architecture reduces the partial products efficiently. The final BCD result is also obtained faster by improving the logic of the decimal CLA. To evaluate the performance of the proposed partial product accumulation, it is synthesized using Design Compiler with a 180 nm CMOS technology library. Synthesis results show that the delay of the proposed partial product accumulation is reduced by 15.6% and the area by 16.2% compared with the general method. The total delay and area are still reduced even though the delay and area of the CLA increase.

Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.12 / pp.20-35 / 2016

Area-Efficient Squarer and Fixed-Width Squarer Design (저면적 제곱기 및 고정길이 제곱기의 설계)

  • Cho, Kyung-Ju
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.3 / pp.42-47 / 2011
  • The partial product matrix (PPM) of a parallel squarer is symmetric. To reduce the depth of the PPM, it can be folded, shifted and rearranged. In this paper, we present an area-efficient squarer design method using a new partial product rearrangement. A fixed-width design method for the proposed squarer is also presented. Simulations show that the proposed squarers achieve up to 17% reduction in area, 10% reduction in propagation delay and 10% reduction in power consumption compared with previous squarers. With the proposed fixed-width squarers, the area, propagation delay and power consumption can be further reduced by up to 30%, 16% and 28%, respectively.

Design of Parallel Decimal Multiplier using Limited Range of Signed-Digit Number Encoding (제한된 범위의 Signed-Digit Number 인코딩을 이용한 병렬 십진 곱셈기 설계)

  • Hwang, In-Guk;Kim, Kanghee;Yoon, WanOh;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.3 / pp.50-58 / 2013
  • In this paper, a parallel decimal fixed-point multiplier is proposed that uses a limited range of Signed-Digit (SD) number encoding and a matching reduction step. The partial products are generated without carry propagation delay by encoding the multiplicand and the multiplier into the limited range of SD numbers. With the limited range of SD numbers, the proposed multiplier improves the partial product reduction step by increasing the number of operands that can be handled in multi-operand SD addition. To evaluate the proposed parallel decimal multiplier, it is synthesized using Design Compiler with an SMIC 180 nm CMOS technology library. Synthesis results show that the delay of the proposed parallel decimal multiplier is reduced by 4.3% and the area by 5.3% compared with the existing SD parallel decimal multiplier. Despite the slightly increased delay and area of the partial product generation step, the total delay and area are reduced because the partial product reduction step accounts for the largest share of both.

Sign-Extension Reduction by Propagated-Carry Selection (전파 캐리의 선택에 의한 부호확장 오버헤드의 감소)

  • 이광철;조경주;박홍열;정진균
    • Proceedings of the IEEK Conference / 2001.09a / pp.931-934 / 2001
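  • When implementing a multiplier with fixed coefficients, the coefficients can be represented in CSD (Canonic Signed Digit) form to reduce area and power consumption. When the partial products are shifted according to the positions of the 1 or -1 digits of the CSD coefficient and then added, every partial product must be sign-extended, which increases the hardware overhead. This paper presents a new method for reducing sign-extension overhead, based on the fact that this overhead can be controlled by appropriately controlling carry propagation in the sign-extension part. The proposed method is compared with existing methods through various simulations and is shown to reduce sign-extension overhead by about 30% compared with existing methods.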
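The CSD representation that this abstract builds on can be sketched in Python as follows. This is a minimal illustration of CSD recoding and shift-and-add constant multiplication only; it does not implement the paper's sign-extension reduction method, and the names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits (least significant first) of a non-negative integer.

    Each digit is -1, 0 or 1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 gives digit +1; value mod 4 == 3 gives digit -1.
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def csd_multiply(x: int, coeff: int) -> int:
    """Multiply x by a fixed non-negative coefficient using shifted +x / -x terms."""
    return sum(d * (x << i) for i, d in enumerate(to_csd(coeff)))
```

For example, the coefficient 7 (binary 111) becomes 8 - 1 in CSD, so one subtraction replaces two additions. In hardware, each shifted term of a signed operand would need sign extension up to the full result width, which is the overhead the abstract addresses.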


Design of a High Speed 4-2 Compressor Architecture (고속 4-2 압축기 구조의 설계)

  • Kim, Seung-Wan;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2014.01a / pp.273-274 / 2014
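  • The 4-2 compressor is a basic building block of the partial product summation tree of a multiplier. This paper proposes a 4-2 compressor architecture capable of high-speed operation. The proposed architecture consists of optimized XOR-XNOR gates and MUXes. Compared with existing architectures, it reduces the signal propagation time, which enables high-speed operation.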
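A behavioral model of a 4-2 compressor built from XORs and multiplexers can be sketched in Python as below. It follows the common textbook XOR/MUX formulation and is not the optimized circuit proposed in the paper; the function and signal names are hypothetical.

```python
from itertools import product


def mux(sel: int, a: int, b: int) -> int:
    """Return a when sel is 1, otherwise b."""
    return a if sel else b


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Compress four input bits plus a carry-in into (sum, carry, cout).

    The outputs satisfy x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    x12 = x1 ^ x2
    x34 = x3 ^ x4
    x1234 = x12 ^ x34
    cout = mux(x12, x3, x1)       # majority of (x1, x2, x3); independent of cin
    carry = mux(x1234, cin, x4)   # majority of (x1^x2^x3, x4, cin)
    s = x1234 ^ cin
    return s, carry, cout


def check_all_inputs() -> bool:
    """Check the arithmetic identity for all 32 input combinations."""
    return all(
        sum(bits) == s + 2 * (carry + cout)
        for bits in product((0, 1), repeat=5)
        for s, carry, cout in [compressor_4_2(*bits)]
    )
```

Because `cout` does not depend on `cin`, a row of such compressors has no carry ripple along the row, which is why the cell is a common building block for summation trees.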


Approximate Multiplier with High Density, Low Power and High Speed using Efficient Partial Product Reduction (효율적인 부분 곱 감소를 이용한 고집적·저전력·고속 근사 곱셈기)

  • Seo, Ho-Sung;Kim, Dae-Ik
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.4 / pp.671-678 / 2022
  • Approximate computing is a computational technique that accepts a tolerable degree of inaccuracy in place of exact results. Approximate multiplication is one of the approximate computing methods for high-performance, low-power computing. In this paper, we propose a high-density, low-power, high-speed approximate multiplier using an approximate 4-2 compressor and an improved full adder. The approximate multiplier with the approximate 4-2 compressor consists of three regions (exact, approximate and constant correction), and we compared designs by adjusting the size of each region while applying an efficient partial product reduction. The proposed approximate multiplier was designed in Verilog HDL and analyzed for area, power and delay using Synopsys Design Compiler (DC) on a 25nm CMOS process. In the experiments, the proposed multiplier reduced area by 10.47%, power by 26.11% and delay by 13% compared with the conventional approximate multiplier.

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.7 / pp.53-61 / 2004
  • In this paper, we propose a new MAC (Multiplier-Accumulator) architecture for high-speed multiply-accumulate operation. We use the MBA (Modified radix-4 Booth Algorithm), which is based on the 1's complement number system, and a CSA (Carry Save Adder) for adding the partial products. During the addition of the partial products, the 1's complement signed numbers produced by Booth encoding are converted into 2's complement signed numbers in the CSA tree. Since a 2-bit CLA (Carry Look-ahead Adder) is used to add the lower bits of the partial products, the input bit width of the final adder and the overall delay of the critical path are reduced. The proposed MAC was applied to the DWT (Discrete Wavelet Transform) filtering operation for JPEG2000, demonstrating its suitability for practical applications. Finally, a comparison with the previous architecture confirmed improved performance in terms of hardware resources and delay.
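The radix-4 Booth recoding referred to in this abstract can be illustrated with a short Python sketch. It models the arithmetic only and is not the paper's MAC architecture; the names `booth_radix4_digits` and `booth_multiply` are hypothetical. Negative partial products are formed as a bit inversion (1's complement) followed by a +1 correction, mirroring the conversion to 2's complement that the abstract places in the CSA tree.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into n/2 digits in -2..2."""
    if n % 2:
        raise ValueError("n must be even for radix-4 recoding")
    y &= (1 << n) - 1
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by the n-bit two's-complement value y using radix-4 Booth digits."""
    total = 0
    for k, d in enumerate(booth_radix4_digits(y, n)):
        magnitude = abs(d) * x  # 0, x or 2x (2x is a one-bit left shift)
        if d < 0:
            # Bit inversion (1's complement) plus a +1 correction negates the value.
            partial = ~magnitude + 1
        else:
            partial = magnitude
        total += partial << (2 * k)
    return total
```

Recoding halves the number of partial product rows: an n-bit multiplier yields n/2 digits, each selecting 0, ±x or ±2x.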

A 32×32-b Multiplier Using a New Method to Reduce a Compression Level of Partial Products (부분곱 압축단을 줄인 32×32 비트 곱셈기)

  • 홍상민;김병민;정인호;조태원
    • Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.6 / pp.447-458 / 2003
  • A high-speed multiplier is an essential building block of today's digital signal processors. Signal processing applications typically implement iterative algorithms that require a large number of multiply, add and accumulate operations. This paper describes a macro block of a parallel multiplier that adopts a 32×32-b regularly structured tree (RST). To improve the speed of the tree part, a modified partial product generation method has been devised at the architecture level. This reduces the compression stage from 4 levels to 3, and the use of 4-2 compressors also reduces the propagation delay of the Wallace tree structure. Furthermore, this enables the tree part to be combined with four modular blocks to construct a CSA (carry save adder) tree. Therefore, the multiplier architecture can be laid out regularly with identical modules composed of Booth selectors, compressors and Modified Partial Product Generators (MPPG). At the circuit level, a new Booth selector with fewer transistors and a new encoder are proposed. The reduction in the number of transistors in the Booth selector has a large impact on the total transistor count. The designed selector uses 9 transistors in PTL (Pass Transistor Logic), 50% fewer than the conventional one. The multiplier, designed in a 0.25 µm, 2.5 V, 1-poly, 5-metal CMOS process, is simulated with Hspice and Epic. The delay is 4.2 ns and the average power consumption is 1.81 mW/MHz. This result is far better than conventional multipliers and equal to or better than the best published one.
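The effect of the row count on the number of compression levels can be sketched with a simple Python model. The abstract does not state how many partial product rows each design has, so the figures here are assumptions: 17 rows (16 radix-4 Booth rows for a 32-bit multiplier plus one extra correction row) against 16 rows. The function `compression_levels` is hypothetical and counts levels only; it says nothing about the paper's circuit.

```python
def compression_levels(rows: int) -> int:
    """Count the levels needed to reduce `rows` partial product rows to 2.

    Model: at each level, every full group of 4 rows goes through 4-2
    compressors (4 rows -> 2 rows); 3 leftover rows use a 3:2 carry-save
    adder (3 rows -> 2 rows); 1 or 2 leftover rows pass through unchanged.
    """
    levels = 0
    while rows > 2:
        groups, rest = divmod(rows, 4)
        rows = 2 * groups + (2 if rest == 3 else rest)
        levels += 1
    return levels
```

Under this model, 17 rows need 4 levels (17, 9, 5, 3, 2) while 16 rows need 3 levels (16, 8, 4, 2), which matches the 4-to-3 level reduction the abstract reports.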