• Title/Summary/Keyword: Adders

Search Result 129, Processing Time 0.024 seconds

A Low-Power 2-D DCT/IDCT Architecture through Dynamic Control of Data Driven and Fine-Grain Partitioned Bit-Slices (데이터에 의한 구동과 세분화된 비트-슬라이스의 동적제어를 통한 저전력 2-D DCT/IDCT 구조)

  • Kim Kyeounsoo;Ryu Dae-Hyun
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.2
    • /
    • pp.201-210
    • /
    • 2005
  • This paper proposes a power efficient 2-dimensional DCT/IDCT architecture driven by input data to be processed. The architecture achieves low power by taking advantage of the typically large fraction of zero and small-valued input processing data in video and image data compression. In particular, it skips multiplication by zero and dynamically activates/deactivates required bit-slices of fine-grain bit partitioned adders within multipliers and accumulators using simple input ANDing and bit-slice MASKing. The processed results from 1-D DCT/IDCT do not have unnecessary sign extension bits (SEBs), which are used for further power reduction in matrix transposer. The results extracted by bit-level transition activity simulations indicate significant power reduction compared to conventional designs.

  • PDF

A Design of Low-power/Small-area Arithmetic Units for Mobile 3D Graphic Accelerator (휴대형 3D 그래픽 가속기를 위한 저전력/저면적 산술 연산기 회로 설계)

  • Kim Chay-Hyeun;Shin Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.5
    • /
    • pp.857-864
    • /
    • 2006
  • This paper describes a design of low-power/small-area arithmetic circuits which are vector processing unit powering nit, divider unit and square-root unit for mobile 3D graphic accelerator. To achieve area-efficient and low-power implementation that is an essential consideration for mobile environment, the fixed-point f[mat of 16.16 is adopted instead of conventional floating-point format. The vector processing unit is designed using redundant binary(RB) arithmetic. As a result, it can operate 30% faster and obtained gate count reduction of 10%, compared to the conventional methods which consist of four multipliers and three adders. The powering nit, divider unit and square-root nit are based on logarithm number system. The binary-to-logarithm converter is designed using combinational logic based on six-region approximation method. So, the powering mit, divider unit and square-root unit reduce gate count when compared with lookup table implementation.

An Efficient Test Method for a Full-Custom Design of a High-Speed Binary Multiplier (풀커스텀 (full-custom) 고속 곱셈기 회로의 효율적인 테스트 방안)

  • Moon, San-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.830-833
    • /
    • 2007
  • In this paper, we implemented a $17{\times}17b$ binary digital multiplier using radix-4 Booth;s algorithmand proposed an efficient testing methodology for the full-custom design. A two-stage pipeline architecture was applied to achieve higher throughput and 4:2 adders were used for regular layout structure in the Wallace tree partition. Several chips were fabricated using LG Semicon 0.6-um 3-Metal N-well CMOS technology. We did fault simulations efficiently using the proposed test method resulting in the reduction of the number of faulty nodes by 88%. The chip contains 9115 transistors and the core area occupies $1135^*1545$ mm2. The functional tests using ATS-2 tester showed that it can operate with 24 MHz clock at 5.0 V at room temperature.

  • PDF

The Efficient 32×32 Inverse Transform Design for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 효율적인 32×32 역변환기 설계)

  • Han, Geumhee;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.4
    • /
    • pp.953-958
    • /
    • 2013
  • In this paper, an efficient hardware architecture is proposed for $32{\times}32$ inverse transform HEVC decoder. HEVC is a new image compression standard to deal with much larger image sizes compared with conventional image codecs, such as 4k, 8k images. To process huge image data effectively, it adopts various new block structures. Theses blocks consists of $4{\times}4$, $8{\times}8$, $16{\times}16$, and $32{\times}32$ block. This paper suggests an effective structures to process $32{\times}32$ inverse transform. This structure of inverse transform adopts the decomposed $16{\times}16$ matrixes of $32{\times}32$ matrix, and simplified the operations by implementing multiplying with shifters and adders. Additionally the operations frequency is downed by using multicycle paths. Also this structure can be easily adopted to a multi-size transform or a forward transform block in HEVC codec.

Design of Radix-4 FFT Processor Using Twice Perfect Shuffle (이중 완전 Shuffle을 이용한 Radix-4 FFT 프로세서의 설계)

  • Hwang, Myoung-Ha;Hwang, Ho-Jung
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.2
    • /
    • pp.144-150
    • /
    • 1990
  • This paper describes radix-4 Fast Fourier Transform (FFT) Processor designed with the new twice perfect shuffle developed from a perfect shuffle used in radix-2 FFT algorithm. The FFT Processor consists of a butterfly arithmetic circuit, address generators for input, output and coefficient, input and output registers and controller. Also, it requires the external ROM for storage of coefficient and RAM for input and output. The butterfly circuit includes 12 bit-serial ($16{\times}8$) multipliers, adders, subtractors and delay shift registers. Operating on 25 MHz two phase clock, this processor can compute 256 point FFT in 6168 clocks, i.e. 247 us and provides flexibility by allowing the user to select any size among 4,16,64,and256points. Being fabricated with 2-um double metal CMOS process, it includes about 28000 transistors and 55 pads in $8.0{\times}8.2mm^2$area.

  • PDF

The New Generation Circulation Method to Generalized Reed-Muller(GRM) Coefficients over GF(3) (극수의 순환성을 이용한 새로운 GF(3)상의 GRM 상수 생성 방법)

  • Lee, Chol-U;Che, Wenzhe;Kim, Heung-Soo
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.42 no.10 s.340
    • /
    • pp.17-24
    • /
    • 2005
  • This paper propose a new generation method of GRM coefficients using the circulation property of polarity over GF(3). The general method to derive GRM coefficients are obtain the filled polarity of GRM coefficients using RM expansion and expand it for the polarities. Since the general method has many operations when the number of the variables are incremented. Proposed method in this paper simplifies the generation procedure and reduces a number of operators compare to parallel type because of the cyclic property of polarity. Comparing to the proposed papers, the proposed method use only adders without multiplier. So it improves the complexity of the system with efficient composition of the circuits.

A Scalable Word-based RSA Cryptoprocessor with PCI Interface Using Pseudo Carry Look-ahead Adder (가상 캐리 예측 덧셈기와 PCI 인터페이스를 갖는 분할형 워드 기반 RSA 암호 칩의 설계)

  • Gwon, Taek-Won;Choe, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.8
    • /
    • pp.34-41
    • /
    • 2002
  • This paper describes a scalable implementation method of a word-based RSA cryptoprocessor using pseudo carry look-ahead adder The basic organization of the modular multiplier consists of two layers of carry-save adders (CSA) and a reduced carry generation and Propagation scheme called the pseudo carry look-ahead adder for the high-speed final addition. The proposed modular multiplier does not need complicated shift and alignment blocks to generate the next word at each clock cycle. Therefore, the proposed architecture reduces the hardware resources and speeds up the modular computation. We implemented a single-chip 1024-bit RSA cryptoprocessor based on the word-based modular multiplier with 256 datapaths in 0.5${\mu}{\textrm}{m}$ SOG technology after verifying the proposed architectures using FPGA with PCI bus.

A New Complex-Number Multiplication Algorithm using Radix-4 Booth Recoding and RB Arithmetic, and a 10-bit CMAC Core Design (Radix-4 Booth Recoding과 RB 연산을 이용한 새로운 복소수 승산 알고리듬 및 10-bit CMAC코어 설계)

  • 김호하;신경욱
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.9
    • /
    • pp.11-20
    • /
    • 1998
  • High-speed complex-number arithmetic units are essential to baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. In this paper, a new complex-number multiplication algorithm is proposed, which is based on redundant binary (RB) arithmetic combined with radix-4 Booth recoding scheme. The proposed algorithm reduces the number of partial product by one-half as compared with the conventional direct method using real-number multipliers and adders. It also leads to a highly parallel architecture and simplified circuit, resulting in high-speed operation and low power dissipation. To demonstrate the proposed algorithm, a prototype complex-number multiplier-accumulator (CMAC) core with 10-bit operands has been designed using 0.8-$\mu\textrm{m}$ N-Well CMOS technology. The designed CMAC core contains about 18,000 transistors on the area of about 1.60 ${\times}$ 1.93 $\textrm{mm}^2$. The functional and speed test results show that it can operate with 120-MHz clock at V$\sub$DD/=3.3-V, and its power consumption is given to about 63-mW.

  • PDF

A Study on the Hardware Implementation of Competitive Learning Neural Network with Constant Adaptaion Gain and Binary Reinforcement Function (일정 적응이득과 이진 강화함수를 가진 경쟁학습 신경회로망의 디지탈 칩 개발과 응용에 관한 연구)

  • 조성원;석진욱;홍성룡
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.7 no.5
    • /
    • pp.34-45
    • /
    • 1997
  • In this paper, we present hardware implemcntation of self-organizing feature map (SOFM) neural networkwith constant adaptation gain and binary reinforcement function on FPGA. Whereas a tnme-varyingadaptation gain is used in the conventional SOFM, the proposed SOFM has a time-invariant adaptationgain and adds a binary reinforcement function in order to compensate for the lowered abilityof SOFM due to the constant adaptation gain. Since the proposed algorithm has no multiplication operation.it is much easier to implement than the original SOFM. Since a unit neuron is composed of 1adde $r_tracter and 2 adders, its structure is simple, and thus the number of neurons fabricated onFPGA is expected to he large. In addition, a few control signal: ;:rp sufficient for controlling !he neurons.Experimental results show that each componeni ot thi inipiemented neural network operates correctlyand the whole system also works well.stem also works well.

  • PDF

Sign-Extension Overhead Reduction by Propagated-Carry Selection (전파캐리의 선택에 의한 부호확장 오버헤드의 감소)

  • 조경주;김명순;유경주;정진균
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.6C
    • /
    • pp.632-639
    • /
    • 2002
  • To reduce the area and power consumption in constant coefficient multiplications, the constant coefficient can be encoded using canonic signed digit(CSD) representation. When the partial product terms are added depending on the nonzero bit(1 or -1) positions in the CSD-encoded multiplier, all sign bits are properly extended before the addition takes place. In this paper, to reduce the overhead due to sign extension, a new method is proposed based on the fact that carry propagation in the sign extension part can be controlled such that a desired input bit can be propagated as a carry. Also, a fixed-width multiplier design method suitable for CSD multiplication is proposed. As an application, 43-tap filbert transformer for SSB/BPSK-DS/CDMA is implemented. It is shown that, about 16∼28% adders can be saved by the proposed method compared with the conventional methods.