• Title/Summary/Keyword: distributed arithmetic(DA)

Search Result 29, Processing Time 0.028 seconds

Hardwired Distributed Arithmetic for Multiple Constant Multiplications and Its Applications for Transformation (다중 상수 곱셈을 위한 하드 와이어드 분산 연산)

  • Kim, Dae-Won;Choi, Jun-Rim
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.949-952
    • /
    • 2005
  • We propose the hardwired distributed arithmetic which is applied to multiple constant multiplications and the fixed data path in the inner product of fixed coefficient as a result of variable radix-2 multi-bit coding. Variable radix-2 multi-bit coding is to reduce the partial product in constant multiplication and minimize the number of addition and shifts. At results, this procedure reduces the number of partial products that the required multiplication timing is shortened, whereas the area reduced relative to the DA architecture. Also, this architecture shows the best performance for DCT/IDCT and DWT architecture in the point of area reduction up to 20% from reducing the partial products up to 40% maximally.

  • PDF

VLSI Architecture of Fast Jacket Transform (Fast Jacket Transform의 VLSI 아키텍쳐)

  • 유경주;홍선영;이문호;정진균
    • Proceedings of the IEEK Conference
    • /
    • 2001.09a
    • /
    • pp.769-772
    • /
    • 2001
  • Waish-Hadamard Transform은 압축, 필터링, 코드 디자인 등 다양한 이미지처리 분야에 응용되어왔다. 이러한 Hadamard Transform을 기본으로 확장한 Jacket Transform은 행렬의 원소에 가중치를 부여함으로써 Weighted Hadamard Matrix라고 한다. Jacket Matrix의 cocyclic한 특성은 암호화, 정보이론, TCM 등 더욱 다양한 응용분야를 가질 수 있고, Space Time Code에서 대역효율, 전력면에서도 효율적인 특성을 나타낸다 [6],[7]. 본 논문에서는 Distributed Arithmetic(DA) 구조를 이용하여 Fast Jacket Transform(FJT)을 구현한다. Distributed Arithmetic은 ROM과 어큐뮬레이터를 이용하고, Jacket Watrix의 행렬을 분할하고 간략화하여 구현함으로써 하드웨어의 복잡도를 줄이고 기존의 시스톨릭한 구조보다 면적의 이득을 얻을 수 있다. 이 방법은 수학적으로 간단할 뿐 만 아니라 행렬의 곱의 형태를 단지 덧셈과 뺄셈의 형태로 나타냄으로써 하드웨어로 쉽게 구현할 수 있다. 이 구조는 입력데이타의 워드길이가 n일 때, O(2n)의 계산 복잡도를 가지므로 기존의 시스톨릭한 구조와 비교하여 더 적은 면적을 필요로 하고 FPGA로의 구현에도 적절하다.

  • PDF

Low-power Design and Implementation of IMT-2000 Interpolation Filter using Add/Sub Processor (덧셈 프로세서를 사용한 IMT-2000 인터폴레이션 필터의 저전력 설계 및 구현)

  • Jang Young-Beom;Lee Hyun-Jung;Moon Jong-Beom;Lee Won-Sang
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.1
    • /
    • pp.79-85
    • /
    • 2005
  • In this paper, low-power design and implementation techniques for IMT-2000 interpolation filter are proposed. Processor technique for DA(Distributed Arithmetic) filter and minimization technique for number of addition in CSD(Canonic Signed Digit) filter are utilized for low-power implementation. proposed filter structure consists of 3 blocks. In the first CSD coefficient block, every possible 4 bit CSD coefficients are calculated and stored. In second processor block, multiplication is done by MUX and addition processor in terms of filter coefficient. Finally, in third shift register block, multiplied values are output and stored in shift register. For IMT-2000 interpolation filter, proposed and conventional structures are implemented by using Verilog-HDL coding. Gate counts for the proposed structure is reduced to 31.57% comparison with those of the conventional one.

Low-Power Radix-4 butterfly structure for OFDM FFT (OFDM FFT용 저전력 Radix-4 나비연산기 구조)

  • Kim, Do-Han;Kim, Bee-Chul;Hur, Eun-Sung;Lee, Won-Sang;Jang, Young-Beom
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.13-14
    • /
    • 2006
  • In this paper, an efficient butterfly structure for Radix-4 FFT algorithm using DA(Distributed Arithmetic) is proposed. It is shown that DA can be efficiently used in twiddle factor calculation of the Radix-4 FFT algorithm. The Verilog-HDL coding results for the proposed DA butterfly structure show 61.02% cell area reduction comparison with those of the conventional multiplier butterfly structure. Furthermore, the 64-point Radix-4 pipeline structure using the proposed butterfly and delay commutators is compared with other conventional structures. Implementation coding results show 46.1% cell area reduction.

  • PDF

High-throughput and low-area implementation of orthogonal matching pursuit algorithm for compressive sensing reconstruction

  • Nguyen, Vu Quan;Son, Woo Hyun;Parfieniuk, Marek;Trung, Luong Tran Nhat;Park, Sang Yoon
    • ETRI Journal
    • /
    • v.42 no.3
    • /
    • pp.376-387
    • /
    • 2020
  • Massive computation of the reconstruction algorithm for compressive sensing (CS) has been a major concern for its real-time application. In this paper, we propose a novel high-speed architecture for the orthogonal matching pursuit (OMP) algorithm, which is the most frequently used to reconstruct compressively sensed signals. The proposed design offers a very high throughput and includes an innovative pipeline architecture and scheduling algorithm. Least-squares problem solving, which requires a huge amount of computations in the OMP, is implemented by using systolic arrays with four new processing elements. In addition, a distributed-arithmetic-based circuit for matrix multiplication is proposed to counterbalance the area overhead caused by the multi-stage pipelining. The results of logic synthesis show that the proposed design reconstructs signals nearly 19 times faster while occupying an only 1.06 times larger area than the existing designs for N = 256, M = 64, and m = 16, where N is the number of the original samples, M is the length of the measurement vector, and m is the sparsity level of the signal.

A VLSI Architecture for the Binary Jacket Sequence (이진 자켓 비트열의 VLSI 구조)

  • 박주용;이문호
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.2A
    • /
    • pp.116-123
    • /
    • 2002
  • The jacket matrix is based on the Walsh-Hadamard matrix and an extension of it. While elements of the Walsh-Hadamard matrix are +1, or -1, those of the Jacket matrix are ${\pm}$1 and ${\pm}$$\omega$, which is $\omega$, which is ${\pm}$j and ${\pm}$2$\sub$n/. This matrix has weights in the center part of the matrix and its size is 1/4 of Hadamard matrix, and it has also two parts, sigh and weight. In this paper, instead of the conventional Jacket matrix where the weight is imposed by force, a simple Jacket sequence generation method is proposed. The Jacket sequence is generated by AND and Exclusive-OR operations between the binary indices bits of row and those of column. The weight is imposed on the element by when the product of each Exclusive-OR operations of significant upper two binary index bits of a row and column is 1. Each part of the Jacket matrix can be represented by jacket sequence using row and column binary index bits. Using Distributed Arithmetic (DA), we present a VLSI architecture of the Fast Jacket transform is presented. The Jacket matrix is able to be applied to cryptography, the information theory and complex spreading jacket QPSK modulation for WCDMA.

The Design of SoC for DCT/DWT Processor (DCT/DWT 프로세서를 위한 SoC 설계)

  • Kim, Young-Jin;Lee, Hyon-Soo
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.527-528
    • /
    • 2006
  • In this paper, we propose an IP design and implementation of System on a chip(SoC) for Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) processor using adder-based DA(Adder-based Distributed Arithmetic). To reduced hardware cost and to improve operating speed, the combined DCT/ DWT processor used the bit-serial method and DA module. The transform of coefficient equation result in reduction in hardware cost and has a regularity in implementation. We use Verilog-HDL and Xilinx ISE for simulation and implement FPGA on SoCMaster-3.

  • PDF

A High-Speed Multiplier-Free Realization of IIR Filter Using ROM's

  • Sakunkonch, Thanyapat;Tantaratana, Sawasd
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.711-714
    • /
    • 2000
  • In this paper, we propose a high-speed multiplier-free realization using ROM’s to store the results of coefficient scalings in Combination With higher signal rate and pipelined operations. We show that hardware multipliers are not needed. By varying some parameters, the proposed structure provides various combinations of hardware and clock speed (or through-put). An example is given comparing the proposed realization with the distributed arithmetic (DA) realization. Results show that With Proper Choices of the Parameters the proposed structure achieves a faster processing speed with less hardware, as compared to the DA realization.

  • PDF

A Design of high throughput IDCT processor in Distrited Arithmetic Method (처리율을 개선시킨 분산연산 방식의 IDCT 프로세서 설계)

  • 김병민;배현덕;조태원
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.40 no.6
    • /
    • pp.48-57
    • /
    • 2003
  • In this paper, An 8${\times}$l ID-IDCT processor with adder-based distributed arithmetic(DA) and bit-serial method Is presented. To reduce hardware cost and to improve operating speed, the proposed 8${\times}$1 ID-IDCT used the bit-serial method and DA method. The transform of coefficient equation results in reduction in hardware cost and has a regularity in implementation. The sign extension computation method reduces operation clock. As a result of logic synthesis, The gate count of designed 8${\times}$1 1D-IDCT is 17,504. The sign extension processing block has gate count of 3,620. That is 20% of total 8${\times}$1 ID-IDCT architecture. But the sign extension processing block improves more than twice in throughput. The designed IDCT processes 50Mpixels per second and at a clock frequency of 100MHz.

2-D DCT/IDCT Processor Design Reducing Adders in DA Architecture (DA구조 이용 가산기 수를 감소한 2-D DCT/IDCT 프로세서 설계)

  • Jeong Dong-Yun;Seo Hae-Jun;Bae Hyeon-Deok;Cho Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.3 s.345
    • /
    • pp.48-58
    • /
    • 2006
  • This paper presents 8x8 two dimensional DCT/IDCT processor of adder-based distributed arithmetic architecture without applying ROM units in conventional memories. To reduce hardware cost in the coefficient matrix of DCT and IDCT, an odd part of the coefficient matrix was shared. The proposed architecture uses only 29 adders to compute coefficient operation in the 2-D DCT/IDCT processor, while 1-D DCT processor consists of 18 adders to compute coefficient operation. This architecture reduced 48.6% more than the number of adders in 8x8 1-D DCT NEDA architecture. Also, this paper proposed a form of new transpose network which is different from the conventional transpose memory block. The proposed transpose network block uses 64 registers with reduction of 18% more than the number of transistors in conventional memory architecture. Also, to improve throughput, eight input data receive eight pixels in every clock cycle and accordingly eight pixels are produced at the outputs.