• Title/Summary/Keyword: inverse integer transform

An Efficient Hardware Architecture of Intra Prediction and TQ/IQIT Module for H.264 Encoder

  • Suh, Ki-Bum;Park, Seong-Mo;Cho, Han-Jin
    • ETRI Journal / v.27 no.5 / pp.511-524 / 2005
  • In this paper, we propose a novel hardware architecture for an intra-prediction, integer transform, quantization, inverse integer transform, inverse quantization, and mode decision module for the macroblock engine of the new video coding standard H.264. To reduce the cycle count of intra prediction, transform/quantization, and inverse quantization/inverse transform in H.264, we propose a method that reduces the cycle overhead in I16MB mode. This method can process one macroblock in 927 cycles for all macroblock types by performing the $4{\times}4$ Hadamard transform and quantization during $16{\times}16$ prediction. The module was designed in Verilog Hardware Description Language (HDL) and operates with a 54 MHz clock using the Hynix $0.35 {\mu}m$ TLM (triple layer metal) library.
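
The $4{\times}4$ Hadamard transform mentioned in the abstract above can be sketched in a few lines. This is only an illustrative software model, not the paper's hardware design: the function names are hypothetical, and the scaling is left unnormalised (H.264 folds its own scaling and rounding into the surrounding quantization steps, which are not reproduced here).

```python
# Unnormalised 4x4 Hadamard matrix. It is symmetric with +/-1 entries,
# so H4 * H4 = 4 * I and the 2-D transform is undone by applying it again
# and dividing by 16.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul(a, b):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def hadamard_2d(block):
    """Hypothetical helper: 2-D transform Y = H4 * X * H4 (no scaling)."""
    return matmul(matmul(H4, block), H4)


def inverse_hadamard_2d(coeffs):
    """Hypothetical helper: X = (H4 * Y * H4) / 16, exact for integer input."""
    y = matmul(matmul(H4, coeffs), H4)
    return [[v // 16 for v in row] for row in y]
```

Because every entry is +1 or -1, the transform needs only additions and subtractions, which is one reason it is cheap to schedule alongside other work.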

Integer Inverse Transform Structure Based on Matrix for VP9 Decoder (VP9 디코더에 대한 행렬 기반의 정수형 역변환 구조)

  • Lee, Tea-Hee;Hwang, Tae-Ho;Kim, Byung-Soo;Kim, Dong-Sun
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.4 / pp.106-114 / 2016
  • In this paper, we propose an efficient integer inverse transform structure for a VP9 decoder. The proposed hardware structure is easy to control, requires fewer hardware resources, and shares the algorithms for realizing all of the DCT (Discrete Cosine Transform), ADST (Asymmetric Discrete Sine Transform), and WHT (Walsh-Hadamard Transform) in VP9. The integer inverse transform in Google's VP9 model has a fast structure, called a butterfly structure. Unlike a generic fast structure, the integer inverse transform in the Google C model applies a constant rounding shift at each stage and includes an asymmetric sine transform structure. The proposed structure therefore approximates the matrix coefficient values for all transform modes and uses a matrix operation method. With the proposed structure, operations can be shared across all inverse transform modes with fewer multipliers than the butterfly structure, which in turn uses hardware resources more efficiently.

Hardware Implementation of Integer Transform and Quantization for H.264 (하드웨어 기반의 H.264 정수 변환 및 양자화 구현)

  • 임영훈;정용진
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.12C / pp.1182-1191 / 2003
  • In this paper, we propose a new hardware architecture for the integer transform, quantizer, inverse quantizer, and inverse integer transform of the new video coding standard H.264/JVT. We describe the algorithm and derive a hardware architecture that emphasizes small area for low cost and low power consumption. The proposed architecture has been verified on a PCI-interfaced emulation board using an Altera APEX-II FPGA and also by ASIC synthesis using the Samsung 0.18 um CMOS cell library. The ASIC synthesis result shows that the proposed hardware can operate at 100 MHz, processing more than 1,300 QCIF video frames per second. The hardware will be used as a core module in a complete H.264 video encoder/decoder ASIC for real-time multimedia applications.
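
As a rough illustration of the integer transform and inverse integer transform named in the abstract above, the sketch below applies the well-known H.264 $4{\times}4$ forward core matrix and then inverts it exactly using only integers. The inverse shown here is a mathematical one built from the row norms of the matrix; it is an assumption made for clarity, not the inverse specified by the standard (which merges scaling into dequantization and uses rounding shifts) and not the paper's architecture. All function names are hypothetical.

```python
# H.264 4x4 forward core transform matrix. Its rows are orthogonal with
# squared norms 4, 10, 4, 10, i.e. CF * CF^T = diag(4, 10, 4, 10).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]
ROW_NORMS = [4, 10, 4, 10]


def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def forward_core(block):
    """W = CF * X * CF^T, using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


def exact_inverse(coeffs):
    """X = CF^T * (W_ij / (d_i * d_j)) * CF, kept in integers.

    Each coefficient is pre-multiplied by 400 / (d_i * d_j), which is
    25, 10, or 4, so the final division by 400 is exact.
    """
    scaled = [[coeffs[i][j] * (400 // (ROW_NORMS[i] * ROW_NORMS[j]))
               for j in range(4)] for i in range(4)]
    y = matmul(matmul(transpose(CF), scaled), CF)
    return [[v // 400 for v in row] for row in y]
```

In a real codec the position-dependent factors (25, 10, 4 here) are absorbed into the quantization tables, which is why the transform itself can stay multiplier-free.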

Fast Binary Block Inverse Jacket Transform

  • Lee Moon-Ho;Zhang Xiao-Dong;Pokhrel Subash Shree;Choe Chang-Hui;Hwang Gi-Yean
    • Journal of electromagnetic engineering and science / v.6 no.4 / pp.244-252 / 2006
  • A block Jacket transform and its block inverse Jacket transform have recently been reported in the paper 'Fast block inverse Jacket transform'. However, the product of that block Jacket transform and the corresponding block inverse Jacket transform is not equal to the identity transform, which does not conform to the mathematical rule. In this paper, new binary block Jacket transforms and the corresponding binary block inverse Jacket transforms of orders $N=2^k,\;3^k\;and\;5^k$ for integer values k are proposed, and the mathematical proofs are also presented. With the aid of the Kronecker product of the lower-order Jacket matrix and the identity matrix, fast algorithms for realizing these transforms are obtained. Owing to its simple inverse, fast algorithm, and prime-based $P^k$ order, the proposed binary block inverse Jacket transform can be applied in communications areas such as space-time block code design, signal processing, LDPC coding, and information theory. An application of the circular permutation matrix (CPM) binary low-density quasi-block Jacket matrix, which is useful in coding theory, is also introduced in this paper.
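
The role of the Kronecker product with identity matrices in building a fast algorithm can be sketched with the simplest relative of a Jacket matrix, the Sylvester-type Hadamard matrix of order $2^k$. This is a stand-in chosen for brevity; the paper's binary block Jacket matrices of orders $3^k$ and $5^k$ are not reproduced here, and all names below are hypothetical.

```python
H2 = [[1, 1], [1, -1]]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def sylvester(k):
    """Dense order-2^k matrix: H2 (x) H2 (x) ... (x) H2, k times."""
    h = [[1]]
    for _ in range(k):
        h = kron(h, H2)
    return h


def sparse_factors(k):
    """k sparse stages of the form I_(2^i) (x) H2 (x) I_(2^(k-1-i))."""
    return [kron(kron(identity(2 ** i), H2), identity(2 ** (k - 1 - i)))
            for i in range(k)]


def product(factors):
    """Multiply the sparse stages together to recover the dense matrix."""
    result = identity(len(factors[0]))
    for f in factors:
        result = matmul(result, f)
    return result
```

Each sparse stage has only two non-zero entries per row, so applying the $k$ stages in sequence costs on the order of $N \log_2 N$ additions instead of the $N^2$ of a dense matrix product. For this stand-in the inverse is simply the transpose divided by $N$.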

ASIP Instructions and Their Hardware Architecture for H.264/AVC

  • Lee, Jung-H.;Kim, Sung-D.;Sunwoo, Myung-H.
    • JSTS:Journal of Semiconductor Technology and Science / v.5 no.4 / pp.237-242 / 2005
  • H.264/AVC adopts new features compared with previous multimedia algorithms. It is inefficient to implement some of the new blocks using existing DSP instructions. Hence, new instructions are required to implement H.264/AVC. This paper proposes novel instructions for intra-prediction, the in-loop deblocking filter, entropy coding, and the integer transform. Performance comparisons show that the required computation cycles for the in-loop deblocking filter can be reduced by about $20{\sim}25\%$. This paper also proposes new instructions for the integer transform. The proposed instructions can execute a one-dimensional forward/inverse integer transform. The integer transform can be implemented with a much smaller hardware size than existing DSPs require.
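
To make the idea of a one-dimensional integer transform concrete, the following sketch computes the 4-point H.264 forward core transform with a butterfly that uses only additions, subtractions, and one-bit shifts. It models the arithmetic only and says nothing about the paper's instruction encoding or datapath; the function name is hypothetical, and only the forward direction is shown.

```python
# Rows of the H.264 4x4 forward core matrix, used here only as a
# reference to check the butterfly against.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def forward_1d(x):
    """Hypothetical helper: 4-point forward butterfly, no multiplications."""
    a, b, c, d = x
    s0 = a + d          # sums of the outer and inner pairs
    s1 = b + c
    d0 = a - d          # differences of the outer and inner pairs
    d1 = b - c
    return [
        s0 + s1,            # row [1,  1,  1,  1]
        (d0 << 1) + d1,     # row [2,  1, -1, -2]
        s0 - s1,            # row [1, -1, -1,  1]
        d0 - (d1 << 1),     # row [1, -2,  2, -1]
    ]


def forward_1d_reference(x):
    """Direct matrix-vector product, for comparison."""
    return [sum(c * v for c, v in zip(row, x)) for row in CF]
```

Applying `forward_1d` to every row of a block and then to every column of the result yields the 2-D transform, which is why a single 1-D operation is enough as a building block.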

Optimized Integer Cosine Transform (최적화 정수형 여현 변환)

  • 이종하;김혜숙;송인준;곽훈성
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.9 / pp.1207-1214 / 1995
  • We present an optimized integer cosine transform (OICT) as an alternative to the conventional discrete cosine transform (DCT), together with its fast computational algorithm. In implementing the OICT, we use techniques similar to those of the orthogonal integer transform (OIT). The normalization factors are approximated by a single factor while keeping the reconstruction error at the best tolerable level. With a single normalization factor, both the forward and inverse transforms are performed using only integers. Because many sets of integers can be selected in this manner, the best OICT matrix is obtained by minimizing the Hilbert-Schmidt norm while still admitting a fast computational algorithm. Using matrix decomposition, a fast algorithm for efficient computation of the order-8 OICT is developed, which requires only 20 integer multiplications. This enables us to implement a high-performance 2-D DCT processor by replacing floating-point operations with integer operations. We have also run simulations to test the performance of the order-8 OICT in terms of transform efficiency, maximum reducible bits, and mean square error for the Wiener filter. Compared with the DCT and the OIT, the OICT outperformed both. Furthermore, when the conventional DCT coefficients are reduced to 7 bits like those of the OICT, the reconstructed images are critically impaired because the orthogonal property of the original DCT is lost. The 7-bit OICT, however, maintains a zero mean square reconstruction error.

High Throughput Parallel Design of 2-D $8{\times}8$ Integer Transforms for H.264/AVC (H.264/AVC 를 위한 높은 처리량의 2-D $8{\times}8$ integer transforms 병렬 구조 설계)

  • Sharma, Meeturani;Tiwari, Honey;Cho, Yong-Beom
    • Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.8 / pp.27-34 / 2012
  • In this paper, the implementation of a high-throughput two-dimensional (2-D) $8{\times}8$ forward and inverse integer DCT for H.264 is presented. The forward and inverse transforms are represented using simple shift and addition operations. Matrix decomposition and matrix operations such as the Kronecker product and direct sum are used to reduce the computational complexity. The proposed design uses integer computations and does not use transpose memory; hence, resource consumption is also reduced. The maximum operating frequency of the proposed pipelined architecture is 1.184 GHz, which achieves a throughput of 25.27 Gpixels/sec at a hardware cost of 44864 gates. The high throughput and low hardware cost make the proposed design useful for real-time H.264/AVC high-definition processing.

A High Throughput Multiple Transform Architecture for H.264/AVC Fidelity Range Extensions

  • Ma, Yao;Song, Yang;Ikenaga, Takeshi;Goto, Satoshi
    • JSTS:Journal of Semiconductor Technology and Science / v.7 no.4 / pp.247-253 / 2007
  • In this paper, a high-throughput multiple transform architecture for H.264 Fidelity Range Extensions (FRExt) is proposed. New techniques are adopted which (1) regularize the $8{\times}8$ integer forward and inverse DCT transform matrices, (2) divide them into four $4{\times}4$ sub-matrices so that a simple fast butterfly algorithm can be used, and (3) exploit the similarity of the sub-matrices through mixed butterflies, so that all the sub-matrices of the $8{\times}8$ transforms and the matrices of the $4{\times}4$ forward DCT (FDCT), inverse DCT (IDCT), and Hadamard transform can be merged together. Based on these techniques, a hardware architecture is realized that achieves a throughput of 1.488 Gpixel/s when processing either the $4{\times}4\;or\;8{\times}8$ transform. With such high throughput, the design can satisfy the critical requirement of real-time multi-transform processing for High Definition (HD) applications such as High Definition DVD (HD-DVD) ($1920{\times}1080@60Hz$) in H.264/AVC FRExt. This work has been synthesized using the Rohm 0.18 um library. The design operates at 93 MHz and achieves a throughput of 1.488 Gpixel/s at a cost of 56440 gates.

A VLSI Architecture of an 8$\times$8 OICT for HDTV Application (HDTV용 8$\times$8 최적화 정수형 여현 변환의 VLSI 구조)

  • 송인준;황상문;이종하;류기수;곽훈성
    • Journal of the Korean Institute of Telematics and Electronics T / v.36T no.1 / pp.1-7 / 1999
  • We present a VLSI architecture for a high-performance 2-D DCT processor, intended for compression systems in real-time image processing or HDTV, based on the fast computational algorithm of the Optimized Integer Cosine Transform (OICT). The coefficients of the OICT are integers, so the OICT performs only integer operations for both the forward and inverse transforms. Compared with the DCT, which performs floating-point operations, the proposed architecture therefore greatly improves speed and considerably reduces hardware cost by replacing multiplication operations with shift and addition operations.

New Continuous Variable Space Optimization Methodology for the Inverse Kinematics of Binary Manipulators Consisting of Numerous Modules (수많은 모듈로 구성된 이진 매니플레이터 역기구 설계를 위한 연속변수공간 최적화 신기법 연구)

  • Jang Gang-Won;Nam Sang Jun;Kim Yoon Young
    • Transactions of the Korean Society of Mechanical Engineers A / v.28 no.10 / pp.1574-1582 / 2004
  • Binary manipulators have recently received much attention due to their hyper-redundancy, light weight, good controllability, and high reliability. Precise positioning of the manipulator end-effector requires the use of many modules, which results in a high-dimensional workspace. When the workspace dimension is large, existing inverse kinematics methods such as the Ebert-Uphoff algorithm may require an impractically large memory size to determine the binary positions of all actuators. To overcome this limitation, we propose a new inverse kinematics algorithm in which the inverse kinematics problem is formulated as an optimization problem using real-valued design variables. The key procedure in this approach is to transform the integer-variable optimization problem into a real-variable optimization problem and to push the real-valued design variables as closely as possible to the permissible binary values. Since the actual optimization is performed in real-valued design variables, the design sensitivity becomes readily available, and the optimization method becomes extremely efficient. Because the proposed formulation is quite general, other design considerations such as operation power minimization can easily be incorporated.