• Title/Summary/Keyword: common operation unit (공통 연산기)


A Design of High Performance Operation Intra Predictor for H.264/AVC Decoder (H.264/AVC 복호기를 위한 고성능 연산처리 인트라 예측기 설계)

  • Jin, Xianzhe;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.11 / pp.2503-2510 / 2012
  • This paper proposes a parallel-operation intra predictor for an H.264/AVC decoder. The previous intra predictor design used common operation units shared by all 17 prediction modes to compute more efficiently, but those units were derived by analyzing the equation applied to a single pixel. As a result, it used four operation units and needed four cycles to compute the 16 pixels of a $4{\times}4$ block. The proposed intra predictor contains a T3 (Three Type Transform) operation unit for parallel operation. It divides the 17 modes into 3 types so that all 16 pixels of a $4{\times}4$ block are computed in a single cycle, requiring a minimum of 16 cycles for a $16{\times}16$ block. Experimental results show that, in terms of processing cycles, the proposed intra predictor performs 58.95% better than the previous one.
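To make "all 16 pixels of a $4{\times}4$ block in one step" concrete, here is a minimal software sketch that returns the complete prediction block for three of the H.264/AVC Intra_4x4 luma modes (vertical, horizontal, DC). It assumes all top and left neighbours are available; the function name `predict_4x4` is hypothetical, and the sketch does not reproduce the paper's T3 unit or its three-type grouping of the 17 modes.

```python
def predict_4x4(mode: int, top: list[int], left: list[int]) -> list[list[int]]:
    """Return the 4x4 prediction block (list of rows) for H.264 Intra_4x4 modes 0-2.

    top:  the 4 reconstructed pixels above the block (A..D)
    left: the 4 reconstructed pixels left of the block (I..L)
    Assumes both neighbour sets are available (no availability fallbacks).
    """
    if mode == 0:  # Vertical: every row copies the top neighbours
        return [list(top[:4]) for _ in range(4)]
    if mode == 1:  # Horizontal: every row repeats its own left neighbour
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the 8 neighbours fills the block
        dc = (sum(top[:4]) + sum(left[:4]) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"mode {mode} is not covered by this sketch")
```

For example, `predict_4x4(2, [10, 10, 10, 10], [20, 20, 20, 20])` yields a block filled with 15.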

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder (HEVC 복호기의 연산 복잡도 감소를 위한 화면내 예측 하드웨어 구조 설계)

  • Jung, Hongkyun;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1203-1212 / 2013
  • In this paper, an intra prediction hardware architecture is proposed to reduce the computational complexity of intra prediction in an HEVC decoder. The architecture uses shared operation units and common operation units, and adopts a fast smoothing-decision algorithm and a fast algorithm for generating filter coefficients. The shared operation unit shares the adders that process common equations, removing computational redundancy; it also computes the average value for DC mode, reducing the number of execution cycles in that mode. To reduce the number of operation units, the common operation unit uses a single operation unit to generate both predicted pixels and filtered pixels in all prediction modes. To reduce processing time and the number of operators, the decision algorithm uses only bit comparators, and the coefficient-generation algorithm uses a LUT instead of multipliers. The proposed architecture uses four shared operation units and eight common operation units, which reduces the execution cycles of intra prediction. Synthesized in TSMC $0.13{\mu}m$ CMOS technology, the architecture has a gate count of 40.5k and a maximum operating frequency of 164MHz. Performance measured with data extracted from HM 7.1 shows that the architecture needs about 93.7% fewer execution cycles than the previous design.
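One way adders can be shared between common equations: HEVC's [1, 2, 1] reference-sample smoothing, `(r[i-1] + 2*r[i] + r[i+1] + 2) >> 2`, equals `(s[i-1] + s[i] + 2) >> 2` with pair sums `s[i] = r[i] + r[i+1]`, and the same pair sums also give the DC-mode average. The sketch below illustrates this under stated assumptions (reference samples stored in one list from bottom-left through the corner to top-right, block size `n` a power of two of at least 4, strong intra smoothing ignored). The function names are hypothetical; this is not the paper's circuit.

```python
def pair_sums(ref: list[int]) -> list[int]:
    """s[i] = ref[i] + ref[i + 1]: one adder per neighbouring pair, evaluated once."""
    return [ref[i] + ref[i + 1] for i in range(len(ref) - 1)]


def shared_unit(ref: list[int], n: int) -> tuple[list[int], int]:
    """Return (smoothed reference samples, DC value) from a single set of pair sums.

    Assumed layout (length 4n + 1): ref[0..2n-1] = left column from bottom to top,
    ref[2n] = top-left corner, ref[2n+1..4n] = top row from left to right.
    """
    s = pair_sums(ref)
    # [1, 2, 1] filter: r[i-1] + 2*r[i] + r[i+1] == s[i-1] + s[i]; end samples unfiltered
    inner = [(s[i - 1] + s[i] + 2) >> 2 for i in range(1, len(ref) - 1)]
    smoothed = [ref[0]] + inner + [ref[-1]]
    # DC: left samples adjacent to the block are ref[n..2n-1], top samples ref[2n+1..3n]
    left_sum = sum(s[i] for i in range(n, 2 * n, 2))
    top_sum = sum(s[i] for i in range(2 * n + 1, 3 * n, 2))
    dc = (left_sum + top_sum + n) >> n.bit_length()  # >> (log2(n) + 1)
    return smoothed, dc
```

A constant or linearly increasing reference row passes through the [1, 2, 1] filter unchanged, which makes a convenient sanity check.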

An Intra Prediction Hardware Design for High Performance HEVC Encoder (고성능 HEVC 부호기를 위한 화면내 예측 하드웨어 설계)

  • Park, Seung-yong;Guard, Kanda;Ryoo, Kwang-ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.875-878 / 2015
  • In this paper, we propose an intra prediction hardware architecture for a high-performance HEVC encoder that reduces processing time, computation, and hardware area. The proposed architecture uses common operation units to reduce computational complexity and processes data in $4{\times}4$ block units to reduce hardware area. To reduce operation time, the common operation unit uses a single operation unit to generate both predicted pixels and filtered pixels in all prediction modes. The architecture is built around $4{\times}4$ PU processing to keep hardware area small, and uses internal registers to support $32{\times}32$ PU processing. The proposed hardware uses ten common operation units, which reduces the execution cycles of intra prediction. It is designed in Verilog HDL (Hardware Description Language) and has a total of 41.5k gates in the TSMC $0.13{\mu}m$ CMOS standard cell library. At 150MHz it supports real-time 4K UHD video encoding at 30fps, and it operates at up to 200MHz.
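A single operation unit can serve both filtering and prediction if it is a generic weighted sum with rounding. The sketch below assumes, hypothetically, such a unit `(w0*a + w1*b + w2*c + round) >> shift`, then configures it as the [1, 2, 1] reference smoothing filter and as HEVC's two-tap angular interpolation `((32 - f)*a + f*b + 16) >> 5`. `predict_row` covers only vertical-class modes with a non-negative angle and omits the extra boundary filter of the pure vertical mode; all names are illustrative, not the paper's design.

```python
def common_op(a: int, b: int, c: int, weights: tuple[int, int, int], shift: int) -> int:
    """Hypothetical common operation unit: (w0*a + w1*b + w2*c + 2**(shift-1)) >> shift."""
    w0, w1, w2 = weights
    return (w0 * a + w1 * b + w2 * c + (1 << (shift - 1))) >> shift


def filter_121(a: int, b: int, c: int) -> int:
    """Reference-sample smoothing: (a + 2b + c + 2) >> 2."""
    return common_op(a, b, c, (1, 2, 1), 2)


def angular_interp(a: int, b: int, frac: int) -> int:
    """HEVC angular interpolation: ((32 - frac)*a + frac*b + 16) >> 5, frac in 0..31."""
    return common_op(a, b, 0, (32 - frac, frac, 0), 5)


def predict_row(ref: list[int], y: int, angle: int, n: int) -> list[int]:
    """Row y of an n x n angular prediction (vertical-class mode, angle >= 0).

    ref[0] is the top-left corner sample; ref[1..2n] are the top and top-right samples.
    """
    pos = (y + 1) * angle
    idx, frac = pos >> 5, pos & 31
    if frac == 0:  # integer position: copy the reference sample directly
        return [ref[x + idx + 1] for x in range(n)]
    return [angular_interp(ref[x + idx + 1], ref[x + idx + 2], frac) for x in range(n)]
```

With `angle = 32` (the diagonal mode) and `y = 0`, each predicted pixel simply copies the top sample one position to the right.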


Design of Parallel Inverse Quantization and Inverse Transform Architecture for High Performance H.264/AVC Decoder (고성능 H.264/AVC 복호기를 위한 병렬 역양자화 및 역변환 구조 설계)

  • Jung, Hong-Kyun;Ryoo, Kwang-Ki
    • Proceedings of the KAIS Fall Conference / 2011.12b / pp.434-437 / 2011
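  • This paper proposes a parallel inverse quantization architecture and an inverse transform architecture to improve the performance of an H.264/AVC decoder. The proposed inverse quantization architecture reduces computational complexity by using common operation units, and with four common operation units it reduces inverse quantization to one cycle. The proposed inverse transform architecture uses four transform operation units and takes two cycles to perform the inverse transform. In addition, the proposed design adopts a parallel structure that performs inverse quantization and the horizontal inverse transform simultaneously, reducing the combined inverse quantization and inverse transform time to two cycles. Synthesized with the Magnachip $0.18{\mu}m$ CMOS process library, the design has a gate count of 14,173 at an operating frequency of 1.5MHz. Performance measured with data extracted from the standard reference software JM 9.4 shows that the execution cycle count of the proposed architecture improved by 38.74% over the previous architecture.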
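The H.264/AVC $4{\times}4$ inverse transform is separable: a horizontal (row) pass and a vertical (column) pass, each built from the same add-and-shift butterfly, followed by `(x + 32) >> 6` rounding. This separability is what allows a horizontal pass to be scheduled alongside other work. The sketch below follows the butterfly given in the H.264 specification; the function names are hypothetical and it says nothing about how the paper maps the passes onto its four transform units.

```python
def butterfly(d: list[int]) -> list[int]:
    """One 1-D pass of the H.264 4x4 inverse core transform (adds and shifts only)."""
    d0, d1, d2, d3 = d
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(block: list[list[int]]) -> list[list[int]]:
    """Row pass, then column pass, then (x + 32) >> 6 rounding to residual samples."""
    rows = [butterfly(r) for r in block]  # horizontal pass
    cols = [butterfly([rows[i][j] for i in range(4)]) for j in range(4)]  # cols[j][i]
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]
```

A block whose only non-zero (already dequantized) coefficient is a DC value of 64 decodes to a flat residual of 1.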


An Efficient Hardware Design of Intra Predictor for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 화면내 예측기의 효율적인 하드웨어 설계)

  • Jung, Hongkyun;Kang, Sukmin;Ryoo, Kwangki
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.668-671 / 2012
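  • This paper proposes an intra prediction hardware architecture that reduces the computational load and hardware area of a decoder for HEVC (High Efficiency Video Coding), the next-generation video compression standard. The proposed architecture reduces the amount of computation and the number of operators by using shared operation units that share the computation of common equations. It further reduces computation and operation time because it skips the computation that decides whether to apply filtering for $4{\times}4$ and $64{\times}64$ PUs, and uses a LUT for that decision for the remaining PUs. Hardware area is also reduced because predicted pixels are generated by only one common operation unit. Synthesized in a TSMC $0.18{\mu}m$ process, the architecture has a maximum operating frequency of 100MHz and a gate count of 140,697. For a $4{\times}4$ PU, the proposed architecture takes 11 processing cycles, 54% fewer than the previous architecture; for filtering 16 reference pixels, it uses 37 adders, 22.9% fewer than standard draft 6.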
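The abstract does not list the LUT contents. Assuming it tabulates HEVC's luma filtering decision, which depends only on block size and mode (DC mode and $4{\times}4$ blocks are never filtered; otherwise filter when the mode's distance from pure horizontal (10) and pure vertical (26) exceeds a size-dependent threshold), the rule can be evaluated once offline so that the run-time decision becomes a single lookup. The table and function names below are hypothetical.

```python
# intraHorVerDistThres[nTbS] from the HEVC reference-sample filtering rule (luma)
INTRA_HOR_VER_DIST_THRES = {8: 7, 16: 1, 32: 0}

PLANAR, DC, HOR, VER = 0, 1, 10, 26
NUM_MODES = 35


def filter_flag(mode: int, n: int) -> bool:
    """Direct evaluation of the rule: should the n x n block's references be smoothed?"""
    if mode == DC or n == 4:  # DC mode and 4x4 blocks are never filtered
        return False
    min_dist = min(abs(mode - VER), abs(mode - HOR))
    return min_dist > INTRA_HOR_VER_DIST_THRES[n]


# Evaluate the rule once for every (size, mode) pair; at run time it is one lookup.
FILTER_LUT = {n: tuple(filter_flag(m, n) for m in range(NUM_MODES)) for n in (4, 8, 16, 32)}


def needs_filtering(mode: int, n: int) -> bool:
    return FILTER_LUT[n][mode]
```

For $8{\times}8$ blocks only planar and the three diagonal modes (2, 18, 34) are filtered, while for $32{\times}32$ blocks every mode except DC, horizontal, and vertical is.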

The Hardware Design of CABAC for High Performance H.264 Encoder (고성능 H.264 인코더를 위한 CABAC 하드웨어 설계)

  • Myoung, Je-Jin;Ryoo, Kwang-Ki
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.4 / pp.771-777 / 2012
  • This paper proposes a CABAC binary arithmetic encoder built around a Common Operation Unit (COU) that covers all three coding modes. Because the same COU is used regardless of the mode, the binary arithmetic encoder, which performs arithmetic encoding and renormalization, can be implemented as a simple hardware architecture. The proposed encoder consists of a Context RAM, a Context Updater, the Common Operation Unit, and a Bit-Gen unit, organized as a 4-stage pipeline that processes one symbol per clock cycle. Compared with the previous architecture, the proposed encoder reduces area by up to 47% and improves performance by 19%.
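The abstract does not name the three modes; in H.264/AVC CABAC they are presumably the regular (context-coded), bypass, and terminate modes, all of which update the same `low` and `range` registers and share one bit-output routine. The sketch below follows the specification's encoder flowcharts (EncodeDecision, EncodeBypass, EncodeTerminate, RenormE, PutBit, EncodeFlush). It is not the paper's COU: the class name is hypothetical, and the `rangeTabLPS` lookup and context-state update are left to the caller.

```python
class BinaryArithmeticEncoder:
    """H.264/AVC CABAC arithmetic-coding engine with regular, bypass and terminate modes."""

    def __init__(self) -> None:
        self.low = 0
        self.range = 510
        self.first_bit = True
        self.outstanding = 0
        self.bits: list[int] = []

    def _put_bit(self, b: int) -> None:
        if self.first_bit:
            self.first_bit = False  # the very first bit is never written
        else:
            self.bits.append(b)
        while self.outstanding > 0:  # resolve pending carry bits
            self.bits.append(1 - b)
            self.outstanding -= 1

    def _renorm(self) -> None:
        while self.range < 256:
            if self.low < 256:
                self._put_bit(0)
            elif self.low >= 512:
                self.low -= 512
                self._put_bit(1)
            else:
                self.low -= 256
                self.outstanding += 1
            self.range <<= 1
            self.low <<= 1

    def q_index(self) -> int:
        """Column index into rangeTabLPS for the current range."""
        return (self.range >> 6) & 3

    def encode_decision(self, bin_val: int, val_mps: int, r_lps: int) -> None:
        """Regular mode; caller passes r_lps = rangeTabLPS[pStateIdx][q_index()]."""
        self.range -= r_lps
        if bin_val != val_mps:
            self.low += self.range
            self.range = r_lps
        self._renorm()

    def encode_bypass(self, bin_val: int) -> None:
        self.low <<= 1
        if bin_val:
            self.low += self.range
        if self.low >= 1024:
            self._put_bit(1)
            self.low -= 1024
        elif self.low < 512:
            self._put_bit(0)
        else:
            self.low -= 512
            self.outstanding += 1

    def encode_terminate(self, bin_val: int) -> None:
        self.range -= 2
        if bin_val:
            self.low += self.range
            self._flush()
        else:
            self._renorm()

    def _flush(self) -> None:
        self.range = 2
        self._renorm()
        self._put_bit((self.low >> 9) & 1)
        v = ((self.low >> 7) & 3) | 1
        self.bits += [(v >> 1) & 1, v & 1]  # WriteBits(v, 2), MSB first
```

Because every path ends in the same renormalization and `_put_bit` logic, a hardware version can plausibly fold the three modes into one datapath, which is the idea the abstract describes.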

The Hardware Architecture of Efficient Intra Predictor for H.264/AVC Decoder (H.264/AVC 복호기를 위한 효율적인 인트라 예측기 하드웨어 구조)

  • Kim, Ok;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.5 / pp.24-30 / 2010
  • In this paper, we describe intra prediction, one of the techniques H.264/AVC uses to achieve higher compression performance, and propose an intra predictor design that processes intra prediction modes efficiently. The proposed system consists of processing elements, precomputation processing elements, an intra prediction controller, an internal memory, and a register controller. The processing elements and precomputation processing elements reduce the number of computation cycles, and the internal memory and register architecture reduce the number of external memory accesses. We designed the proposed system in Verilog HDL and verified it with test vectors obtained from encoded YUV files. The proposed architecture targets the baseline profile of the H.264/AVC decoder and is suitable for portable devices such as cellular phones with $176{\times}144$ (QCIF) video. Experimental results show that the performance of the proposed intra predictor is about 60% higher than that of the previous one.
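The abstract does not say what the precomputation elements compute. One plausible reading, sketched here as an assumption, is that the directional Intra_4x4 modes reuse the same 3-tap values `(a + 2b + c + 2) >> 2` of the 13 neighbouring pixels (L, K, J, I, M, A..H), so each value can be computed once and then merely selected. Diagonal down-left and diagonal down-right are shown; all names are hypothetical and all 13 neighbours are assumed available.

```python
def three_tap(a: int, b: int, c: int) -> int:
    return (a + 2 * b + c + 2) >> 2


def precompute_edge(m: int, top: list[int], left: list[int]) -> tuple[list[int], list]:
    """Filter every interior edge sample once.

    Edge order: L, K, J, I, M, A, B, ..., H (13 samples, bottom-left to top-right).
    f[i] is the 3-tap value centred on edge[i]; the two end positions are None.
    """
    edge = list(reversed(left[:4])) + [m] + list(top[:8])
    f: list = [None] * len(edge)
    for i in range(1, len(edge) - 1):
        f[i] = three_tap(edge[i - 1], edge[i], edge[i + 1])
    return edge, f


def predict_ddl(edge: list[int], f: list) -> list[list[int]]:
    """Intra_4x4_Diagonal_Down_Left using the shared precomputed values."""
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x == 3 and y == 3:
                pred[y][x] = (edge[11] + 3 * edge[12] + 2) >> 2  # (G + 3H + 2) >> 2
            else:
                pred[y][x] = f[6 + x + y]  # centred on top sample x + y + 1
    return pred


def predict_ddr(f: list) -> list[list[int]]:
    """Intra_4x4_Diagonal_Down_Right: each pixel is one of f[1..7]."""
    return [[f[4 + x - y] for x in range(4)] for y in range(4)]
```

Only 11 filtered values are computed, yet they supply 31 of the 32 pixels of the two modes; the remaining down-left corner pixel uses its own `(G + 3H + 2) >> 2` formula.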

Design of an Efficient AES-ARIA Processor using Resource Sharing Technique (자원 공유기법을 이용한 AES-ARIA 연산기의 효율적인 설계)

  • Koo, Bon-Seok;Ryu, Gwon-Ho;Chang, Tae-Joo;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.6A / pp.39-49 / 2008
  • AES and ARIA are the next-generation standard block ciphers of the US and Korea, respectively, and these algorithms are used in various fields including smart cards and electronic passports. This paper presents the first efficient unified hardware architecture for AES and ARIA and reports implementation results with a $0.25{\mu}m$ CMOS library. We designed S-boxes shared by both algorithms based on composite field arithmetic, and also extracted the common terms of the two algorithms' permutation matrices. In $0.25{\mu}m$ CMOS technology, our processor occupies 19,056 gates, 32% fewer than separate implementations, and it uses 11 clock cycles for AES encryption and 16 for ARIA encryption, corresponding to 720 and 1,047 Mbps, respectively.
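S-box sharing works because the AES S-box, which ARIA also uses as its S-box S1, is a multiplicative inverse in GF(2^8) followed by an affine map, so the expensive inversion can be one shared block. The sketch below computes that S-box directly, taking the inverse as a^254 by square-and-multiply. Hardware designs such as the one described typically replace this with composite-field GF((2^4)^2) arithmetic, which is not reproduced here; ARIA's second S-box and the inverse S-boxes are also omitted.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce modulo the field polynomial
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (0 maps to 0), by square-and-multiply."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(x: int) -> int:
    """AES S-box (identical to ARIA's S1): shared GF(2^8) inverse, then the affine map."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
```

The results can be checked against the worked examples in the AES standard, for instance `aes_sbox(0x53)` gives `0xED`.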

A High-Performance Architecture for 2 Dimensional Block-Based Computer Generated Hologram (2차원 블록 단위 기반의 고성능 컴퓨터 생성 홀로그램 생성기의 구조)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.109-110 / 2012
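  • To generate holograms in real time, this paper redefines the modified computer-generated hologram (CGH) equation, divides it into three stages, and proposes a hardware architecture for a CGH generator based on 2-D blocks. The proposed and implemented hardware consists of a common-term operation unit that computes the z-axis term of each effective light source, a coordinate operation unit that computes the x- and y-axis terms, and finally a pixel-value operation unit that computes the value of each pixel. With a $32{\times}32$ intermediate-block structure, the implemented hardware uses at least 86% fewer DSP blocks than previous work.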
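The paper's modified equation is not given in the abstract. Assuming the common Fresnel point-source form $I(x,y) = \sum_j A_j \cos\!\big(\tfrac{\pi}{\lambda z_j}\,[(x - x_j)^2 + (y - y_j)^2]\big)$, the three stages map naturally onto the three units: a per-source term that depends only on $z_j$, per-row and per-column coordinate terms that are reused across a block, and a per-pixel cosine accumulation. The function and parameter names below are hypothetical, and the sketch says nothing about the paper's fixed-point or DSP-block mapping.

```python
import math


def cgh_block(sources, xs, ys, wavelength, pitch):
    """Evaluate an assumed Fresnel point-source CGH over one block of pixels.

    sources: list of (x_j, y_j, z_j, A_j) point light sources, lengths in metres
    xs, ys:  integer pixel indices of the block's columns and rows
    """
    out = [[0.0] * len(xs) for _ in ys]
    for xj, yj, zj, aj in sources:
        k = math.pi / (wavelength * zj)  # stage 1: common term, depends only on z_j
        dx2 = [(x * pitch - xj) ** 2 for x in xs]  # stage 2: once per column
        dy2 = [(y * pitch - yj) ** 2 for y in ys]  #          and once per row
        for r, d_y in enumerate(dy2):  # stage 3: per-pixel value, accumulated
            for c, d_x in enumerate(dx2):
                out[r][c] += aj * math.cos(k * (d_x + d_y))
    return out
```

Stage 2 costs one square per row and per column of the block rather than one per pixel, which hints at why evaluating the equation in 2-D blocks can save multipliers.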


An Optimized Hardware Design for High Performance Residual Data Decoder (고성능 잔여 데이터 복호기를 위한 최적화된 하드웨어 설계)

  • Jung, Hong-Kyun;Ryoo, Kwang-Ki
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.11 / pp.5389-5396 / 2012
  • In this paper, an optimized residual data decoder architecture is proposed to improve the performance of H.264/AVC decoding. The proposed architecture integrates a parallel inverse transform architecture with a parallel inverse quantization architecture whose common operation units implement new inverse quantization equations. Because these equations involve no division, they reduce both the execution time and the amount of computation of the inverse quantization process. Each common operation unit evaluates the equations with a multiplier and a left shifter. With four common operation units, the inverse quantization architecture reduces inverse quantization to one cycle. The inverse transform architecture consists of eight inverse transform operation units, so it also reduces the inverse transform to one cycle. Because inverse quantization and the inverse transform run concurrently, both operations for one $4{\times}4$ block complete in one cycle. The proposed architecture is synthesized using Magnachip $0.18{\mu}m$ CMOS technology; its gate count and critical path delay are 21.9k and 5.5ns, respectively. At the maximum clock frequency of 181MHz, its throughput reaches 2.89Gpixels/sec. Performance measured with data extracted from JM 9.4 shows that the proposed architecture needs about 88.5% fewer execution cycles than existing designs.
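A division-free inverse quantization of the multiply-and-left-shift kind the abstract mentions can be sketched as follows. This is not the paper's equation set; it assumes flat scaling matrices and ordinary $4{\times}4$ residual coefficients (not the Intra16x16 or chroma DC paths), for which H.264 dequantization reduces to `c * v << (qP // 6)` with `v` taken from the standard's `normAdjust4x4` table by coefficient position. Since `qP // 6` and `qP % 6` are constant for a whole block, they can be formed once (for example by a small lookup) rather than divided per coefficient. The function names are hypothetical.

```python
# normAdjust4x4 values v[qP % 6][class] from H.264 (flat scaling matrices assumed)
V = (
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
)


def position_class(i: int, j: int) -> int:
    """0: both indices even, 1: both odd, 2: mixed."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def dequant(c: int, qp: int, i: int, j: int) -> int:
    """One multiplier and one left shifter per coefficient, no division in the datapath."""
    return (c * V[qp % 6][position_class(i, j)]) << (qp // 6)


def dequant_block(block: list[list[int]], qp: int) -> list[list[int]]:
    """All 16 coefficients are independent, so they can be processed in parallel."""
    return [[dequant(block[i][j], qp, i, j) for j in range(4)] for i in range(4)]
```

Raising `qp` by 6 doubles every output, which is exactly the role of the left shifter.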