• Title/Summary/Keyword: HEVC decoder

Performance Analysis of HEVC Decoder Parallelization based on Slice and Tile for Ultra-High Definition Video (초고해상도 비디오를 위한 분할 영상 기반 HEVC 복호화기 병렬화)

  • Son, SoHee;Baek, A-Ram;Choi, Haechul
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2016.06a / pp.359-360 / 2016
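  • This paper compares the decoder parallelization performance of Slice and Tile, the parallelization tools supported by HEVC (High Efficiency Video Coding), on ultra-high-resolution video, with the goal of real-time decoding of ultra-high-definition video. Because no dependencies exist between the data partitions of Slices and Tiles, the partitioned data were assigned to multiple threads to perform data-level parallelization. In the experiments, the parallelized decoder was up to 2.08 times faster than the existing sequential decoder, and there was almost no loss in picture quality even as the number of partitions increased.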
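
The data-level parallelization described in this abstract (independent partitions handed to separate threads) can be sketched roughly as follows. This is only an illustration of the scheduling structure, not the authors' decoder: `split_into_tiles`, `decode_tile`, and `decode_parallel` are hypothetical names, the "frame" is a plain 2-D list of integers, and `decode_tile` is a stand-in for real tile decoding.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(frame, cols, rows):
    """Split a 2-D list of samples into rows*cols rectangular tiles (raster order)."""
    height, width = len(frame), len(frame[0])
    col_edges = [width * c // cols for c in range(cols + 1)]
    row_edges = [height * r // rows for r in range(rows + 1)]
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = [row[col_edges[c]:col_edges[c + 1]]
                    for row in frame[row_edges[r]:row_edges[r + 1]]]
            tiles.append(tile)
    return tiles


def decode_tile(tile):
    """Hypothetical stand-in for decoding one tile; it only touches its own data."""
    return [[sample + 1 for sample in row] for row in tile]


def decode_parallel(tiles, workers):
    """Decode every tile on a thread pool; results come back in tile order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_tile, tiles))


if __name__ == "__main__":
    frame = [[r * 4 + c for c in range(4)] for r in range(4)]
    tiles = split_into_tiles(frame, cols=2, rows=2)
    decoded = decode_parallel(tiles, workers=4)
    # The parallel result matches a plain sequential loop over the same tiles.
    print(decoded == [decode_tile(t) for t in tiles])
```

Because each tile is processed without reading any other tile, the work items need no locking. In CPython, pure-Python work like this stand-in will not show a real speedup because of the global interpreter lock; the sketch only shows how independent partitions map onto worker threads.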

Low-Complexity Watermarking into SAO Offsets for HEVC Videos

  • Wu, Xiangjian;Jo, Hyun-Ho;Sim, Donggyu
    • IEIE Transactions on Smart Processing and Computing / v.5 no.4 / pp.243-249 / 2016
  • This paper proposes a new watermarking algorithm that embeds watermarks in the sample adaptive offset (SAO) process of high efficiency video coding (HEVC) compressed videos. The proposed method embeds a two-bit watermark into the SAO offsets of each coding tree unit (CTU), so that more watermark bits can be carried. To minimize the visual quality degradation caused by embedding, the watermark bits are embedded into the SAO offsets depending on the SAO type of the block. Furthermore, the embedded watermark can be extracted at the decoder side by simply adding four offsets and checking their least significant bits (LSB). The experimental results show that the proposed method incurs a 0.3% BD-rate increase without much visual quality degradation. In addition, the proposed method requires negligible computational load for watermark insertion and extraction.

High-Performance VLSI Architecture of HEVC CABAC Decoder by Multi-Parallel Algorithm (병렬 알고리즘에 의한 H.265/HEVC CABAC 디코더의 고성능 구조)

  • Kim, Gi-Yeong;Bae, Jong-Woo
    • Proceedings of the Korea Information Processing Society Conference / 2015.04a / pp.934-937 / 2015
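  • This paper introduces a multi-parallel HEVC CABAC decoder that resolves the bottleneck of the video decoder and can process large amounts of data. The aim of the paper is to design a parallelized hardware VLSI architecture for the CABAC decoder and to determine, from the design and analysis results, whether it delivers high throughput relative to its size. An analysis of parallelizing the internal modules of the CABAC decoder (arithmetic decoder, context modeler, and de-binarizer) from one to four instances showed that four-way parallelization gives the highest throughput relative to size. In addition, four CABAC decoders, each with four-way parallel internal modules, are operated in parallel so that one frame divided into slices is processed at once. The paper thus uses a hardware architecture in which the four internal modules of each CABAC decoder are parallelized and four such CABAC decoders are in turn parallelized.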

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing / v.3 no.1 / pp.1-9 / 2014
  • This paper proposes a low-latency Sample Adaptive Offset filter (SAO) architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), the SAO operation is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) a low-latency property that enables efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture, with significantly less memory usage, produces a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.

The Hardware Design of Effective Sample Adaptive Offset for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 효과적인 Sample Adaptive Offset 하드웨어 설계)

  • Park, Seungyong;Lee, Dongweon;Ryoo, Kwangki
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.645-648 / 2012
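  • This paper describes an efficient SAO (Sample Adaptive Offset) hardware architecture for the design of a high-performance HEVC (High Efficiency Video Coding) decoder. SAO is a technique that compensates for the information loss caused by lossy compression steps such as quantization. However, because it performs pixel-level operations over $64{\times}64$ units, the maximum block size in HEVC, it requires long computation time and a large amount of computation. The proposed SAO hardware architecture therefore consists of a processing unit that handles $8{\times}8$ units, which minimizes hardware area, and uses internal registers to support the $64{\times}64$ block size. In addition, the conventional top-down block partitioning structure of SAO is replaced with a bottom-up (down-top) block partitioning structure to minimize computation time and the amount of computation. The proposed hardware architecture was designed in Verilog HDL; synthesized with a TSMC $0.18{\mu}m$ cell library, it operates at 250 MHz with a total gate count of 45.4k.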

The Hardware Design of Effective In-loop Filter for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 효과적인 In-loop Filter 하드웨어 설계)

  • Park, Seungyong;Cho, Hyunpyo;Park, Jaeha;Kang, Byungik;Ryoo, Kwangki
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.1506-1509 / 2013
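  • This paper describes an efficient in-loop filter hardware architecture for the design of a high-performance HEVC (High Efficiency Video Coding) decoder. The in-loop filter consists of a deblocking filter and SAO, and compensates for the information loss that arises from block-based image compression, quantization, and similar steps. However, because HEVC performs pixel-level operations on blocks of up to $64{\times}64$, it requires long computation time and a large amount of computation. The deblocking filter module and SAO module of the proposed in-loop filter are therefore built from $8{\times}8$ block processing units, the minimum unit of operation, to minimize hardware area. In addition, the SAO module stores the results of $8{\times}8$ block operations in internal registers so as to support the $64{\times}64$ block size, minimizing computation time and the amount of computation. The proposed hardware architecture was designed in Verilog HDL; synthesized with a TSMC 180nm cell library, it operates at 270 MHz with a total gate count of 48.9k.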

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder (HEVC 복호기의 연산 복잡도 감소를 위한 화면내 예측 하드웨어 구조 설계)

  • Jung, Hongkyun;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1203-1212 / 2013
  • In this paper, an intra prediction hardware architecture is proposed to reduce the computational complexity of intra prediction in an HEVC decoder. The architecture uses shared operation units and common operation units, and adopts a fast smoothing decision algorithm and a fast algorithm for generating filter coefficients. The shared operation unit shares adders that process common equations to remove computational redundancy. The unit also computes the average value in DC mode to reduce the number of execution cycles in that mode. To reduce the number of operation units, the common operation unit uses a single operation unit to generate both predicted pixels and filtered pixels in all prediction modes. To reduce processing time and the number of operators, the decision algorithm uses only bit-comparators and the fast algorithm uses a LUT instead of multiplication operators. The proposed architecture uses four shared operation units and eight common operation units, which reduces the execution cycles of intra prediction. The architecture is synthesized using TSMC 0.13um CMOS technology. The gate count and the maximum operating frequency are 40.5k and 164MHz, respectively. When the performance of the proposed architecture is measured using data extracted from HM 7.1, its execution cycle count is about 93.7% less than that of the previous design.

Test Stream Generation Method for UHDTV Broadcasting Standard (UHD 방송 표준 검증을 위한 시험 스트림 개발에 관한 연구)

  • Kim, Jaeil;Bae, Sungpo;Yang, Jinyoung;Kwon, Donghyun
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.7 / pp.823-832 / 2016
  • This paper presents a method for generating test streams to verify the conformance of a UHD broadcasting receiver, including decoders for video and audio as well as parsers for PSIP and closed caption data. The proposed test streams for video/audio signals can evaluate conformance to the HEVC, AC-3 and DTS-HD standards. In particular, the test streams for the HEVC video compression standard can be used to test the syntax compliance and error resilience of an HEVC decoder. Moreover, the proposed test streams for system/program and closed caption data can be applied to verify parsers for the PSIP and CEA-708 standards.

A Study on the Full-HD HEVC Encoder IP Design (고해상도 비디오 인코더 IP 설계에 대한 연구)

  • Lee, Sukho;Cho, Seunghyun;Kim, Hyunmi;Lee, Jehyun
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.12 / pp.167-173 / 2015
  • This paper presents a study on the design of a Full-HD HEVC (High Efficiency Video Coding) encoder IP (Intellectual Property). The designed IP targets HEVC main profile 4.1 and encodes full high definition video at 60 fps. Before the hardware and software design, an overall reference model was developed in C, and a parallel processing architecture was proposed for low power consumption. Firmware and driver programs for the IP were also written. A platform for verifying the developed IP was built, and its function and performance were verified for various pictures under several encoding conditions by implementing the designed IP on an FPGA board. Compared to HM-13.0, the bit-rate decreased by about 35% at the same PSNR, and power consumption decreased by about 25% in low-power mode.

Design of Sub-pixel Interpolation Circuit for Real-time Multi-decoder Supporting 4K-UHD Video Images (4K-UHD 영상을 지원하는 실시간 통합 복호기용 부화소 보간 회로 설계)

  • Lee, Sujung;Cho, Kyeongsoon
    • Journal of IKEEE / v.19 no.1 / pp.1-9 / 2015
  • This paper proposes the design of a sub-pixel interpolation circuit for a real-time multi-decoder supporting 4K-UHD video images. The proposed sub-pixel interpolation circuit supports H.264, MPEG-4, VC-1 and the new video compression standard HEVC. The common part of the interpolation algorithms used in these video compression standards is shared to reduce the circuit size. An intermediate buffer is used to reduce the circuit size and optimize the performance. The proposed sub-pixel interpolation circuit was synthesized using a 130nm standard cell library. The synthesized gate-level circuit consists of 122,564 gates and processes 35 to 86 image frames per second for 4K-UHD video at the maximum operating frequency of 200MHz. Therefore, the proposed circuit can process 4K-UHD video in real time.