Search | Korea Science

A Design of High Performance Operation Intra Predictor for H.264/AVC Decoder (H.264/AVC 복호기를 위한 고성능 연산처리 인트라 예측기 설계)

Jin, Xianzhe;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.16 no.11
- /
- pp.2503-2510
- /
- 2012
This paper proposes a parallel operation intra predictor for H.264/AVC decoder. In previous intra predictor design, common operation units were designed for 17 prediction modes in order to compute more effectively. However, it was designed by analyzing the equation applied to one pixel. So, there are four operation units for computing 16 pixels in a $4{\times}4$ block and they need four cycles. In this paper, the proposed intra predictor contains T3(Three Type Transform) operation unit for parallel operation. It divides 17 modes into 3 types to calculate 16 pixels of a $4{\times}4$ block in only one cycle and needs 16 cycles minimum in 16x16 block. As the result of the experiment, in terms of processing cycle, the performance of proposed intra predictor is 58.95% higher than the previous one.
https://doi.org/10.6109/jkiice.2012.16.11.2503 인용 PDF KSCI

Design of Parallel Inverse Quantization and Inverse Transform Architecture for High Performance H.264/AVC Decoder (고성능 H.264/AVC 복호기를 위한 병렬 역양자화 및 역변환 구조 설계)

Jung, Hong-Kyun;Ryoo, Kwang-Ki
- Proceedings of the KAIS Fall Conference
- /
- 2011.12b
- /
- pp.434-437
- /
- 2011
본 논문에서는 H.264/AVC 복호기의 성능을 향상시키기 위해 병렬 역양자화 구조와 역변환 구조를 제안한다. 제안하는 역양자화 구조는 공통 연산기를 사용하여 계산 복잡도를 감소시키고, 4개의 공통연산기를 사용하여 역양자화 수행 사이클 수를 1 사이클로 감소시킨다. 제안하는 역변환 구조는 4개의 변환 연산기를 사용하여 역변환 연산을 수행하는데 2 사이클이 소요된다. 또한 제안하는 구조는 역양자화 연산과 수평 역변환 연산을 동시에 수행하는 병렬 구조를 채택하여 역양자화 및 역변환 수행 사이클 수를 2 사이클로 감소시킨다. 제안하는 구조를 Magnachip 0.18um CMOS 공정 라이브러리를 이용하여 합성한 결과 1.5MHz의 동작 주파수에서 게이트 수는 14,173이고, 표준 참조 소프트웨어 JM 9.4에서 추출한 데이터를 이용하여 성능을 측정한 결과 제안하는 구조의 수행 사이클 수가 기존 구조 대비 38.74% 향상되었다.
PDF

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder (HEVC 복호기의 연산 복잡도 감소를 위한 화면내 예측 하드웨어 구조 설계)

Jung, Hongkyun;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.17 no.5
- /
- pp.1203-1212
- /
- 2013
In this paper, an intra prediction hardware architecture is proposed to reduce computational complexity of intra prediction in HEVC decoder. The architecture uses shared operation units and common operation units and adopts a fast smoothing decision algorithm and a fast algorithm to generate coefficients of a filter. The shared operation unit shares adders processing common equations to remove the computational redundancy. The unit computes an average value in DC mode for reducing the number of execution cycles in DC mode. In order to reduce operation units, the common operation unit uses one operation unit generating predicted pixels and filtered pixels in all prediction modes. In order to reduce processing time and operators, the decision algorithm uses only bit-comparators and the fast algorithm uses LUT instead of multiplication operators. The proposed architecture using four shared operation units and eight common operation units which can reduce execution cycles of intra prediction. The architecture is synthesized using TSMC 0.13um CMOS technology. The gate count and the maximum operating frequency are 40.5k and 164MHz, respectively. As the result of measuring the performance of the proposed architecture using the extracted data from HM 7.1, the execution cycle of the architecture is about 93.7% less than the previous design.
https://doi.org/10.6109/jkiice.2013.17.5.1203 인용 PDF KSCI

An Intra Prediction Hardware Design for High Performance HEVC Encoder (고성능 HEVC 부호기를 위한 화면내 예측 하드웨어 설계)

Park, Seung-yong;Guard, Kanda;Ryoo, Kwang-ki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2015.10a
- /
- pp.875-878
- /
- 2015
In this paper, we propose an intra prediction hardware architecture with less processing time, computations and reduced hardware area for a high performance HEVC encoder. The proposed intra prediction hardware architecture uses common operation units to reduce computational complexity and uses $4{\times}4$ block unit to reduce hardware area. In order to reduce operation time, common operation unit uses one operation unit to generate predicted pixels and filtered pixels in all prediction modes. Intra prediction hardware architecture introduces the $4{\times}4$ PU design processing to reduce the hardware area and uses intemal registers to support $32{\times}32$ PU processmg. The proposed hardware architecture uses ten common operation units which can reduce execution cycles of intra prediction. The proposed Intra prediction hardware architecture is designed using Verilog HDL(Hardware Description Language), and has a total of 41.5k gates in TSMC $0.13{\mu}m$ CMOS standard cell library. At 150MHz, it can support 4K UHD video encoding at 30fps in real time, and operates at a maximum of 200MHz.
PDF

An Efficient Hardware Design of Intra Predictor for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 화면내 예측기의 효율적인 하드웨어 설계)

Jung, Hongkyun;Kang, Sukmin;Ryoo, Kwangki
- Proceedings of the Korea Information Processing Society Conference
- /
- 2012.11a
- /
- pp.668-671
- /
- 2012
본 논문에서는 차세대 비디오 압축 표준인 HEVC(High Efficiency Video Coding) 복호기의 연산량과 하드웨어 면적을 감소시키기 위하여 화면내 예측 하드웨어 구조를 제안한다. 제안하는 하드웨어 구조는 공통 수식에 대한 연산을 공유하는 공유 연산기를 사용하여 연산량 및 연산기 개수를 감소시키고, $4{\times}4$ PU와 $64{\times}64$ PU의 필터링 수행 여부에 대한 연산을 수행하지 않고 나머지 PU에 대해서는 LUT를 이용하여 연산을 수행하기 때문에 연산량 및 연산 시간을 감소시킨다. 또한 하나의 공통 연산기만을 사용하여 예측 픽셀을 생성하기 때문에 하드웨어 면적이 감소한다. 제안하는 구조를 TSMC 0.18um 공정을 이용하여 합성한 결과 최대 동작 주파수는 100MHz이고, 게이트 수는 140,697이다. $4{\times}4$ PU를 기준으로 제안하는 구조의 처리 사이클 수는 11 사이클로 기존 구조 대비 54% 감소하였고, 16개 참조 픽셀의 필터링 처리를 기준으로 제안하는 구조의 덧셈 연산기 개수는 37개로 표준 draft 6에 비해 22.9% 감소하였다.
https://doi.org/10.3745/PKIPS.y2012m11a.668 인용 PDF

The Hardware Architecture of Efficient Intra Predictor for H.264/AVC Decoder (H.264/AVC 복호기를 위한 효율적인 인트라 예측기 하드웨어 구조)

Kim, Ok;Ryoo, Kwang-Ki
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.47 no.5
- /
- pp.24-30
- /
- 2010
In this paper, we described intra prediction which is the one of techniques to be used for higher compression performance in H.264/AVC and proposed the design of intra predictor for efficient intra prediction mode processing. The proposed system is consist of processing elements, precomputation processing elements, an intra prediction controller, an internal memory and a register controller. The proposed system needs the reduced the computation cycles by using processing elements and precomputation processing element and also needs the reduced the number of access time to external memory by using internal memory and registers architecture. We designed the proposed system with Verilog-HDL and verified with suitable test vectors which are encoded YUV files. The proposed architecture belongs to the baseline profile of H.264/AVC decoder and is suitable for portable devices such as cellular phone with the size of $176{\times}144$. As a result of experiment, the performance of the proposed intra predictor is about 60% higher than that of the previous one.
PDF KSCI

A High-Performance Architecture for 2 Dimensional Block-Based Computer Generated Hologram (2차원 블록 단위 기반의 고성능 컴퓨터 생성 홀로그램 생성기의 구조)

Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
- Proceedings of the Korean Society of Broadcast Engineers Conference
- /
- 2012.07a
- /
- pp.109-110
- /
- 2012
본 논문에서는 홀로그램을 실시간으로 생성하기 위하여 수정된 디지털 홀로그램(computer-generated hologram, CGH) 수식을 재정의 하여 3단계로 나누고 2차원 블록 단위 기반의 컴퓨터 생성 홀로그램 생성기의 하드웨어 구조를 제안하였다. 유효광원의 대한 z축 항에 대하여 연산하는 공통항 연산기와 x,y축을 연산하는 죄표값 연산기 마지막으로 각 화소의 대하여 연산하는 화소값 연산기로 이루어진 하드웨어를 제안하고 구현 하였다. 구현한 하드웨어는 $32{\times}32$ 중간 블록의 구조를 가질 때 기존 연구에 비하여 86%이상의 DSP블록을 줄일 수 있다.
PDF

The Hardware Design of CABAC for High Performance H.264 Encoder (고성능 H.264 인코더를 위한 CABAC 하드웨어 설계)

Myoung, Je-Jin;Ryoo, Kwang-Ki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.16 no.4
- /
- pp.771-777
- /
- 2012
This paper proposes a binary arithmetic encoder of CABAC using a Common Operation Unit including the three modes. The binary arithmetic encoder performing arithmetic encoding and renormalizer can be simply implemented into a hardware architecture since the COU is used regardless of the modes. The proposed binary arithmetic encoder of CABAC includes Context RAM, Context Updater, Common Operation Unit and Bit-Gen. The architecture consists of 4-stage pipeline operating one symbol for each clock cycle. The area of proposed binary arithmetic encoder of CABAC is reduced up to 47%, the performance of proposed binary arithmetic encoder of CABAC is 19% higher than the previous architecture.
https://doi.org/10.6109/jkiice.2012.16.4.771 인용 PDF KSCI

An Optimized Hardware Design for High Performance Residual Data Decoder (고성능 잔여 데이터 복호기를 위한 최적화된 하드웨어 설계)

Jung, Hong-Kyun;Ryoo, Kwang-Ki
- Journal of the Korea Academia-Industrial cooperation Society
- /
- v.13 no.11
- /
- pp.5389-5396
- /
- 2012
In this paper, an optimized residual data decoder architecture is proposed to improve the performance in H.264/AVC. The proposed architecture is an integrated architecture that combined parallel inverse transform architecture and parallel inverse quantization architecture with common operation units applied new inverse quantization equations. The equations without division operation can reduce execution time and quantity of operation for inverse quantization process. The common operation unit uses multiplier and left shifter for the equations. The inverse quantization architecture with four common operation units can reduce execution cycle of inverse quantization to one cycle. The inverse transform architecture consists of eight inverse transform operation units. Therefore, the architecture can reduce the execution cycle of inverse transform to one cycle. Because inverse quantization operation and inverse transform operation are concurrency, the execution cycle of inverse transform and inverse quantization operation for one $4{\times}4$ block is one cycle. The proposed architecture is synthesized using Magnachip 0.18um CMOS technology. The gate count and the critical path delay of the architecture are 21.9k and 5.5ns, respectively. The throughput of the architecture can achieve 2.89Gpixels/sec at the maximum clock frequency of 181MHz. As the result of measuring the performance of the proposed architecture using the extracted data from JM 9.4, the execution cycle of the proposed architecture is about 88.5% less than that of the existing designs.
https://doi.org/10.5762/KAIS.2012.13.11.5389 인용 PDF KSCI

Circuit Design of Modular Multiplier for Fast Exponentiation (고속 멱승을 위한 모듈라 곱셈기 회로 설계)

하재철;오중효;유기영;문상재
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference
- /
- 1997.11a
- /
- pp.221-231
- /
- 1997
본 논문에서는 고속 멱승을 위한 모듈라 곱셈기를 시스토릭 어레이로 설계한다. Montgomery 알고리듬 및 시스토릭 어레이 구조를 분석하고 공통 피승수 곱셈 개념을 사용한 변형된 Montgomery 알고리듬에 대해 시스토릭 어레이 곱셈기를 설계한다. 제안 곱셈기는 각 처리기 내부 연산을 병렬화 할 수 있고 연산 자체도 간단화 할 수 있어 시스토릭 어레이 하드웨어 구현에 유리하며 기존의 곱셈기를 사용하는 것보다 멱승 전체의 계산을 약 0.4배내지 0.6배로 감소시킬 수 있다.
PDF

Search Result 46, Processing Time 0.032 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)