• Title/Summary/Keyword: 병렬 부호화

Search Result 105, Processing Time 0.024 seconds

A Design of Fractional Motion Estimation Engine with 4×4 Block Unit of Interpolator & SAD Tree for 8K UHD H.264/AVC Encoder (8K UHD(7680×4320) H.264/AVC 부호화기를 위한 4×4블럭단위 보간 필터 및 SAD트리 기반 부화소 움직임 추정 엔진 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.6
    • /
    • pp.145-155
    • /
    • 2013
  • In this paper, we proposed a $4{\times}4$ block parallel architecture of interpolation for high-performance H.264/AVC Fractional Motion Estimation in 8K UHD($7680{\times}4320$) video real time processing. To improve throughput, we design $4{\times}4$ block parallel interpolation. For supplying the $10{\times}10$ reference data for interpolation, we design 2D cache buffer which consists of the $10{\times}10$ memory arrays. We minimize redundant storage of the reference pixel by applying the Search Area Stripe Reuse scheme(SASR), and implement high-speed plane interpolator with 3-stage pipeline(Horizontal Vertical 1/2 interpolation, Diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated in 0.13um standard cell library. The gate count is 436.5Kgates. The proposed H.264/AVC Fractional Motion Estimation can support 8K UHD at 30 frames per second by running at 187MHz.

The Motion Estimator Implementation with Efficient Structure for Full Search Algorithm of Variable Block Size (다양한 블록 크기의 전역 탐색 알고리즘을 위한 효율적인 구조를 갖는 움직임 추정기 설계)

  • Hwang, Jong-Hee;Choe, Yoon-Sik
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.66-76
    • /
    • 2009
  • The motion estimation in video encoding system occupies the biggest part. So, we require the motion estimator with efficient structure for real-time operation. And for motion estimator's implementation, it is desired to design hardware module of an exclusive use that perform the encoding process at high speed. This paper proposes motion estimation detection block(MED), 41 SADs(Sum of Absolute Difference) calculation block, minimum SAD calculation and motion vector generation block based on parallel processing. The parallel processing can reduce effectively the amount of the operation. The minimum SAD calculation and MED block uses the pre-computation technique for reducing switching activity of the input signal. It results in high-speed operation. The MED and 41 SADs calculation blocks are composed of adder tree which causes the problem of critical path. So, the structure of adder tree has changed the most commonly used ripple carry adder(RCA) with carry skip adder(CSA). It enables adder tree to operate at high speed. In addition, as we enabled to easily control key variables such as control signal of search range from the outside, the efficiency of hardware structure increased. Simulation and FPGA verification results show that the delay of MED block generating the critical path at the motion estimator is reduced about 19.89% than the conventional strukcture.

Adaptive Search Range Decision for Accelerating GPU-based Integer-pel Motion Estimation in HEVC Encoders (HEVC 부호화기에서 GPU 기반 정수화소 움직임 추정을 고속화하기 위한 적응적인 탐색영역 결정 방법)

  • Kim, Sangmin;Lee, Dongkyu;Sim, Dong-Gyu;Oh, Seoung-Jun
    • Journal of Broadcast Engineering
    • /
    • v.19 no.5
    • /
    • pp.699-712
    • /
    • 2014
  • In this paper, we propose a new Adaptive Search Range (ASR) decision algorithm for accelerating GPU-based Integer-pel Motion Estimation (IME) of High Efficiency Video Coding (HEVC). For deciding the ASR, we classify a frame into two models using Motion Vector Differences (MVDs) then adaptively decide the search ranges of each model. In order to apply the proposed algorithm to the GPU-based ME process, starting points of the ME are decided using only temporal Motion Vectors (MVs). The CPU decides the ASR as well as the starting points and transfers them to the GPU. Then, the GPU performs the integer-pel ME. The proposed algorithm reduces the total encoding time by 37.9% with BD-rate increase of 1.1% and yields 951.2 times faster ME against the CPU-based anchor. In addition, the proposed algorithm achieves the time reduction of 57.5% in the ME running time with the negligible coding loss of 0.6%, compared with the simple GPU-based ME without ASR decision.

Implementation of LDPC Decoder using High-speed Algorithms in Standard of Wireless LAN (무선 랜 규격에서의 고속 알고리즘을 이용한 LDPC 복호기 구현)

  • Kim, Chul-Seung;Kim, Min-Hyuk;Park, Tae-Doo;Jung, Ji-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.12
    • /
    • pp.2783-2790
    • /
    • 2010
  • In this paper, we first review LDPC codes in general and a belief propagation algorithm that works in logarithm domain. LDPC codes, which is chosen 802.11n for wireless local access network(WLAN) standard, require a large number of computation due to large size of coded block and iteration. Therefore, we presented three kinds of low computational algorithms for LDPC codes. First, sequential decoding with partial group is proposed. It has the same H/W complexity, and fewer number of iterations are required with the same performance in comparison with conventional decoder algorithm. Secondly, we have apply early stop algorithm. This method reduces number of unnecessary iterations. Third, early detection method for reducing the computational complexity is proposed. Using a confidence criterion, some bit nodes and check node edges are detected early on during decoding. Through the simulation, we knew that the iteration number are reduced by half using subset algorithm and early stop algorithm is reduced more than one iteration and computational complexity of early detected method is about 30% offs in case of check node update, 94% offs in case of check node update compared to conventional scheme. The LDPC decoder have been implemented in Xilinx System Generator and targeted to a Xilinx Virtx5-xc5vlx155t FPGA. When three algorithms are used, amount of device is about 45% off and the decoding speed is about two times faster than convectional scheme.

Real-time Stereo Video Generation using Graphics Processing Unit (GPU를 이용한 실시간 양안식 영상 생성 방법)

  • Shin, In-Yong;Ho, Yo-Sung
    • Journal of Broadcast Engineering
    • /
    • v.16 no.4
    • /
    • pp.596-601
    • /
    • 2011
  • In this paper, we propose a fast depth-image-based rendering method to generate a virtual view image in real-time using a graphic processor unit (GPU) for a 3D broadcasting system. Before the transmission, we encode the input 2D+depth video using the H.264 coding standard. At the receiver, we decode the received bitstream and generate a stereo video using a GPU which can compute in parallel. In this paper, we apply a simple and efficient hole filling method to reduce the decoder complexity and reduce hole filling errors. Besides, we design a vertical parallel structure for a forward mapping process to take advantage of the single instruction multiple thread structure of GPU. We also utilize high speed GPU memories to boost the computation speed. As a result, we can generate virtual view images 15 times faster than the case of CPU-based processing.

A Study on High Speed LDPC Decoder Algorithm based on dc saperation (dc 분리 기반의 고속 LDPC 복호 알고리즘에 관한 연구)

  • Kwon, Hae-Chan;Kim, Tae-Hoon;Jung, Ji-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.9
    • /
    • pp.2041-2047
    • /
    • 2013
  • In this paper, we proposed high speed LDPC decoding algorithm based on DVB-S2 standard. For implementing the high speed LDPC decoder, HSS algorithm which reduce the iteration numbers without performance degradation is applied. In HSS algorithm, check node update units are update at the same time of bit node update. HSS can be accelerated to the decoding speed because it does not need to separate calculation of the bit nodes, However, check node calculation blocks need many clocks because of just one memory is used. Therefore, this paper proposed dc-split memory structure in order to reduced the delay and high speed decoder is possible. Finally, this paper presented maximum split memory and throughput for various coding rates in DVB-S2 standard.

(Theoretical Performance analysis of 12Mbps, r=1/2, k=7 Viterbi deocder and its implementation using FPGA for the real time performance evaluation) (12Mbps, r=1/2, k=7 비터비 디코더의 이론적 성능분석 및 실시간 성능검증을 위한 FPGA구현)

  • Jeon, Gwang-Ho;Choe, Chang-Ho;Jeong, Hae-Won;Im, Myeong-Seop
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.39 no.1
    • /
    • pp.66-75
    • /
    • 2002
  • For the theoretical performance analysis of Viterbi Decoder for wireless LAN with data rate 12Mbps, code rate 1/2 and constraint length 7 defined in IEEE 802.11a, the transfer function is derived using Cramer's rule and the first-event error probability and bit error probability is derived under the AWGN. In the design process, input symbol is quantized into 16 steps for 4 bit soft decision and register exchange method instead of memory method is proposed for trace back, which enables the majority at the final decision stage. In the implementation, the Viterbi decoder based on parallel architecture with pipelined scheme for processing 12Mbps high speed data rate and AWGN generator are implemented using FPGA chips. And then its performance is verified in real time.

A Study on Horizontal Shuffle Scheduling for High Speed LDPC decoding in DVB-S2 (DVB-S2 기반 고속 LDPC 복호를 위한 Horizontal Shuffle Scheduling 방식에 관한 연구)

  • Lim, Byeong-Su;Kim, Min-Hyuk;Jung, Ji-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.10
    • /
    • pp.2143-2149
    • /
    • 2012
  • DVB-S2 employs LDPC codes which approach to the Shannon's limit, since it has characteristics of a good distance, error floor does not appear. Furthermore it is possible to processes full parallel processing. However, it is very difficult to high speed decoding because of a large block size and number of many iterations. This paper present HSS algorithm to reduce the iteration numbers without performance degradation. In the flooding scheme, the decoder waits until all the check-to-variable messages are updated at all parity check nodes before computing the variable metric and updating the variable-to-check messages. The HSS algorithm is to update the variable metric on a check by check basis in the same way as one code draws benefit from the other. Eventually, LDPC decoding speed based on HSS algorithm improved 30% ~50% compared to conventional one without performance degradation.

Low Computational Complexity LDPC Decoding Algorithms for 802.11n Standard (802.11n 규격에서의 저복잡도 LDPC 복호 알고리즘)

  • Kim, Min-Hyuk;Park, Tae-Doo;Jung, Ji-Won;Lee, Seong-Ro;Jung, Min-A
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.2C
    • /
    • pp.148-154
    • /
    • 2010
  • In this paper, we first review LDPC codes in general and a belief propagation algorithm that works in logarithm domain. LDPC codes, which is chosen 802.11n for wireless local access network(WLAN) standard are required a large number of computation due to large size of coded block and iteration. Therefore, we presented three kinds of low computational algorithm for LDPC codes. First, sequential decoding with partial group is proposed. It has same H/W complexity, and fewer number of iteration's are required at same performance in comparison with conventional decoder algorithm. Secondly, we have apply early stop algorithm. This method is reduced number of unnecessary iteration. Third, early detection method for reducing the computational complexity is proposed. Using a confidence criterion, some bit nodes and check node edges are detected early on during decoding. Through the simulation, we knew that the iteration number are reduced by half using subset algorithm and early stop algorithm is reduced more than one iteration and computational complexity of early detected method is about 30% offs in case of check node update, 94% offs in case of check node update compared to conventional scheme.

A Study on Iterative MAP-Based Decoding of Turbo Code in the Mobile Communication System (이동통신 시스템에서 MAP기반 터보 부호의 복호에 관한 연구)

  • 박노진;강철호
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.2 no.2
    • /
    • pp.62-67
    • /
    • 2001
  • In the recent mobile communication systems, the performance of Turbo Code using the error correction coding depends on the interleaver influencing the free distance determination and the recursive decoding algorithms that is executed in the turbo decoder. However, performance depends on the interleaver depth that need a large time delay over the reception process. Moreover, Turbo Code has been known as the robust ending method with the confidence over the fading channel. The International Telecommunication Union(ITU) has recently adopted as the standardization of the channel coding over the third generation mobile communications such as IMT-2000. Therefore, in this paper, we proposed of the method to improve the conventional performance with the parallel concatenated 4-New Turbo Decoder using MAP a1gorithm in spite of complexity increasement. In the real-time video and video service over the third generation mobile communications, the performance of the proposed method was analyzed by the reduced decoding delay using the variable decoding method by computer simulation over AWGN and fading channels.

  • PDF