• Title/Summary/Keyword: Parallel Decoding

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun; Park, Joo-Yul; Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS), v.10 no.6, pp.2648-2668, 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully exploit the computational power of heterogeneous systems. This article introduces a parallel software decoder of Low-Density Parity-Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and most powerful error-correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelism are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelism are processed by the graphics processing unit (GPU). To improve the performance of the OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors under a unified computing framework.
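
The abstract above hinges on mapping the data-parallel decoding steps to GPU work-items. As a point of reference only, the OpenCL C kernel below sketches a min-sum check-node update with one work-item per check node; the buffer layout, fixed row degree, and kernel name are assumptions for illustration, not the paper's code.

```c
/* OpenCL C sketch: min-sum check-node update, one work-item per check node. */
/* v2c and c2v are assumed row-major [num_checks][ROW_DEG] LLR buffers.      */
#define ROW_DEG 6

__kernel void check_node_update(__global const float *v2c,   /* variable-to-check LLRs */
                                __global float       *c2v,   /* check-to-variable LLRs */
                                const int num_checks)
{
    int m = get_global_id(0);
    if (m >= num_checks) return;

    float min1 = FLT_MAX, min2 = FLT_MAX;   /* smallest and second-smallest magnitudes */
    float sign_prod = 1.0f;                 /* product of all incoming signs           */

    for (int j = 0; j < ROW_DEG; ++j) {
        float v = v2c[m * ROW_DEG + j];
        float a = fabs(v);
        sign_prod *= (v < 0.0f) ? -1.0f : 1.0f;
        if (a < min1)      { min2 = min1; min1 = a; }
        else if (a < min2) { min2 = a; }
    }

    /* each outgoing message excludes the receiving edge's own contribution */
    for (int j = 0; j < ROW_DEG; ++j) {
        float v = v2c[m * ROW_DEG + j];
        float s = sign_prod * ((v < 0.0f) ? -1.0f : 1.0f);
        c2v[m * ROW_DEG + j] = s * ((fabs(v) == min1) ? min2 : min1);
    }
}
```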

An Efficient Interpolation Hardware Architecture for HEVC Inter-Prediction Decoding

  • Jin, Xianzhe; Ryoo, Kwangki
    • Journal of information and communication convergence engineering, v.11 no.2, pp.118-123, 2013
  • This paper proposes an efficient hardware architecture for High Efficiency Video Coding (HEVC), the next-generation video compression standard, which adopts several new coding techniques to reduce the bit rate by about 50% compared with the previous standard. Unlike the 6-tap interpolation filter of H.264/AVC, HEVC adopts one-dimensional seven-tap and eight-tap filters for luma interpolation, which increases the complexity and gate area of a hardware implementation. In this paper, we propose a parallel architecture to boost interpolation performance, achieving luma 4×4 block interpolation in 2-4 cycles. The proposed architecture shares common operations to limit the gate count added by the parallel structure, giving better area efficiency than the previous design, with performance improved by about 75.15% in the best case. It is synthesized with the MagnaChip 0.18 µm library and reaches a maximum frequency of 200 MHz.
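
For readers unfamiliar with the filter mentioned above, the short C sketch below computes one horizontally interpolated half-sample using the 8-tap HEVC luma half-pel coefficients (the taps sum to 64). The function name and the simplified rounding are assumptions; HEVC itself keeps higher intermediate precision and applies rounding and clipping in later stages.

```c
/* Plain-C sketch of HEVC 8-tap half-sample luma interpolation for one pixel */
/* (horizontal direction). Names are illustrative only.                      */
#include <stdint.h>

static const int8_t HEVC_LUMA_HALF[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

/* src points to the first of the 8 neighboring integer samples used by the  */
/* filter; >> 6 removes the filter gain (taps sum to 64). Clipping omitted.  */
static inline int interp_half_pel(const uint8_t *src)
{
    int acc = 0;
    for (int k = 0; k < 8; ++k)
        acc += HEVC_LUMA_HALF[k] * src[k];
    return acc >> 6;
}
```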

A Design of Pipelined-parallel CABAC Decoder Adaptive to HEVC Syntax Elements (HEVC 구문요소에 적응적인 파이프라인-병렬 CABAC 복호화기 설계)

  • Bae, Bong-Hee; Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers, v.52 no.5, pp.155-164, 2015
  • This paper describes the design and implementation of a CABAC decoder that handles HEVC syntax elements in an adaptively pipelined, parallel manner. Although CABAC offers a high compression rate, its decoding performance is limited by context-based sequential computation, strong data dependencies between context models, and bin-by-bin decoding. To enhance HEVC CABAC decoding throughput, flag-type syntax elements are adaptively pipelined by precomputing consecutive flag-type elements, and multi-bin syntax elements are decoded by processing up to three bins in parallel. Further, to accelerate the binary arithmetic decoder by shortening the critical path, the context-model update and renormalization are precomputed in parallel for both the LPS and MPS cases, and the context-model renewal is then selected according to the preceding decoding result. Simulations show that the new HEVC CABAC architecture achieves a maximum throughput of 1.01 bins/cycle, about twice as fast as the conventional approach. Synthesized as an ASIC with a 65 nm library, the architecture handles 224 Mbins/s, enough to decode QFHD HEVC video in real time.
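
As a conceptual illustration of the speculative MPS/LPS precomputation described above, the C sketch below forms both candidate decoder states before the comparison resolves and then selects one. The rlps_approx() function and the state update are simplified placeholders (real HEVC CABAC derives rLPS from a 64x4 lookup table and uses standardized state-transition tables), and all names are illustrative.

```c
/* Conceptual sketch: precompute both MPS and LPS outcomes, then select. */
#include <stdint.h>

typedef struct { uint32_t range, offset; } ArithState;
typedef struct { uint8_t pstate, mps; } CtxModel;

static int get_bit(const uint8_t *buf, uint32_t *pos)
{
    int bit = (buf[*pos >> 3] >> (7 - (*pos & 7))) & 1;
    (*pos)++;
    return bit;
}

static uint32_t rlps_approx(uint32_t range, uint8_t pstate)
{
    /* placeholder for the standard rLPS table lookup */
    return ((range >> 6) * (64 - pstate)) >> 2;
}

int decode_bin_speculative(ArithState *st, CtxModel *ctx,
                           const uint8_t *buf, uint32_t *bitpos)
{
    uint32_t rlps = rlps_approx(st->range, ctx->pstate);
    uint32_t rmps = st->range - rlps;

    /* speculatively form both outcomes before the comparison resolves */
    ArithState mps_path = { rmps, st->offset };
    ArithState lps_path = { rlps, st->offset - rmps };

    int bin;
    if (st->offset < rmps) {                      /* MPS taken */
        bin = ctx->mps;
        *st = mps_path;
        if (ctx->pstate < 62) ctx->pstate++;      /* simplified state update */
    } else {                                      /* LPS taken */
        bin = !ctx->mps;
        *st = lps_path;
        if (ctx->pstate == 0) ctx->mps ^= 1;      /* flip MPS at the weakest state */
        else ctx->pstate--;
    }

    while (st->range < 256) {                     /* renormalize, pulling in bits */
        st->range <<= 1;
        st->offset = (st->offset << 1) | (uint32_t)get_bit(buf, bitpos);
    }
    return bin;
}
```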

Tile Partitioning-based HEVC Parallel Decoding Optimization for Asymmetric Multicore Processor (비대칭 멀티코어 시스템 상의 HEVC 병렬 디코딩 최적화를 위한 타일 분할 기법)

  • Ryu, Yeongil; Roh, Hyun-Joon; Ryu, Eun-Seok
    • Journal of KIISE, v.43 no.9, pp.1060-1065, 2016
  • Recently, the need for parallel UHD video processing has been growing, and computing systems with asymmetric processors such as ARM big.LITTLE are increasingly used, so a parallel UHD video processing method optimized for asymmetric multicore systems is needed. This paper proposes a novel HEVC tile partitioning method for parallel processing that analyzes the computational power of asymmetric multicores. The proposed method (1) analyzes the computing power of the asymmetric cores and (2) builds a regression model of computational complexity per video resolution; the model (3) then determines the optimal HEVC tile resolution for each core and partitions/allocates the tiles to suitable cores. The proposed method minimizes the gap in decoding time between the fastest and slowest CPU cores. Experimental results with the official 4K UHD test sequences show an average 20% improvement in decoding speed on an ARM asymmetric multicore system.
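
A minimal sketch of the load-balancing idea above, assuming tiles are sized in whole CTU rows and each core's relative throughput is known: rows are assigned in proportion to per-core weights. The weights, core count, and function names are illustrative, not the paper's regression model.

```c
/* Split a picture's CTU rows into one tile per core, proportionally to each */
/* core's assumed relative throughput (e.g., big vs. LITTLE cores).          */
#include <stdio.h>

#define NUM_CORES 4

void partition_ctu_rows(int total_rows, const double *weight, int *rows_out)
{
    double sum = 0.0;
    for (int c = 0; c < NUM_CORES; ++c) sum += weight[c];

    int assigned = 0;
    for (int c = 0; c < NUM_CORES; ++c) {
        rows_out[c] = (int)(total_rows * weight[c] / sum);
        assigned += rows_out[c];
    }
    rows_out[0] += total_rows - assigned;   /* give the remainder to the fastest core */
}

int main(void)
{
    /* hypothetical relative throughputs: two big cores, two LITTLE cores */
    double weight[NUM_CORES] = { 1.0, 1.0, 0.45, 0.45 };
    int rows[NUM_CORES];

    partition_ctu_rows(34, weight, rows);   /* 2160 / 64 ≈ 34 CTU rows for 4K */
    for (int c = 0; c < NUM_CORES; ++c)
        printf("core %d -> %d CTU rows\n", c, rows[c]);
    return 0;
}
```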

A Design of Parallel Turbo Decoder based on Double Flow Method Using Even-Odd Cross Mapping (짝·홀 교차 사상을 이용한 Double Flow 기법 기반 병렬 터보 복호기 설계)

  • Jwa, Yu-Cheol; Rim, Chong-Suck
    • Journal of the Institute of Electronics and Information Engineers, v.54 no.7, pp.36-46, 2017
  • The turbo code, an error-correction code, needs a long decoding time because the same decoding process must be repeated several times to obtain good BER performance. Parallel processing can reduce the decoding time, but it may introduce memory contention that requires additional buffers. QPP interleaving has been proposed to avoid such contention, but contention can still occur when a decoder is constructed using the so-called double-flow technique. In this paper, we propose an even-odd cross mapping technique that avoids memory conflicts even when decoding with the double-flow technique. The method exploits the address-generation characteristic of QPP interleaving and can be used to implement the interleaving circuit between the decoding blocks and the LLR memory blocks. Compared with a decoder based on the conventional MDF technique, a decoder applying the double-flow and proposed methods reduces decoding time by up to 32% with an 8% increase in total area.
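
The contention-free property the proposed mapping builds on comes from the QPP interleaver address function Pi(i) = (f1*i + f2*i^2) mod K. The C sketch below simply evaluates that address function; the (K, f1, f2) values are example parameters, and real decoders take them from the relevant standard's table.

```c
/* QPP interleaver address generation: Pi(i) = (f1*i + f2*i^2) mod K. */
#include <stdint.h>
#include <stdio.h>

static uint32_t qpp_address(uint32_t i, uint32_t f1, uint32_t f2, uint32_t K)
{
    /* 64-bit intermediates keep f2*i*i from overflowing for typical K */
    return (uint32_t)(((uint64_t)f1 * i + (uint64_t)f2 * i * i) % K);
}

int main(void)
{
    const uint32_t K = 40, f1 = 3, f2 = 10;   /* illustrative parameter set */
    for (uint32_t i = 0; i < 8; ++i)
        printf("Pi(%u) = %u\n", i, qpp_address(i, f1, f2, K));
    return 0;
}
```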

Fabrication of a Low Power Parallel Analog Processing Viterbi Decoder for PRML Signal (PRML 신호용 저 전력 아날로그 병렬처리 비터비 디코더 개발)

  • Kim, Hyun-Jung; Son, Hong-Rak; Kim, Hyong-Suk
    • Journal of the Institute of Electronics Engineers of Korea SD, v.43 no.6 s.348, pp.38-46, 2006
  • A parallel analog Viterbi decoder that decodes the PRML signal of a DVD has been fabricated as a VLSI chip. The parallel analog Viterbi decoder implements the functions of a conventional digital Viterbi decoder using analog parallel-processing circuit technology, applied here to PRML signal decoding for DVD. The benefits are lower power consumption and a smaller silicon area. The designed circuits are analyzed, and test results of the fabricated chip are reported.

An FPGA Implementation of High-Speed Adaptive Turbo Decoder

  • Kim, Min-Huyk; Jung, Ji-Won; Bae, Jong-Tae; Choi, Seok-Soon; Lee, In-Ki
    • The Journal of Korean Institute of Communications and Information Sciences, v.32 no.4C, pp.379-388, 2007
  • In this paper, we propose an adaptive turbo decoding algorithm for high-order modulation schemes that reuses a standard rate-1/2 turbo decoder originally designed for B/QPSK modulation. A transformation applied to the incoming I-channel and Q-channel symbols allows the use of an off-the-shelf B/QPSK turbo decoder without any modifications. The adaptive turbo decoder processes the received symbols iteratively to improve performance, but as the number of iterations increases, the execution time and power consumption increase as well. The latency and power reductions come from combining radix-4 processing, dual-path processing, parallel decoding, and an early-stop algorithm. We implemented the proposed scheme on a field-programmable gate array (FPGA) and compared its decoding speed with that of a conventional decoder; the implementation results confirm that the proposed adaptive decoder is 6.4 times faster than the conventional scheme.
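
The early-stop algorithm mentioned above is one source of the latency reduction. The C sketch below shows a generic hard-decision-aided stopping check (stop when the hard decisions of two consecutive iterations agree on every bit); this is a common criterion and not necessarily the exact rule used in the paper.

```c
/* Hard-decision-aided early-stop check for iterative (turbo) decoding. */
#include <stdbool.h>
#include <stddef.h>

bool early_stop(const float *llr_prev, const float *llr_curr, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if ((llr_prev[i] >= 0.0f) != (llr_curr[i] >= 0.0f))
            return false;        /* at least one hard decision changed */
    return true;                 /* all decisions stable: stop iterating */
}
```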

Subsidiary Maximum Likelihood Iterative Decoding Based on Extrinsic Information

  • Yang, Fengfan; Le-Ngoc, Tho
    • Journal of Communications and Networks, v.9 no.1, pp.1-10, 2007
  • This paper proposes a multimodal generalized Gaussian distribution (MGGD) to effectively model the varying statistical properties of the extrinsic information. A subsidiary maximum likelihood decoding (MLD) algorithm is then developed to dynamically select the most suitable MGGD parameters for the component maximum a posteriori (MAP) decoders at each decoding iteration, deriving more reliable metrics for performance enhancement. Simulation results show that, for a wide range of block lengths, the proposed approach enhances the overall turbo decoding performance of both parallel and serially concatenated codes in additive white Gaussian noise (AWGN), Rician, and Rayleigh fading channels.
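
For reference, a generalized Gaussian density and a mixture of such densities can be written as below; this is the standard form of the building blocks named above, and the paper's exact parameterization of the MGGD may differ.

```latex
% Generalized Gaussian density and a multimodal (mixture) form:
f(x \mid \mu, \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}
  \exp\!\left(-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right),
\qquad
f_{\mathrm{MGGD}}(x) = \sum_{k=1}^{M} w_k\, f(x \mid \mu_k, \alpha_k, \beta_k),
\quad \sum_{k=1}^{M} w_k = 1 .
```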

Proposal for Decoding-Compatible Parallel Deflate Algorithm by Inserting Control Header Composed of Non-Compressed Blocks (비 압축 블록으로 구성된 제어 헤더 삽입을 통한 압축 해제 호환성 있는 병렬 처리 Deflate 알고리즘 제안)

  • Kim, Jung Hoon
    • KIPS Transactions on Software and Data Engineering, v.12 no.5, pp.207-216, 2023
  • For a decoding-compatible parallel Deflate algorithm, this study proposes a control header in which the information essential for parallel compression and decompression is stored in the Disposed Bit Area (DBA) of a non-compressed block and inserted among the compressed blocks. This makes parallel compression and decompression possible while maintaining full compatibility with existing decoders. After applying this method, compression time was reduced by up to 71.2% compared with sequential processing, and parallel decompression time was reduced by up to 65.7%. In particular, parallel decompression is generally considered impossible because of the structural limitations of the Deflate algorithm; however, a decoder equipped with the proposed method enables high-speed parallel decompression at the algorithm level while maintaining compatibility, so data compressed in parallel can still be decoded normally by existing decoder programs.
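
For context on where the Disposed Bit Area comes from, the C sketch below writes a byte-aligned Deflate "stored" (non-compressed) block header as defined in RFC 1951: a BFINAL bit, BTYPE = 00, the padding bits up to the byte boundary that ordinary decoders ignore (the bits the paper repurposes), then LEN and NLEN = ~LEN as little-endian 16-bit fields. Buffer handling and the function name are simplified assumptions.

```c
/* Write an RFC 1951 stored-block header, assuming the output stream is      */
/* currently byte-aligned; returns the number of bytes written (5 here).     */
#include <stddef.h>
#include <stdint.h>

size_t write_stored_block_header(uint8_t *out, uint16_t len, int bfinal)
{
    uint16_t nlen = (uint16_t)~len;

    /* bit 0: BFINAL, bits 1-2: BTYPE = 00; bits 3-7 are padding bits that   */
    /* decoders skip (the "disposed" area mentioned above).                  */
    out[0] = (uint8_t)(bfinal & 1);

    out[1] = (uint8_t)(len & 0xFF);          /* LEN, little-endian             */
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(nlen & 0xFF);         /* NLEN = one's complement of LEN */
    out[4] = (uint8_t)(nlen >> 8);
    return 5;
}
```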

The Construction and Viterbi Decoding of New (2k, k, l) Convolutional Codes

  • Peng, Wanquan; Zhang, Chengchang
    • Journal of Information Processing Systems, v.10 no.1, pp.69-80, 2014
  • The free distance of (n, k, l) convolutional codes is related to the memory length, which depends not only on l but also on k. To efficiently obtain a large memory length, we construct a new class of (2k, k, l) convolutional codes from (2k, k) block codes and (2, 1, l) convolutional codes; the encoder and generator function are also given in this paper. With the help of matrix modules, we design a single-structure Viterbi decoder with parallel capability, obtain a unified and efficient decoding model for (2k, k, l) convolutional codes, and then describe the decoding process in detail. By observing the survivor-path memory in a matrix viewer and testing the role of the max module, we ran simulations with (2k, k, l) convolutional codes. The results show that many of them perform better than conventional (2, 1, l) convolutional codes.
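
As a generic point of reference for the Viterbi decoding discussed above (not the paper's matrix-module structure), the C sketch below performs one add-compare-select step over a small trellis; the state count, predecessor table, and minimum-distance metric convention are illustrative.

```c
/* One add-compare-select (ACS) step of a Viterbi decoder. */
#include <float.h>

#define NUM_STATES 4   /* e.g., a small code with 4 trellis states */

void acs_step(const double path_metric[NUM_STATES],
              const double branch_metric[NUM_STATES][2],   /* two predecessors per state */
              const int    predecessor[NUM_STATES][2],
              double       new_metric[NUM_STATES],
              int          survivor[NUM_STATES])
{
    for (int s = 0; s < NUM_STATES; ++s) {
        new_metric[s] = DBL_MAX;
        for (int b = 0; b < 2; ++b) {
            double m = path_metric[predecessor[s][b]] + branch_metric[s][b];
            if (m < new_metric[s]) {          /* compare-select: keep the best path */
                new_metric[s] = m;
                survivor[s]   = predecessor[s][b];
            }
        }
    }
}
```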