• Title/Summary/Keyword: Parallel coding

Search Result 161, Processing Time 0.03 seconds

An Efficient Interpolation Hardware Architecture for HEVC Inter-Prediction Decoding

  • Jin, Xianzhe;Ryoo, Kwangki
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.2
    • /
    • pp.118-123
    • /
    • 2013
  • This paper proposes an efficient hardware architecture for high efficiency video coding (HEVC), which is the next generation video compression standard. It adopts several new coding techniques to reduce the bit rate by about 50% compared with the previous one. Unlike the previous H.264/AVC 6-tap interpolation filter, in HEVC, a one-dimensional seven-tap and eight-tap filter is adopted for luma interpolation, but it also increases the complexity and gate area in hardware implementation. In this paper, we propose a parallel architecture to boost the interpolation performance, achieving a luma $4{\times}4$ block interpolation in 2-4 cycles. The proposed architecture contains shared operations reducing the gate count increased due to the parallel architecture. This makes the area efficiency better than the previous design, in the best case, with the performance improved by about 75.15%. It is synthesized with the MagnaChip $0.18{\mu}m$ library and can reach the maximum frequency of 200 MHz.

Self-Checking Look-up Tables using Scalable Error Detection Coding (SEDC) Scheme

  • Lee, Jeong-A;Siddiqui, Zahid Ali;Somasundaram, Natarajan;Lee, Jeong-Gun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.13 no.5
    • /
    • pp.415-422
    • /
    • 2013
  • In this paper, we present Self-Checking look-up-table (LUT) based on Scalable Error Detection Coding (SEDC) scheme for use in fault-tolerant reconfigurable architectures. SEDC scheme has shorter latency than any other existing coding schemes for all unidirectional error detection and the LUT execution time remains unaffected with self-checking capabilities. SEDC scheme partitions the contents of LUT into combinations of 1-, 2-, 3- and 4-bit segments and generates corresponding check codes in parallel. We show that the proposed LUT with SEDC performs better than LUT with traditional Berger as well as Partitioned Berger Coding schemes. For 32-bit data, LUT with SEDC takes 39% less area and 6.6 times faster for self-checking than LUT with traditional Berger Coding scheme.

Comparison of Parallelized Network Coding Performance (네트워크 코딩의 병렬처리 성능비교)

  • Choi, Seong-Min;Park, Joon-Sang;Ahn, Sang-Hyun
    • The KIPS Transactions:PartC
    • /
    • v.19C no.4
    • /
    • pp.247-252
    • /
    • 2012
  • Network coding has been shown to improve various performance metrics in network systems. However, if network coding is implemented as software a huge time delay may be incurred at encoding/decoding stage so it is imperative for network coding to be parallelized to reduce time delay when encoding/decoding. In this paper, we compare the performance of parallelized decoders for random linear network coding (RLC) and pipeline network coding (PNC), a recent development in order to alleviate problems of RLC. We also compare multi-threaded algorithms on multi-core CPUs and massively parallelized algorithms on GPGPU for PNC/RLC.

High Throughput Parallel Decoding Method for H.264/AVC CAVLC

  • Yeo, Dong-Hoon;Shin, Hyun-Chul
    • ETRI Journal
    • /
    • v.31 no.5
    • /
    • pp.510-517
    • /
    • 2009
  • A high throughput parallel decoding method is developed for context-based adaptive variable length codes. In this paper, several new design ideas are devised and implemented for scalable parallel processing, a reduction in area, and a reduction in power requirements. First, simplified logical operations instead of memory lookups are used for parallel processing. Second, the codes are grouped based on their lengths for efficient logical operation. Third, up to M bits of the input stream can be analyzed simultaneously. For comparison, we designed a logical-operation-based parallel decoder for M=8 and a conventional parallel decoder. High-speed parallel decoding becomes possible with our method. In addition, for similar decoding rates (1.57 codes/cycle for M=8), our new approach uses 46% less chip area than the conventional method.

Fast Distributed Video Coding using Parallel LDPCA Encoding (병렬 LDPCA 채널코드 부호화 방법을 사용한 고속 분산비디오부호화)

  • Park, Jong-Bin;Jeon, Byeung-Woo
    • Journal of Broadcast Engineering
    • /
    • v.16 no.1
    • /
    • pp.144-154
    • /
    • 2011
  • In this paper, we propose a parallel LDPCA encoding method for fast transform-domain Wyner-Ziv video encoding which is suitable in an ultra fast and low power video encoding. The conventional transform-domain Wyner-Ziv video encoding performs LDPCA channel coding of quantized transform coefficients in bitplane-serial fashion, which takes about 60% of total encoding time, and this computational complexity becomes severer as the bitrate increases. The proposed method binds several bitplanes into one packed message and carries out the LDPCA encoding in parallel. The proposed LDPCA encoding method improves the encoding speed by 8 ~ 55 times. In the experiment, the proposed Wyner-Ziv encoder can encode 700 ~ 2,300 QCIF size frames per second with GOP=64. The method can be applied to the pixel-domain Wyner-Ziv encoder using LDPCA, and has a wide scope of application.

Parallel Method for HEVC Deblocking Filter based on Coding Unit Depth Information (코딩 유닛 깊이 정보를 이용한 HEVC 디블록킹 필터의 병렬화 기법)

  • Jo, Hyun-Ho;Ryu, Eun-Kyung;Nam, Jung-Hak;Sim, Dong-Gyu;Kim, Doo-Hyun;Song, Joon-Ho
    • Journal of Broadcast Engineering
    • /
    • v.17 no.5
    • /
    • pp.742-755
    • /
    • 2012
  • In this paper, we propose a parallel deblocking algorithm to resolve workload imbalance when the deblocking filter of high efficiency video coding (HEVC) decoder is parallelized. In HEVC, the deblocking filter which is one of the in-loop filters conducts two-step filtering on vertical edges first and horizontal edges later. The deblocking filtering can be conducted with high-speed through data-level parallelism because there is no dependency between adjacent edges for deblocking filtering processes. However, workloads would be imbalanced among regions even though the same amount of data for each region is allocated, which causes performance loss of decoder parallelization. In this paper, we solve the problem for workload imbalance by predicting the complexity of deblocking filtering with coding unit (CU) depth information at a coding tree block (CTB) and by allocating the same amount of workload to each core. Experimental results show that the proposed method achieves average time saving (ATS) by 64.3%, compared to single core-based deblocking filtering and also achieves ATS by 6.7% on average and 13.5% on maximum, compared to the conventional uniform data-level parallelism.

Multithread video coding processor for the videophone (동영상 전화기용 다중 스레드 비디오 코딩 프로세서)

  • 김정민;홍석균;이일완;채수익
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.5
    • /
    • pp.155-164
    • /
    • 1996
  • The architecture of a programmable video codec IC is described that employs multiple vector processors in a single chip. The vector processors operate in parallel and communicate with one another through on-chip shared memories. A single scalar control processor schedules each vector processor independently to achieve real-tiem video coding with special vector instructions. With programmable interconnection buses, the proposed architecture performs multi-processing of tasks and data in video coding. Therefore, it can provide good parallelism as well as good programmability. especially, it can operate multithread video coding, which processes several independent image sequences simultaneously. We explain its scheduling, multithred video coding, and vector processor architectures. We implemented a prototype video codec with a 0.8um CMOS cell-based technology for the multi-standard videophone. This codec can execute video encoding and decoding simultaneously for the QCIF image at a frame rate of 30Hz.

  • PDF

Tile-level and Frame-level Parallel Encoding for HEVC (타일 및 프레임 수준의 HEVC 병렬 부호화)

  • Kim, Younhee;Seok, Jinwuk;Jung, Soon-heung;Kim, Huiyong;Choi, Jin Soo
    • Journal of Broadcast Engineering
    • /
    • v.20 no.3
    • /
    • pp.388-397
    • /
    • 2015
  • High Efficiency Video Coding (HEVC)/H.265 is a new video coding standard which is known as high compression ratio compared to the previous standard, Advanced Video Coding (AVC)/H.264. Due to achievement of high efficiency, HEVC sacrifices the time complexity. To apply HEVC to the market applications, one of the key requirements is the fast encoding. To achieve the fast encoding, exploiting thread-level parallelism is widely chosen mechanism since multi-threading is commonly supported based on the multi-core computer architecture. In this paper, we implement both the Tile-level parallelism and the Frame-level parallelism for HEVC encoding on multi-core platform. Based on the implementation, we present two approaches in combining the Tile-level parallelism with Frame-level parallelism. The first approach creates the fixed number of tile per frame while the second approach creates the number of tile per frame adaptively according to the number of frame in parallel and the number of available worker threads. Experimental results show that both improves the parallel scalability compared to the one that use only tile-level parallelism and the second approach achieves good trade-off between parallel scalability and coding efficiency for both Full-HD (1080 x 1920) and 4K UHD (3840 x 2160) sequences.

Load Balancing Based on Transform Unit Partition Information for High Efficiency Video Coding Deblocking Filter

  • Ryu, Hochan;Park, Seanae;Ryu, Eun-Kyung;Sim, Donggyu
    • ETRI Journal
    • /
    • v.39 no.3
    • /
    • pp.301-309
    • /
    • 2017
  • In this paper, we propose a parallelization method for a High Efficiency Video Coding (HEVC) deblocking filter with transform unit (TU) split information. HEVC employs a deblocking filter to boost perceptual quality and coding efficiency. The deblocking filter was designed for data-level parallelism. In this paper, we demonstrate a method of distributing equal workloads to all cores or threads by anticipating the deblocking filter complexity based on the coding unit depth and TU split information. We determined that the average time saving of our proposed deblocking filter parallelization method has a speed-up factor that is 2% better than that of the uniformly distributed parallel deblocking filter, and 6% better than that of coding tree unit row distribution parallelism. In addition, we determined that the speed-up factor of our proposed deblocking filter parallelization method, in terms of percentage run-time, is up to 3.1 compared to the run-time of the HEVC test model 12.0 deblocking filter with a sequential implementation.

Evolutionary Neural Networks based on DNA coding and L-system (DNA Coding 및 L-system에 기반한 진화신경회로망)

  • 이기열;전호병;이동욱;심귀보
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2000.11a
    • /
    • pp.107-110
    • /
    • 2000
  • In this paper, we propose a method of constructing neural networks using bio-inspired emergent and evolutionary concepts. This method is algorithm that is based on the characteristics of the biological DNA and growth of plants. Here is, we propose a constructing method to make a DNA coding method for production rule of L-system. L-system is based on so-called the parallel rewriting mechanism. The DNA coding method has no limitation in expressing the production rule of L-system. Evolutionary algorithms motivated by Darwinian natural selection are population based searching methods and the high performance of which is highly dependent on the representation of solution space. In order to verify the effectiveness of our scheme, we apply it to one step ahead prediction of Mackey-Glass time series.

  • PDF