• Title/Summary/Keyword: Parallel Encoding


Design and Implementation of Parallel MPEG Encoder with MPI on Cluster System (클러스터환경에서 MPI를 이용한 병렬 MPEG인코더의 설계 및 구현)

  • Lee, Joa-Hyoung;Jung, In-Bum
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.10 / pp.1744-1750 / 2008
  • As computing and network technologies advance and spread widely, multimedia applications have become commonplace while text-based applications are used less. In particular, applications that handle streaming media such as video, one kind of multimedia data, account for the majority of computing usage. MPEG, one of the typical compression standards for streaming media, provides a very high compression ratio, so general users can access streaming media easily. However, MPEG encoding requires a great deal of computing power and time. In this paper, we design and implement a parallel MPEG encoder with MPI in a cluster environment to reduce MPEG encoding time.

An Efficient Parallelization Implementation of PU-level ME for Fast HEVC Encoding (고속 HEVC 부호화를 위한 효율적인 PU레벨 움직임예측 병렬화 구현)

  • Park, Soobin;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Journal of Broadcast Engineering / v.18 no.2 / pp.178-184 / 2013
  • In this paper, we propose an efficient parallelization technique for PU-level motion estimation (ME) in the next-generation video coding standard, High Efficiency Video Coding (HEVC), to reduce the time complexity of video encoding. Encoding video in real time is difficult because ME is highly complex (i.e., 80 percent of the encoder's complexity). Various techniques have been studied to address this problem, among them parallelization, which must be considered carefully in algorithm-level ME design. In this regard, a merge estimation method using the merge estimation region (MER), which allows ME to be designed in parallel, has been proposed; however, MER-based parallel ME still has unresolved problems that prevent an ideal implementation in the HEVC test model (HM). We therefore propose two strategies for implementing stable parallel ME using MER in HM. Experimental results show that the proposed method reduces encoding time by 25.64 percent on average relative to HM with sequential ME.
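
The two figures in this abstract (ME at 80 percent of encoder complexity, encoding time reduced by 25.64 percent) can be related with a small Amdahl-style calculation. The sketch below is illustrative only: it assumes the 80 percent share applies to encoding time in the tested configuration and that the entire saving comes from ME, neither of which the abstract states. The function names are hypothetical.

```python
def implied_stage_speedup(stage_fraction: float, total_reduction: float) -> float:
    """Speedup of one stage implied by an overall time reduction.

    Assumption (not from the abstract): the whole saving comes from
    that stage, and every other stage is unchanged.
    """
    remaining = stage_fraction - total_reduction
    if not 0.0 < remaining <= stage_fraction:
        raise ValueError("reduction must be non-negative and smaller than the stage fraction")
    return stage_fraction / remaining


def overall_speedup(total_reduction: float) -> float:
    """Overall speedup corresponding to a fractional reduction in run time."""
    if not 0.0 <= total_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - total_reduction)


if __name__ == "__main__":
    # Figures quoted in the abstract: ME share 0.80, time reduction 0.2564.
    print(f"implied ME speedup: {implied_stage_speedup(0.80, 0.2564):.2f}x")
    print(f"overall speedup:    {overall_speedup(0.2564):.2f}x")
```

Under these assumptions the ME stage itself would run roughly 1.47 times faster, which corresponds to an overall speedup of about 1.34.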

Efficient LDPC coding using a hybrid H-matrix

  • Kim Tae Jin;Lee Chan Ho;Yeo Soon Il;Roh Tae Moon
    • Proceedings of the IEEK Conference / 2004.08c / pp.473-476 / 2004
  • Low-density parity-check (LDPC) codes have recently attracted attention because of their excellent performance. However, the parity-check matrices (H) used in previous work are not well suited to hardware implementation of encoders or decoders. This paper proposes a hybrid parity-check matrix for partially parallel decoder structures that is efficient for hardware implementation of both decoders and encoders. With the proposed method, encoder design becomes practical while the hardware complexity of the partially parallel decoder structure is maintained.


Design of a High Speed and Low Power CMOS Demultiplexer Using Redundant Multi-Valued Logic (Redundant Multi-Valued Logic을 이용한 고속 및 저전력 CMOS Demultiplexer 설계)

  • Kim, Tae-Sang;Kim, Jeong-Beom
    • Proceedings of the KIEE Conference / 2005.05a / pp.148-151 / 2005
  • This paper proposes a high-speed interface using redundant multi-valued logic for high-speed communication ICs. The circuit consists of an encoding circuit, which receives serial binary data and converts it into parallel redundant multi-valued data, and a decoding circuit, which converts the redundant multi-valued data into parallel binary data. Because of the multi-valued data conversion, the circuit can achieve higher operating speeds than conventional binary logic. Using this logic, a 1:4 demultiplexer (DEMUX, serial-to-parallel converter) IC was designed in a 0.35 µm standard CMOS process. The proposed demultiplexer achieves an operating speed of 3 Gb/s with a supply voltage of 3.3 V and a power consumption of 48 mW. The designed circuit is limited by the maximum operating frequency of the process; it is therefore expected to enable CMOS communication ICs operating above 3 Gb/s in submicron processes with higher operating frequencies.
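
As a behavioral illustration of what a 1:4 serial-to-parallel converter does, the sketch below distributes a serial bit stream over four lanes in round-robin order and derives two figures from the numbers quoted in the abstract. It models plain binary demultiplexing only; it does not model the redundant multi-valued logic of the paper, and the round-robin lane order and function names are assumptions.

```python
from typing import List, Sequence

LANES = 4  # 1:4 demultiplexer, as in the abstract


def demux_1_to_4(serial_bits: Sequence[int]) -> List[List[int]]:
    """Split a serial bit stream into four parallel lanes.

    Assumed ordering: bit i goes to lane i mod 4 (round-robin).
    """
    lanes: List[List[int]] = [[] for _ in range(LANES)]
    for i, bit in enumerate(serial_bits):
        lanes[i % LANES].append(bit)
    return lanes


def lane_rate_gbps(serial_rate_gbps: float) -> float:
    """Bit rate on each parallel output for a given serial input rate."""
    return serial_rate_gbps / LANES


def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in picojoules (mW divided by Gb/s gives pJ/bit)."""
    return power_mw / rate_gbps


if __name__ == "__main__":
    print(demux_1_to_4([1, 0, 1, 1, 0, 0, 1, 0]))
    # Figures quoted in the abstract: 3 Gb/s serial rate, 48 mW.
    print(lane_rate_gbps(3.0), "Gb/s per lane")
    print(energy_per_bit_pj(48.0, 3.0), "pJ/bit")
```

With the quoted 3 Gb/s and 48 mW, each output lane carries 0.75 Gb/s and the circuit spends 16 pJ per bit.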


A Study on Parallel Processing System for Automatic Segmentation of Moving Object in Image Sequences

  • Lee, Hyung;Park, Jong-Won
    • Proceedings of the IEEK Conference / 2000.07a / pp.429-432 / 2000
  • The new MPEG-4 video coding standard enables content-based functionalities. To support the philosophy of the MPEG-4 visual standard, each frame of a video sequence should be represented in terms of video object planes (VOPs). In other words, the video objects to be encoded in still pictures or video sequences should be prepared before the encoding process starts. This requires a prior decomposition of sequences into VOPs so that each VOP represents a moving object. Because automatic segmentation is time-consuming, a parallel processing system is required to perform it in real time. This paper addresses a parallel processing system for automatic segmentation that separates moving objects from the background in image sequences. The proposed parallel processing system comprises processing elements (PEs) and a multi-access memory system (MAMS). The multi-access memory system is a memory controller that performs parallel memory access in several modes: horizontal, vertical, and block access. To realize these access modes, the multi-access memory system consists of a memory module selection module, data routing modules, and an address calculation and routing module. The proposed system is simulated and evaluated with the CADENCE Verilog-XL hardware simulation package.


CPU Parallel Processing and GPU-accelerated Processing of UHD Video Sequence using HEVC (HEVC를 이용한 UHD 영상의 CPU 병렬처리 및 GPU가속처리)

  • Hong, Sung-Wook;Lee, Yung-Lyul
    • Journal of Broadcast Engineering / v.18 no.6 / pp.816-822 / 2013
  • The latest video coding standard, HEVC, was developed jointly by the JCT-VC (Joint Collaborative Team on Video Coding) of ITU-T VCEG and ISO/IEC MPEG. The HEVC standard reduces the BD-bitrate by about 50% compared with the H.264/AVC standard. However, the various methods used to obtain these coding gains have increased complexity. The proposed method reduces the complexity of HEVC by using both CPU parallel processing and GPU-accelerated processing. Experimental results for UHD (3840×2144) video sequences show 15 fps encoding/decoding performance with the proposed method. We expect that future hardware speedups in data transfer rates between CPU and GPU will reduce encoding/decoding times much further.

Design of HEVC CABAC Encoder With Parallel Processing of Bypass Bins (우회 빈의 병렬처리가 가능한 HEVC CABAC 부호화기의 설계)

  • Kim, Doohwan;Moon, Jeonhak;Lee, Seongsoo
    • Journal of IKEEE / v.19 no.4 / pp.583-589 / 2015
  • In HEVC CABAC, the probability model is updated after a bin is encoded, and the next bin is encoded based on the updated probability model. Conventional CABAC encoders can encode only one bin per cycle, which limits encoding throughput. For bypass bins, however, the probability model does not need to be updated. In this paper, an HEVC CABAC encoder is proposed that increases encoding throughput by processing bypass bins in parallel. The designed CABAC encoder can process either one regular bin or up to 4 bypass bins in a cycle. On average, it can process 1.15 to 1.92 bins per cycle. Synthesized in 0.18 µm technology, its gate count, maximum operating speed, and maximum throughput are 78,698 gates, 136 MHz, and 261 Mbin/s, respectively.
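
The throughput rule described in this abstract (one regular bin, or up to four bypass bins, per cycle) can be explored with a small cycle-counting model. This is a rough sketch, not the authors' design: the greedy grouping of consecutive bypass bins is an assumed scheduling policy, and the bin strings and function names are hypothetical.

```python
from typing import Iterable

MAX_BYPASS_PER_CYCLE = 4  # from the abstract: up to 4 bypass bins per cycle


def count_cycles(bins: Iterable[str]) -> int:
    """Count encoder cycles for a bin sequence ('R' regular, 'B' bypass).

    Assumed policy (not from the paper): a regular bin always takes its
    own cycle; consecutive bypass bins are packed greedily, four per cycle.
    """
    cycles = 0
    packed = 0  # bypass bins already packed into the current cycle
    for b in bins:
        if b == "R":
            cycles += 1
            packed = 0  # a regular bin ends any bypass group
        elif b == "B":
            if packed == 0:
                cycles += 1  # open a new bypass cycle
            packed = (packed + 1) % MAX_BYPASS_PER_CYCLE
        else:
            raise ValueError(f"unknown bin type: {b!r}")
    return cycles


def bins_per_cycle(bins: str) -> float:
    """Average number of bins processed per cycle for a bin string."""
    return len(bins) / count_cycles(bins)


def throughput_mbin_per_s(clock_mhz: float, avg_bins_per_cycle: float) -> float:
    """Throughput in Mbin/s from clock frequency and average bins per cycle."""
    return clock_mhz * avg_bins_per_cycle


if __name__ == "__main__":
    print(count_cycles("RBBBBBR"))            # R | BBBB | B | R
    print(throughput_mbin_per_s(136, 1.92))   # figures quoted in the abstract
```

The quoted figures are mutually consistent: 136 MHz multiplied by 1.92 bins per cycle gives 261.12 Mbin/s, matching the reported maximum throughput of 261 Mbin/s.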

Parallel Branch Instruction Extension for Thumb-2 Instruction Set Architecture (Thumb-2 명령어 집합 구조의 병렬 분기 명령어 확장)

  • Kim, Dae-Hwan
    • Journal of the Korea Society of Computer and Information / v.18 no.7 / pp.1-10 / 2013
  • In this paper, a parallel branch instruction is proposed that executes a branch instruction and a frequently used instruction simultaneously to improve the performance of the Thumb-2 instruction set architecture. In the proposed approach, new 32-bit parallel branch instructions are introduced, each combining a 16-bit branch instruction with one of the frequently used 16-bit LOAD, ADD, MOV, STORE, and SUB instructions. To provide encoding space for the new instructions, the register field in less frequently executed instructions is reduced, and the new instructions are encoded using the saved bits. Experiments show that the proposed approach improves performance by an average of 8.0% compared with the conventional approach.

Asynchronous Multiplier with Parallel Array Structure (병렬배열구조를 사용한 비동기 곱셈기)

  • Park, Chan-Ho;Choe, Byeong-Su;Lee, Dong-Ik
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.5 / pp.87-94 / 2002
  • In this paper, an asynchronous array multiplier with a parallel array structure is introduced. The parallel array structure is used to shorten computation time while lowering power consumption. An asymmetric parallel array structure is used to minimize the average computation time of the asynchronous multiplier. Simulation shows that this structure reduces the time needed for computation by 55% compared with conventional Booth encoding array structures, and that the multiplier with the proposed array structure shows a 40% reduction in computation time with relatively lower power consumption.

Optimizations of 3D MRI Techniques in Brain by Evaluating SENSE Factors (삼차원 자기공명영상법의 뇌 구조 영상을 위한 최적화 연구: 센스인자 변화에 따른 신호변화 평가)

  • Park, Myung-Hwan;Lee, Jin-Wan;Lee, Kang-Won;Ryu, Chang-Woo;Jahng, Geon-Ho
    • Investigative Magnetic Resonance Imaging / v.13 no.2 / pp.161-170 / 2009
  • Purpose: Parallel imaging makes it possible to improve temporal resolution when acquiring three-dimensional (3D) MR images. The objective of this study was to optimize three 3D MRI techniques by adjusting the 2D SENSE factors of the parallel imaging method in a phantom and in human brain. Materials and Methods: With a 3 Tesla MRI system and an 8-channel phased-array sensitivity-encoding (SENSE) coil, three 3D MRI techniques, 3D T1-weighted imaging (3D T1WI), 3D T2-weighted imaging (3D T2WI), and 3D fluid-attenuated inversion recovery (3D FLAIR) imaging, were optimized by adjusting SENSE factors in a water phantom and three human brains. The 2D SENSE factor was applied in the phase-encoding and slice-encoding directions. Signal-to-noise ratio (SNR), percent signal reduction rate (%R), and contrast-to-noise ratio (CNR) were calculated from signal intensities obtained in specific regions of interest (ROIs). Results: In the phantom study, a SENSE factor of 3 gave a 0.2% signal reduction relative to imaging without SENSE, with a scan time within 5 minutes, for 3D T1WI. A SENSE factor of 2 gave a 0.98% signal reduction relative to imaging without SENSE, with a scan time within 5 minutes, for 3D T2WI. A SENSE factor of 4 gave a 0.2% signal reduction relative to imaging without SENSE, with a scan time of around 6 minutes, for 3D FLAIR. In the human brain study, SNR and CNR were higher with a SENSE factor of 3 than with a factor of 4 for all three imaging techniques. Conclusion: This study was performed to optimize 2D SENSE factors for three 3D MRI techniques so that they can be scanned within clinical time limits while minimizing SNR reduction. Without compromising SNR and CNR, the optimum 2D SENSE factors were 3 and 4, yielding scan times of about 5 to 6 minutes. Further studies are necessary to optimize 3D MRI techniques in other areas of the human body.
