• Title/Summary/Keyword: Parallelization

Search Result 215, Processing Time 0.032 seconds

Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL (OpenCL을 활용한 CPU와 GPU 에서의 CMMB LDPC 복호기 병렬화)

  • Park, Joo-Yul;Hong, Jung-Hyun;Chung, Ki-Seok
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.11 no.6
    • /
    • pp.325-334
    • /
    • 2016
  • Recently, Open Computing Language (OpenCL) has been proposed to provide a framework that supports heterogeneous computing platforms. By using an OpenCL framework, digital communication systems can support various protocols in a unified computing environment to achieve both high portability and high performance. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics. In this paper, steps suitable for task-level parallelization are executed on the CPU, and steps suitable for data-level parallelization are processed by the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding operations, explicit thread scheduling, loop-unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors on a unified computing framework.

Parallelization of a Purely Functional Bisimulation Algorithm

  • Ahn, Ki Yung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.1
    • /
    • pp.11-17
    • /
    • 2021
  • In this paper, we demonstrate a performance boost by parallelizing a purely functional bisimulation algorithm on a multicore processor machine. The key idea of this parallelization is exploiting the referential transparency of purely functional programs to minimize refactoring of the original implementation without any parallel constructs. Both original and parallel implementations are written in Haskell, a purely functional programming language. The change from the original program to the parallel program is minuscule, maintaining almost original structure of the program. Through benchmark, we show that the proposed parallelization doubles the performance of the bisimulation test compared to the original non-parallel implementation. We also shaw that similar performance boost is also possible for a memoized version of the bisimulation implementation.

Multi-Threaded Parallel H.264/AVC Decoder for Multi-Core Systems (멀티코어 시스템을 위한 멀티스레드 H.264/AVC 병렬 디코더)

  • Kim, Won-Jin;Cho, Keol;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.11
    • /
    • pp.43-53
    • /
    • 2010
  • Wide deployment of high resolution video services leads to active studies on high speed video processing. Especially, prevalent employment of multi-core systems accelerates researches on high resolution video processing based on parallelization of multimedia software. In this paper, we propose a novel parallel H.264/AVC decoding scheme on a multi-core platform. Parallel H.264/AVC decoding is challenging not only because parallelization may incur significant synchronization overhead but also because software may have complicated dependencies. To overcome such issues, we propose a novel approach called Multi-Threaded Parallelization(MTP). In MTP, to reduce synchronization overhead, a separate thread is allocated to each stage in the pipeline. In addition, an efficient memory reuse technique is used to reduce the memory requirement. To verify the effectiveness of the proposed approach, we parallelized FFmpeg H.264/AVC decoder with the proposed technique using OpenMP, and carried out experiments on an Intel Quad-Core platform. The proposed design performs better than FFmpeg H.264/AVC decoder before the parallelization by 53%. We also reduced the amount of memory usage by 65% and 81% for a high-definition(HD) and a full high-definition(FHD) video, respectively compared with that of popular existing method called 2Dwave.

Research on parallelization mechanism of inductively coupled plasma for large area plasma source

  • Lee, Jang-Jae;Kim, Si-Jun;Kim, Gwang-Gi;Lee, Ba-Da;Lee, Yeong-Seok;Yeom, Hui-Jung;Kim, Dae-Ung;Yu, Sin-Jae
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2016.02a
    • /
    • pp.183-183
    • /
    • 2016
  • Inductively coupled plasma having the high-density is often used for high productivity in the plasma processing. In large area processing, the plasma can be generated by using the multi-pole connected in parallel. However, in case of this, the power cannot transfer to plasma uniformly. To address the problem, we studied the mechanism of inductively coupled plasma connected in parallel by using transformer model. We also studied about the change of the plasma parameters over the time through the power balance equation and particle balance equation.

  • PDF

A Data Dependency Elimination Algorithm for Extracting Maximum Parallelism (최대 병렬성 추출을 위한 자료 종속성 제거 알고리즘)

  • 송월봉;박두순
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.1
    • /
    • pp.139-139
    • /
    • 1999
  • In most application programs, loops usually comprise most of the computation in a program and the most important source of parallelism. When the data dependency relation is uniformin terms of distance, several compile time parallelization methods were introduced. On the otherhand,when the data dependency relation is non-uniform in distance, the compile time extraction ofparallelism is much complicated. In this paper, a general method the extracting parallelism in nestedloops is presented. This algorithm can be applicable where the dependency relation is both uniform andnon-uniform in distance. According to execution repeatedly the statements in nested loops, thealgorithm which effectively removes these kind of data dependencies is developed in order to presentthe total parallelization of nested loops.

A Technique for Fast Process Creation Based on Creation Location

  • Kim, Byung-Jin;Ahn, Young-Ho;Chung, Ki-Seok
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.4
    • /
    • pp.283-287
    • /
    • 2011
  • Due to the proliferation of software parallelization on multi-core CPUs, the number of concurrently executing processes is rapidly increasing. Unlike processes running in a server environment, those executing in a multi-core desktop or a multi-core mobile platform have various correlations. Therefore, it is crucial to consider correlations among concurrently running processes. In this paper, we exploit the property that for a given created location in the binary image of the parent process, the average running time of child processes residing in the run-queue differs. We claim that this property can be exploited to improve the overall system performance by running processes that have a relatively short running time before those with a longer running time. Experimental results verified that the running time was actually improved by 11%.

High-Performance Computer-Generated Hologram by Optimized Implementation of Parallel GPGPUs

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Yoo, Ji-Sang;Kim, Dong-Wook
    • Journal of the Optical Society of Korea
    • /
    • v.18 no.6
    • /
    • pp.698-705
    • /
    • 2014
  • We propose a new development for calculating a computer-generated hologram (CGH) through the use of multiple general-purpose graphics processing units (GPGPUs). For optimization of the implementation, CGH parallelization, object point tiling, memory selection for object point, hologram tiling, CGMA (compute to global memory access) ratio by block size, and memory mapping were considered. The proposed CGH was equipped with a digital holographic video system consisting of a camera system for capturing images (object points) and CPU/GPGPU software (S/W) for various image processing activities. The proposed system can generate about 37 full HD holograms per second using about 6K object points.

Parallelization of an Unstructured Implicit Euler Solver (내재적 방법을 이용한 비정렬 유동해석 기법의 병렬화)

  • Kim J. S.;Kang H. J.;Park Y. M.;Kwon O. J.
    • Journal of computational fluids engineering
    • /
    • v.5 no.2
    • /
    • pp.20-27
    • /
    • 2000
  • An unstructured implicit Euler solver is parallelized on a Cray T3E. Spatial discretization is accomplished by a cell-centered finite volume formulation using an upwind flux differencing. Time is advanced by the Gauss-Seidel implicit scheme. Domain decomposition is accomplished by using the k-way n-partitioning method developed by Karypis. In order to analyze the parallel performance of the solver, flows over a 2-D NACA 0012 airfoil and 3-D F-5 wing were investigated.

  • PDF

Lane Detection using Embedded Multi-core Platform (임베디드 멀티코어 플랫폼을 이용한 차선검출)

  • Lee, Kwang-Yeob;Kim, Dong-Han;Park, Tae-Ryoung
    • Journal of IKEEE
    • /
    • v.15 no.3
    • /
    • pp.255-260
    • /
    • 2011
  • In this paper, we propose a parallelization technique in lane detection by using Hough transform. Hough transform has a weakness that it has a lot computation quantity, because it has to compute ${\rho}$ value in all candidate ${\Theta}$ to be detected in an image. We propose an architecture of parallel processing for this transform in a multi-core environment. The parallel processing has application to Hough transform as well as noise reduction and edge detection. This proposed architecture has 5.17 times improvement in performance compare to the existing algorithm.

Parallelization of an Unstructured Implicit Euler Solver (내재적 방법을 이용한 비정렬 유동해석 기법의 병렬화)

  • Kim J. S.;Kang H. J.;Park Y. M.;Kwon O. J.
    • 한국전산유체공학회:학술대회논문집
    • /
    • 1999.11a
    • /
    • pp.193-200
    • /
    • 1999
  • An unstructured implicit Euler solver is parallelized on a Cray T3E. Spatial discretization is accomplished by a cell-centered finite volume formulation using an unpwind flux differencing. Time is advanced by the Gauss-Seidel implicit scheme. Domain decomposition is accomplished by using the k-way N-partitioning method developed by Karypis. In order to analyze the parallel performance of the solver, flows over a 2-D NACA 0012 airfoil and a 3-D F-5 wing were investigated.

  • PDF