• 제목/요약/키워드: Parallelization

Search Result 212, Processing Time 0.033 seconds

A Study on the Automatic Parallelization Method and Tool Development

  • Shin, Woochang
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.3
    • /
    • pp.87-94
    • /
    • 2020
  • Recently, computer hardware is evolving toward increasing the number of computing cores, not increasing the clock speed. In order to use the performance of parallelized hardware to the maximum, the running program must also be parallelized. However, software developers are accustomed to sequential programs, and in most cases, write programs that operate sequentially. They also have a lot of difficulty designing and developing software in parallel. We propose a method to automatically convert a sequential C/C++ program into a parallelized program, and develop a parallelization tool that supports it. It supports open multiprocessing (OpenMP) and parallel patterns library (PPL) as a parallel framework. Perfect automatic parallelization is difficult due to dynamic features such as pointer operation and polymorphism in C/C++ language. This study focuses on verifying the conditions of parallelization rather than focusing on fully automatic parallelization, and providing advice to developers in detail if parallelization is not possible.

Performance Analysis of HEVC Parallelization Methods for High-Resolution Videos

  • Ryu, Hochan;Ahn, Yong-Jo;Mok, Jung-Soo;Sim, Donggyu
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.1
    • /
    • pp.28-34
    • /
    • 2015
  • Several parallelization methods that can be applied to High Efficiency Video Coding (HEVC) decoders are evaluated. The market requirements of high-resolution videos, such as Full HD and UHD, have been increasing. To satisfy the market requirements, several parallelization methods for HEVC decoders have been studied. Understanding these parallelization methods and objective comparisons of these methods are crucial to the real-time decoding of high-resolution videos. This paper introduces the parallelization methods that can be used in HEVC decoders and evaluates the parallelization methods comparatively. The experimental results show that the average speed-up factors of tile-level parallelism, wavefront parallel processing (WPP), frame-level parallelism, and 2D-wavefront parallelism are observed up to 4.59, 4.00, 2.20, and 3.16, respectively.

Computing Performance Comparison of CPU and GPU Parallelization for Virtual Heart Simulation (가상 심장 시뮬레이션에서 CPU와 GPU 병렬처리의 계산 성능 비교)

  • Kim, Sang Hee;Jeong, Da Un;Setianto, Febrian;Lim, Ki Moo
    • Journal of Biomedical Engineering Research
    • /
    • v.41 no.3
    • /
    • pp.128-137
    • /
    • 2020
  • Cardiac electrophysiology studies often use simulation to predict how cardiac will behave under various conditions. To observe the cardiac tissue movement, it needs to use the high--resolution heart mesh with a sophisticated and large number of nodes. The higher resolution mesh is, the more computation time is needed. To improve computation speed and performance, parallel processing using multi-core processes and network computing resources is performed. In this study, we compared the computational speeds of CPU parallelization and GPU parallelization in virtual heart simulation for efficiently calculating a series of ordinary differential equations (ODE) and partial differential equations (PDE) and determined the optimal CPU and GPU parallelization architecture. We used 2D tissue model and 3D ventricular model to compared the computation performance. Then, we measured the time required to the calculation of ODEs and PDEs, respectively. In conclusion, for the most efficient computation, using GPU parallelization rather than CPU parallelization can improve performance by 4.3 times and 2.3 times in calculations of ODEs and PDE, respectively. In CPU parallelization, it is best to use the number of processors just before the communication cost between each processor is incurred.

A Parallelization Technique with Integrated Multi-Threading for Video Decoding on Multi-core Systems

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.10
    • /
    • pp.2479-2496
    • /
    • 2013
  • Increasing demand for Full High-Definition (FHD) video and Ultra High-Definition (UHD) video services has led to active research on high speed video processing. Widespread deployment of multi-core systems has accelerated studies on high resolution video processing based on parallelization of multimedia software. Even if parallelization of a specific decoding step may improve decoding performance partially, such partial parallelization may not result in sufficient performance improvement. Particularly, entropy decoding has often been considered separately from other decoding steps since the entropy decoding step could not be parallelized easily. In this paper, we propose a parallelization technique called Integrated Multi-Threaded Parallelization (IMTP) which takes parallelization of the entropy decoding step, with other decoding steps, into consideration in an integrated fashion. We used the Simultaneous Multi-Threading (SMT) technique with appropriate thread scheduling techniques to achieve the best performance for the entire decoding step. The speedup of the proposed IMTP method is up to 3.35 times faster with respect to the entire decoding time over a conventional decoding technique for H.264/AVC videos.

On-line Trace Based Automatic Parallelization of Java Programs on Multicore Platforms

  • Sun, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.2
    • /
    • pp.105-118
    • /
    • 2012
  • We propose two new approaches that automatically parallelize Java programs at runtime. These approaches, which rely on run-time trace information collected during program execution, dynamically recompile Java byte code that can be executed in parallel. One approach utilizes trace information to improve traditional loop parallelization, and the other parallelizes traces instead of loop iterations. We also describe a cost/benefit model that makes intelligent parallelization decisions, as well as a parallel execution environment to execute parallelized programs. These techniques are based on Jikes RVM. Our approach is evaluated by parallelizing sequential Java programs, and its performance is compared to that of the manually parallelized code. According to the experimental results, our approach has low overheads and achieves competitive speedups compared to the manually parallelizing code. Moreover, trace parallelization can exploit parallelism beyond loop iterations.

Locality-Conscious Nested-Loops Parallelization

  • Parsa, Saeed;Hamzei, Mohammad
    • ETRI Journal
    • /
    • v.36 no.1
    • /
    • pp.124-133
    • /
    • 2014
  • To speed up data-intensive programs, two complementary techniques, namely nested loops parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse grain parallelism through exploiting the largest possible groups of outer permutable loops in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.

Parallelization of a Two-Dimensional Navier-Stokes Solver Using Hybrid Meshes (혼합격자를 이용한 2차원 난류 유동장 해석 프로그램의 병렬화)

  • Ok Honam;Park Seung-O
    • 한국전산유체공학회:학술대회논문집
    • /
    • 1999.11a
    • /
    • pp.115-126
    • /
    • 1999
  • A two-dimensional Navier-Stokes solver using hybrid meshes is parallelized with a domain decompostion method. The focus of this paper is placed on minimizing the amount of effort in parallelizing the serial version of the solver, and this is achieved by adding an additional layer of cells to each decomposed domain. Most subroutines of the serial solver are used without modification, and the information exchange between neighboring domains is achieved using MPI(Message Passing Interface) library. Load balancing among the processors and scheduling of the message passing are implemented to reduce the overhead of parallelization, and the speed-up achieved by parallelization is measured on the transonic invisicd and turbulent flow problems. The parallelization efficiencies of the explicit Runge-Kutta scheme and the implicit point-SGS scheme are compared and the effects of various factors on the results are also studied.

  • PDF

Data Level Parallelism for H.264/AVC Decoder on a Multi-Core Processor and Performance Analysis (멀티코어 프로세서에서의 H.264/AVC 디코더를 위한 데이터 레벨 병렬화 성능 예측 및 분석)

  • Cho, Han-Wook;Jo, Song-Hyun;Song, Yong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.102-116
    • /
    • 2009
  • There have been lots of researches for H.264/AVC performance enhancement on a multi-core processor. The enhancement has been performed through parallelization methods. Parallelization methods can be classified into a task-level parallelization method and a data level parallelization method. A task-level parallelization method for H.264/AVC decoder is implemented by dividing H.264/AVC decoder algorithms into pipeline stages. However, it is not suitable for complex and large bitstreams due to poor load-balancing. Considering load-balancing and performance scalability, we propose a horizontal data level parallelization method for H.264/AVC decoder in such a way that threads are assigned to macroblock lines. We develop a mathematical performance expectation model for the proposed parallelization methods. For evaluation of the mathematical performance expectation, we measured the performance with JM 13.2 reference software on ARM11 MPCore Evaluation Board. The cycle-accurate measurement with SoCDesigner Co-verification Environment showed that expected performance and performance scalability of the proposed parallelization method was accurate in relatively high level

Integrated Parallelization of Video Decoding on Multi-core Systems (멀티코어 시스템에서의 통합된 비디오 디코딩 병렬화)

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.7
    • /
    • pp.39-49
    • /
    • 2012
  • Demand for high resolution video services leads to active studies on high speed video processing. Especially, widespread deployment of multi-core systems accelerates researches on high resolution video processing based on parallelization of multimedia software. Previously proposed parallelization approach could improve the decoding performance. However, some parallelization methods did not consider the entropy decoding and others considered only a partial decoding parallelization. Therefore, we consider parallel entropy decoding integrated with other parallel video decoding process on a multi-core system. We propose a novel parallel decoding method called Integrated Parallelization. We propose a method on how to optimize the parallelization of video decoding when we have a multi-core system with many cores. We parallelized the KTA 2.7 decoder with the proposed technique on an Intel i7 Quad-Core platform with Intel Hyper-Threading technology and multi-threads scheduling. We achieved up to 70% performance improvement using IP method.

MPI-OpenMP Hybrid Parallelization for Multibody Peridynamic Simulations (다물체 페리다이나믹 해석을 위한 MPI-OpenMP 혼합 병렬화)

  • Lee, Seungwoo;Ha, Youn Doh
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.33 no.3
    • /
    • pp.171-178
    • /
    • 2020
  • In this study, we develop MPI-OpenMP hybrid parallelization for multibody peridynamic simulations. Peridynamics is suitable for analyzing complicated dynamic fractures and various discontinuities. However, compared with a conventional finite element method, nonlocal interactions in peridynamics cost more time and memory. In multibody peridynamic analysis, the costs increase due to the additional interactions that occur when computing the nonlocal contact and ghost interlayer models between adjacent bodies. The costs become excessive when further refinement and smaller time steps are required in cases of high-velocity impact fracturing or similar instances. Thus, high computational efficiency and performance can be achieved by parallelization and optimization of multibody peridynamic simulations. The analytical code is developed using an Intel Fortran MPI compiler and OpenMP in NURION of the KISTI HPC center and parallelized through MPI-OpenMP hybrid parallelization. Further parallelization is conducted by hybridizing with OpenMP threads in each MPI process. We also try to minimize communication operations by model-based decomposition of MPI processes. The numerical results for the impact fracturing of multiple bodies show that the computing performance improves significantly with MPI-OpenMP hybrid parallelization.