• Title/Summary/Keyword: Parallelization method

Search Result 91, Processing Time 0.024 seconds

A Study on the Automatic Parallelization Method and Tool Development

  • Shin, Woochang
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.3
    • /
    • pp.87-94
    • /
    • 2020
  • Recently, computer hardware is evolving toward increasing the number of computing cores, not increasing the clock speed. In order to use the performance of parallelized hardware to the maximum, the running program must also be parallelized. However, software developers are accustomed to sequential programs, and in most cases, write programs that operate sequentially. They also have a lot of difficulty designing and developing software in parallel. We propose a method to automatically convert a sequential C/C++ program into a parallelized program, and develop a parallelization tool that supports it. It supports open multiprocessing (OpenMP) and parallel patterns library (PPL) as a parallel framework. Perfect automatic parallelization is difficult due to dynamic features such as pointer operation and polymorphism in C/C++ language. This study focuses on verifying the conditions of parallelization rather than focusing on fully automatic parallelization, and providing advice to developers in detail if parallelization is not possible.

A Parallelization Technique with Integrated Multi-Threading for Video Decoding on Multi-core Systems

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.10
    • /
    • pp.2479-2496
    • /
    • 2013
  • Increasing demand for Full High-Definition (FHD) video and Ultra High-Definition (UHD) video services has led to active research on high speed video processing. Widespread deployment of multi-core systems has accelerated studies on high resolution video processing based on parallelization of multimedia software. Even if parallelization of a specific decoding step may improve decoding performance partially, such partial parallelization may not result in sufficient performance improvement. Particularly, entropy decoding has often been considered separately from other decoding steps since the entropy decoding step could not be parallelized easily. In this paper, we propose a parallelization technique called Integrated Multi-Threaded Parallelization (IMTP) which takes parallelization of the entropy decoding step, with other decoding steps, into consideration in an integrated fashion. We used the Simultaneous Multi-Threading (SMT) technique with appropriate thread scheduling techniques to achieve the best performance for the entire decoding step. The speedup of the proposed IMTP method is up to 3.35 times faster with respect to the entire decoding time over a conventional decoding technique for H.264/AVC videos.

Data Level Parallelism for H.264/AVC Decoder on a Multi-Core Processor and Performance Analysis (멀티코어 프로세서에서의 H.264/AVC 디코더를 위한 데이터 레벨 병렬화 성능 예측 및 분석)

  • Cho, Han-Wook;Jo, Song-Hyun;Song, Yong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.102-116
    • /
    • 2009
  • There have been lots of researches for H.264/AVC performance enhancement on a multi-core processor. The enhancement has been performed through parallelization methods. Parallelization methods can be classified into a task-level parallelization method and a data level parallelization method. A task-level parallelization method for H.264/AVC decoder is implemented by dividing H.264/AVC decoder algorithms into pipeline stages. However, it is not suitable for complex and large bitstreams due to poor load-balancing. Considering load-balancing and performance scalability, we propose a horizontal data level parallelization method for H.264/AVC decoder in such a way that threads are assigned to macroblock lines. We develop a mathematical performance expectation model for the proposed parallelization methods. For evaluation of the mathematical performance expectation, we measured the performance with JM 13.2 reference software on ARM11 MPCore Evaluation Board. The cycle-accurate measurement with SoCDesigner Co-verification Environment showed that expected performance and performance scalability of the proposed parallelization method was accurate in relatively high level

Load Balancing Based on Transform Unit Partition Information for High Efficiency Video Coding Deblocking Filter

  • Ryu, Hochan;Park, Seanae;Ryu, Eun-Kyung;Sim, Donggyu
    • ETRI Journal
    • /
    • v.39 no.3
    • /
    • pp.301-309
    • /
    • 2017
  • In this paper, we propose a parallelization method for a High Efficiency Video Coding (HEVC) deblocking filter with transform unit (TU) split information. HEVC employs a deblocking filter to boost perceptual quality and coding efficiency. The deblocking filter was designed for data-level parallelism. In this paper, we demonstrate a method of distributing equal workloads to all cores or threads by anticipating the deblocking filter complexity based on the coding unit depth and TU split information. We determined that the average time saving of our proposed deblocking filter parallelization method has a speed-up factor that is 2% better than that of the uniformly distributed parallel deblocking filter, and 6% better than that of coding tree unit row distribution parallelism. In addition, we determined that the speed-up factor of our proposed deblocking filter parallelization method, in terms of percentage run-time, is up to 3.1 compared to the run-time of the HEVC test model 12.0 deblocking filter with a sequential implementation.

Integrated Parallelization of Video Decoding on Multi-core Systems (멀티코어 시스템에서의 통합된 비디오 디코딩 병렬화)

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.7
    • /
    • pp.39-49
    • /
    • 2012
  • Demand for high resolution video services leads to active studies on high speed video processing. Especially, widespread deployment of multi-core systems accelerates researches on high resolution video processing based on parallelization of multimedia software. Previously proposed parallelization approach could improve the decoding performance. However, some parallelization methods did not consider the entropy decoding and others considered only a partial decoding parallelization. Therefore, we consider parallel entropy decoding integrated with other parallel video decoding process on a multi-core system. We propose a novel parallel decoding method called Integrated Parallelization. We propose a method on how to optimize the parallelization of video decoding when we have a multi-core system with many cores. We parallelized the KTA 2.7 decoder with the proposed technique on an Intel i7 Quad-Core platform with Intel Hyper-Threading technology and multi-threads scheduling. We achieved up to 70% performance improvement using IP method.

Parallelization of a Two-Dimensional Navier-Stokes Solver Using Hybrid Meshes (혼합격자를 이용한 2차원 난류 유동장 해석 프로그램의 병렬화)

  • Ok Honam;Park Seung-O
    • 한국전산유체공학회:학술대회논문집
    • /
    • 1999.11a
    • /
    • pp.115-126
    • /
    • 1999
  • A two-dimensional Navier-Stokes solver using hybrid meshes is parallelized with a domain decompostion method. The focus of this paper is placed on minimizing the amount of effort in parallelizing the serial version of the solver, and this is achieved by adding an additional layer of cells to each decomposed domain. Most subroutines of the serial solver are used without modification, and the information exchange between neighboring domains is achieved using MPI(Message Passing Interface) library. Load balancing among the processors and scheduling of the message passing are implemented to reduce the overhead of parallelization, and the speed-up achieved by parallelization is measured on the transonic invisicd and turbulent flow problems. The parallelization efficiencies of the explicit Runge-Kutta scheme and the implicit point-SGS scheme are compared and the effects of various factors on the results are also studied.

  • PDF

A Parallel Algorithm of Davidson Method for Eigenproblems (고유치 솔버 Davidson Method 의 병렬화)

  • Kim, Hyoung-Joong;Zhu, Yu
    • Proceedings of the KIEE Conference
    • /
    • 1997.07a
    • /
    • pp.12-14
    • /
    • 1997
  • The analysis of eigenvalue and eigenvector is a crucial procedure for many electromagnetic computation problems. However, eigenpair computation is timing-consuming task. Thus, its parallelization is required for designing large-scale and precision three-dimensional electromagnetic machines. In this paper, the Davidson method is parallelized on a cluster of workstations. Performance of the parallelization scheme is reported. This scheme is applied to a ridged waveguide design problem.

  • PDF

Parallel processing in structural reliability

  • Pellissetti, M.F.
    • Structural Engineering and Mechanics
    • /
    • v.32 no.1
    • /
    • pp.95-126
    • /
    • 2009
  • The present contribution addresses the parallelization of advanced simulation methods for structural reliability analysis, which have recently been developed for large-scale structures with a high number of uncertain parameters. In particular, the Line Sampling method and the Subset Simulation method are considered. The proposed parallel algorithms exploit the parallelism associated with the possibility to simultaneously perform independent FE analyses. For the Line Sampling method a parallelization scheme is proposed both for the actual sampling process, and for the statistical gradient estimation method used to identify the so-called important direction of the Line Sampling scheme. Two parallelization strategies are investigated for the Subset Simulation method: the first one consists in the embarrassingly parallel advancement of distinct Markov chains; in this case the speedup is bounded by the number of chains advanced simultaneously. The second parallel Subset Simulation algorithm utilizes the concept of speculative computing. Speedup measurements in context with the FE model of a multistory building (24,000 DOFs) show the reduction of the wall-clock time to a very viable amount (<10 minutes for Line Sampling and ${\approx}$ 1 hour for Subset Simulation). The measurements, conducted on clusters of multi-core nodes, also indicate a strong sensitivity of the parallel performance to the load level of the nodes, in terms of the number of simultaneously used cores. This performance degradation is related to memory bottlenecks during the modal analysis required during each FE analysis.

Parallelization of Multifrontal Solution Method for Shared Memory Architecture (다중프론트 해법의 공유메모리 병렬화)

  • Kim, Min Ki;Kim, Jeong Ho;Park, Chan Yik;Kim, Seung Jo
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.40 no.11
    • /
    • pp.972-978
    • /
    • 2012
  • This paper discusses the parallelization of multifrontal solution method, widely used for finite element structural analyses, for a shared memory architecture. Multifrontal method is easier than other linear solution methods because the solution procedure implies that unknowns can be eliminated simultaneously. Two innovative ideas are introduced to achieve optimal solver performance on a shared memory computer. Those are pairing two frontal matrices and splitting the frontal matrix in order to reduce the temporal memory space required by independent computing tasks. Performance comparisons between original algorithm and proposed one prove that proposed method is more computationally efficient on current multicore machines.

MPI-OpenMP Hybrid Parallelization for Multibody Peridynamic Simulations (다물체 페리다이나믹 해석을 위한 MPI-OpenMP 혼합 병렬화)

  • Lee, Seungwoo;Ha, Youn Doh
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.33 no.3
    • /
    • pp.171-178
    • /
    • 2020
  • In this study, we develop MPI-OpenMP hybrid parallelization for multibody peridynamic simulations. Peridynamics is suitable for analyzing complicated dynamic fractures and various discontinuities. However, compared with a conventional finite element method, nonlocal interactions in peridynamics cost more time and memory. In multibody peridynamic analysis, the costs increase due to the additional interactions that occur when computing the nonlocal contact and ghost interlayer models between adjacent bodies. The costs become excessive when further refinement and smaller time steps are required in cases of high-velocity impact fracturing or similar instances. Thus, high computational efficiency and performance can be achieved by parallelization and optimization of multibody peridynamic simulations. The analytical code is developed using an Intel Fortran MPI compiler and OpenMP in NURION of the KISTI HPC center and parallelized through MPI-OpenMP hybrid parallelization. Further parallelization is conducted by hybridizing with OpenMP threads in each MPI process. We also try to minimize communication operations by model-based decomposition of MPI processes. The numerical results for the impact fracturing of multiple bodies show that the computing performance improves significantly with MPI-OpenMP hybrid parallelization.