• Title/Summary/Keyword: Parallelization method


Parallel Topology Optimization on Distributed Memory System (분산 메모리 시스템에서의 병렬 위상 최적설계)

  • Lee Ki-Myung;Cho Seon-Ho
    • Proceedings of the Computational Structural Engineering Institute Conference / 2006.04a / pp.291-298 / 2006
  • A parallelized topology design optimization method is developed on a distributed memory system. The parallelization is based on a domain decomposition method and a boundary communication scheme. For the finite element analysis of structural responses and design sensitivities, the PCG method based on a Krylov iterative scheme is employed. A parallelized optimality criteria method is also used to solve large-scale topology optimization problems. Several numerical examples show that the developed method yields efficient and acceptable topology optimization results for large-scale problems.

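As a rough illustration of the PCG solver named in the abstract above, the sketch below runs a Jacobi-preconditioned conjugate gradient iteration on a small symmetric positive definite system. The Jacobi preconditioner, the function names (`pcg`, `matvec`, `dot`), and the 3x3 test matrix are assumptions made for illustration; the abstract does not say which preconditioner or data layout the authors use.

```python
def matvec(A, x):
    """Dense matrix-vector product (row by row)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A with Jacobi-preconditioned CG (illustrative)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner (assumed)
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small stiffness-like test system (hypothetical data).
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
solution = pcg(A, b)
```

In a distributed-memory setting, the matrix-vector product and the inner products are the steps that typically require communication: the former needs boundary values from neighboring subdomains, and the latter needs a global reduction.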

Parallelization of 3-dimensional Multigrid DADI Method (3차원 다중격자 DADI 방법의 병렬처리)

  • Seong Chun-Ho;Park Su-Hyeong;Gwon Jang-Hyeok
    • Korean Society of Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / 1998.05a / pp.49-54 / 1998
  • A 3-dimensional Euler solver is parallelized. The spatial discretization uses a 2nd-order TVD scheme, and the DADI method with multigrid is used for time integration. To parallelize the solver, a domain decomposition method with overlapped grids and message-passing techniques are used. Information on each inter-processor boundary is communicated with the MPI library. Finally, the parallel performance is presented by calculating the flow over the ONERA M6 wing at transonic conditions using CRAY T3E and C90.

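The overlapped-grid domain decomposition described in the abstract above can be sketched in one dimension. The example below runs in a single process and is only a stand-in: the `exchange` function copies boundary values between two subdomains where a real solver would issue MPI send/receive calls. The 3-point averaging sweep, the function names, and the grid data are all hypothetical and unrelated to the actual Euler solver.

```python
def smooth(u):
    """One 3-point averaging sweep; the two end points are held fixed."""
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def split_with_overlap(u, cut):
    """Split u at index `cut`; each part carries one ghost cell from its neighbor."""
    left = u[:cut + 1]     # owns indices 0..cut-1, last entry is a ghost copy of u[cut]
    right = u[cut - 1:]    # first entry is a ghost copy of u[cut-1], owns cut..n-1
    return left, right


def exchange(left, right):
    """Stand-in for message passing: refresh ghost cells from the neighbor's owned data."""
    left[-1] = right[1]
    right[0] = left[-2]


def decomposed_sweeps(u, cut, steps):
    """Run `steps` sweeps on two overlapped subdomains and reassemble the result."""
    left, right = split_with_overlap(u, cut)
    for _ in range(steps):
        exchange(left, right)                       # boundary communication
        left, right = smooth(left), smooth(right)   # independent local work
    return left[:-1] + right[1:]                    # drop ghost cells


grid = [float(i * i) for i in range(8)]   # hypothetical initial data
serial = grid
for _ in range(3):
    serial = smooth(serial)
decomposed = decomposed_sweeps(grid, cut=4, steps=3)
```

Because each subdomain refreshes its ghost cell before every sweep, the reassembled result matches the single-domain sweep.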

Parallelization of Recursive Functions for Recursive Data Structures (재귀적 자료구조에 대한 재귀 함수의 병렬화)

  • An, Jun-Seon;Han, Tae-Suk
    • Journal of KIISE:Software and Applications / v.26 no.12 / pp.1542-1552 / 1999
  • Data parallelism is obtained by applying the same operation to each element of a data collection. In functional languages, iterative computations on data collections are expressed as recursive functions on recursive data structures. We propose a parallelization method that transforms such recursive functions into data-parallel programs. We employ polytypic data-parallel primitives, defined over general recursive data types, to represent the parallel execution structure of the generated programs, which enables data-parallel execution on general recursive data structures such as trees and lists. To transform sequential programs into their parallelized versions, we classify the types of parallelism in the subexpressions of a function based on the dependencies introduced by the recursive calls, and generate data-parallel programs that use the appropriate data-parallel primitive for each subexpression.
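To make the idea of data-parallel primitives on recursive data structures more concrete, the sketch below defines a map and a reduce over a binary tree. These are not the paper's polytypic primitives: the `Tree` type and the names `tree_map` and `tree_reduce` are hypothetical, and the code runs sequentially. The point is only that the two recursive calls in each function touch disjoint subtrees, so a parallel runtime could evaluate them independently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tree:
    value: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def tree_map(f: Callable[[int], int], t: Optional[Tree]) -> Optional[Tree]:
    """Apply f to every element; the two subtree calls are independent."""
    if t is None:
        return None
    return Tree(f(t.value), tree_map(f, t.left), tree_map(f, t.right))


def tree_reduce(op: Callable[[int, int], int], unit: int, t: Optional[Tree]) -> int:
    """Combine all elements with an associative op whose identity is `unit`."""
    if t is None:
        return unit
    left_result = tree_reduce(op, unit, t.left)     # independent of the right subtree
    right_result = tree_reduce(op, unit, t.right)   # independent of the left subtree
    return op(op(left_result, t.value), right_result)


# Sum of squares over a small tree (hypothetical data).
tree = Tree(1, Tree(2), Tree(3, Tree(4)))
squares = tree_map(lambda v: v * v, tree)
total = tree_reduce(lambda a, b: a + b, 0, squares)
```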

The Procedure Transformation using Data Dependency Elimination Methods (자료 종속성 제거 방법을 이용한 프로시저 변환)

  • Jang, Yu-Suk;Park, Du-Sun
    • The KIPS Transactions:PartA / v.9A no.1 / pp.37-44 / 2002
  • Most research on transforming sequential programs into parallel programs has been based on loop structure transformation. However, most programs also contain implicit interprocedural parallelism. This paper suggests a way of extracting parallelism from loops with procedure calls using the data dependency elimination method. Most parallelization of loops with procedure calls has been limited to extracting parallelism from uniform code. In this paper, we propose an interprocedural transformation that can be applied to both uniform and nonuniform code. We show examples of uniform, nonuniform, and complex code parallelization. We then evaluate the performance of the various transformation methods on the CRAY-T3E system. The comparison results show that the proposed algorithm outperforms other conventional methods.

A Loop Transformation for Parallelism from Single Loops

  • Jeong, Sam-Jin
    • International Journal of Contents / v.2 no.4 / pp.8-11 / 2006
  • This paper describes several existing loop partitioning techniques for exploiting parallelism in single loops, such as the loop splitting method by thresholds and Polychronopoulos' loop splitting method. We propose an improved loop splitting method for maximizing the parallelism of single loops with non-constant dependence distances. Using the distance for the source of the first dependence and the theorems we define, we present generalized and optimal algorithms for single loops with non-uniform dependences. The algorithms generalize how to transform general single loops into parallel loops.

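The basic idea behind loop splitting can be shown for the much simpler case of a constant dependence distance, which is not the non-constant case the paper targets. In the hypothetical loop `a[i] = a[i - d] + b[i]`, iteration `i` depends only on iteration `i - d`, so any `d` consecutive iterations are mutually independent. The sketch below splits the loop into chunks of size `d`, runs the chunks in order, and deliberately executes the iterations inside each chunk in reverse to show that their order does not matter. All names and data are assumptions for illustration.

```python
def sequential_loop(a, b, d):
    """Reference loop with a flow dependence of constant distance d."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a


def split_loop(a, b, d):
    """Same computation, split into chunks of d independent iterations."""
    a = list(a)
    n = len(a)
    for start in range(d, n, d):            # outer loop: chunks run one after another
        chunk = range(start, min(start + d, n))
        for i in reversed(chunk):           # inner loop: any order (parallelizable)
            a[i] = a[i - d] + b[i]
    return a


a0 = list(range(10))     # hypothetical input data
b0 = [1] * 10
expected = sequential_loop(a0, b0, d=3)
result = split_loop(a0, b0, d=3)
```

Each inner iteration reads only values written by an earlier chunk, which is why the inner loop could be handed to parallel workers while the outer loop stays sequential.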

PARALLEL OPTIMAL CONTROL WITH MULTIPLE SHOOTING, CONSTRAINTS AGGREGATION AND ADJOINT METHODS

  • Jeon, Moon-Gu
    • Journal of applied mathematics & informatics / v.19 no.1_2 / pp.215-229 / 2005
  • In this paper, constraint aggregation is combined with the adjoint and multiple shooting strategies for optimal control of differential-algebraic equation (DAE) systems. The approach retains the inherent parallelism of the conventional multiple shooting method, while also being much more efficient for large-scale problems. Constraint aggregation is employed to reduce the number of nonlinear continuity constraints in each multiple shooting interval, and its derivatives are computed by the adjoint DAE solver DASPKADJOINT together with ADIFOR and TAMC, the automatic differentiation software for forward and reverse mode, respectively. Numerical experiments demonstrate the effectiveness of the approach.

DEX2C: Translation of Dalvik Bytecodes into C Code and its Interface in a Dalvik VM

  • Kim, Minseong;Han, Youngsun;Cho, Myeongjin;Park, Chanhyun;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing / v.4 no.3 / pp.169-172 / 2015
  • Dalvik is a virtual machine (VM) that is designed to run Java-based Android applications. A trace-based just-in-time (JIT) compilation technique is currently employed to improve the performance of the Dalvik VM. However, due to runtime compilation overhead, the trace-based JIT compiler provides only a few simple optimizations. Moreover, because each trace contains only a few instructions, the trace-based JIT compiler inherently exploits fewer optimization and parallelization opportunities than a method-based JIT compiler that compiles method by method. We therefore propose a new method-based JIT compiler, named DEX2C, to improve performance by finding more opportunities for both optimization and parallelization in Android applications. We employ C code as an intermediate product in order to find more optimization opportunities by using the GNU C Compiler (GCC), and in future work we will detect parallelism by using the Intel C/C++ parallel compiler and the AESOP compiler. In this paper, we introduce our DEX2C compiler, which dynamically translates Dalvik bytecodes (DEX) into C code with method granularity. We also describe a new method-based JIT interface in the Dalvik VM for the DEX2C compiler. Our experimental results show that our compiler and its interface improve performance by up to 15.2 times, and 3.7 times on average, in Element Benchmark, and by up to 2.8 times for FFT in Smartbench.

Improvement and verification of the DeCART code for HTGR core physics analysis

  • Cho, Jin Young;Han, Tae Young;Park, Ho Jin;Hong, Ser Gi;Lee, Hyun Chul
    • Nuclear Engineering and Technology / v.51 no.1 / pp.13-30 / 2019
  • This paper presents the recent improvements in the DeCART code for HTGR analysis. A new 190-group DeCART cross-section library based on ENDF/B-VII.0 was generated using the KAERI library processing system for HTGR. Two methods for the eigen-mode adjoint flux calculation were implemented. An azimuthal angle discretization method based on the Gaussian quadrature was implemented to reduce the error from the azimuthal angle discretization. A two-level parallelization using MPI and OpenMP was adopted for massive parallel computations. A quadratic depletion solver was implemented to reduce the error involved in the Gd depletion. A module to generate equivalent group constants was implemented for the nodal codes. The capabilities of the DeCART code were improved for geometry handling including an approximate treatment of a cylindrical outer boundary, an explicit border model, the R-G-B checker-board model, and a super-cell model for a hexagonal geometry. The newly improved and implemented functionalities were verified against various numerical benchmarks such as OECD/MHTGR-350 benchmark phase III problems, two-dimensional high temperature gas cooled reactor benchmark problems derived from the MHTGR-350 reference design, and numerical benchmark problems based on the compact nuclear power source experiment by comparing the DeCART solutions with the Monte-Carlo reference solutions obtained using the McCARD code.

A SYNCRO-PARALLEL NONSMOOTH PGD ALGORITHM FOR NONSMOOTH OPTIMIZATION

  • Feng, Shan;Pang, Li-Ping
    • Journal of applied mathematics & informatics / v.24 no.1_2 / pp.333-342 / 2007
  • A nonsmooth PGD scheme for minimizing a nonsmooth convex function is presented. In the parallelization step of the algorithm, a method due to Pang, Han and Pangaraj (1991), [7], is employed to solve a subproblem for constructing search directions. The convergence analysis is given as well.

An Efficient Parallelization Implementation of PU-level ME for Fast HEVC Encoding (고속 HEVC 부호화를 위한 효율적인 PU레벨 움직임예측 병렬화 구현)

  • Park, Soobin;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Journal of Broadcast Engineering / v.18 no.2 / pp.178-184 / 2013
  • In this paper, we propose an efficient parallelization technique for PU-level motion estimation (ME) in the next-generation video coding standard, High Efficiency Video Coding (HEVC), to reduce the time complexity of video encoding. It is difficult to encode video in real time because ME has significant complexity (i.e., 80 percent at the encoder). Various techniques have been studied to solve this problem, and among them is parallelization, which is taken into account in algorithm-level ME design. In this regard, a merge estimation method using the merge estimation region (MER), which enables ME to be performed in parallel, has been proposed; however, parallel ME based on MER still has unresolved problems that prevent an ideal implementation in the HEVC test model (HM). Therefore, we propose two strategies to implement stable parallel ME using MER in HM. Experimental results show that the proposed method reduces the encoding time by 25.64 percent on average compared with HM using sequential ME.