• Title/Summary/Keyword: Distributed Parallel Programming

PARALLEL IMPROVEMENT IN STRUCTURED CHIMERA GRID ASSEMBLY FOR PC CLUSTER (PC 클러스터를 위한 정렬 중첩 격자의 병렬처리)

  • Kim, Eu-Gene; Kwon, Jang-Hyuk
    • 한국전산유체공학회 학술대회논문집 (Korean Society of Computational Fluids Engineering Conference Proceedings) / 2005.10a / pp.157-162 / 2005
  • Parallel implementation and performance assessment of grid assembly in a structured chimera grid approach are studied. The grid assembly process, involving hole cutting and donor searching, is parallelized on a PC cluster. A message-passing programming model based on the MPI library is implemented using the single-program multiple-data (SPMD) paradigm. The coarse-grained communication is optimized with minimal memory allocation, because in a distributed-memory system such as a PC cluster the parallel grid assembly can access decomposed geometry data on other processors only through message passing. The grid assembly workload uses static load balancing tied to the flow solver. The goal of this work is a parallelized grid assembly suited to handling multiple moving-body problems with large grids.
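
The SPMD pattern described above is easy to sketch with mpi4py: every rank runs the same program, owns one block of the decomposed geometry, and answers donor queries from other ranks purely by message passing. The block below is a minimal illustration, not the paper's code; the data and the find_donor helper are hypothetical stand-ins.

```python
# Minimal SPMD donor-search sketch (illustrative only).
# Run with, e.g.: mpiexec -n 4 python donor_search.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns one decomposed block; random cell centers stand in
# for a structured grid block here.
rng = np.random.default_rng(seed=rank)
local_cells = rng.random((1000, 3))

def find_donor(point):
    """Return index and distance of the nearest local cell (toy donor test)."""
    d = np.linalg.norm(local_cells - point, axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])

# Every rank broadcasts one query point; all ranks search their own block,
# and the owner of the best candidate is found with a global gather.
query = rng.random(3)
for src in range(size):
    p = comm.bcast(query if rank == src else None, root=src)
    idx, dist = find_donor(p)
    best_dist, best_rank = min(comm.allgather((dist, rank)))
    if rank == src:
        print(f"query of rank {src}: donor on rank {best_rank}, dist {best_dist:.4f}")
```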

Comparison of Message Passing Interface and Hybrid Programming Models to Solve Pressure Equation in Distributed Memory System (분산 메모리 시스템에서 압력방정식의 해법을 위한 MPI와 Hybrid 병렬 기법의 비교)

  • Jeon, Byoung Jin; Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.39 no.2 / pp.191-197 / 2015
  • The message passing interface (MPI) and hybrid programming models for the parallel computation of a pressure equation were compared in a distributed memory system. Both models were based on domain decomposition, and two sub-domain counts were selected by considering the efficiency of the hybrid model. Parallel performance for various problem sizes was measured using up to 96 threads. It was found that, in addition to the cache-memory size, the overhead of MPI communication and OpenMP directives affected the parallel performance. For small problems, parallel performance was low because the share of this overhead grew as the number of threads increased, and MPI was better than the hybrid model because of its smaller communication overhead. For large problems, parallel performance was high because, in addition to the cache effect, the communication overhead was relatively low compared with that for small problems, and the hybrid model was better than MPI because the communication overhead of MPI was more dominant than the overhead of the OpenMP directives in the hybrid model.
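
The communication whose overhead the study measures is the halo exchange of a domain-decomposed iterative solver. Below is a generic mpi4py sketch, not the paper's solver: a 1-D decomposition with Jacobi sweeps on a Poisson-type pressure equation, with arbitrary sizes and iteration count.

```python
# 1-D domain-decomposed Jacobi sweep for a Poisson-type pressure equation
# (generic sketch). Run with, e.g.: mpiexec -n 4 python jacobi.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 100                      # interior points per rank (assumed size)
u = np.zeros(n_local + 2)          # +2 ghost cells for the halo
f = np.ones(n_local + 2)           # right-hand side of u'' = f
h = 1.0 / (n_local * size + 1)

left  = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for it in range(500):
    # Halo exchange: the MPI communication the paper's overhead analysis refers to.
    comm.Sendrecv(u[1:2],   dest=left,  recvbuf=u[-1:],  source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1],  source=left)
    # Jacobi update of the interior points. In the hybrid model this loop
    # would additionally be split across OpenMP threads within each MPI rank.
    u[1:-1] = 0.5 * (u[:-2] + u[2:] - h * h * f[1:-1])

print(f"rank {rank}: ||u_local|| = {np.linalg.norm(u[1:-1]):.6f}")
```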

Improving Performance of Large Sparse Linear System Solvers On Distributed Memory Systems By Asynchronous Algorithms (비동기 알고리즘을 이용한 분산 메모리 시스템에서의 초대형 선형 시스템 해법의 성능 향상)

  • Park, Pil-Seong; Sin, Sun-Cheol
    • The KIPS Transactions: Part A / v.8A no.4 / pp.439-446 / 2001
  • The mainstream of parallel programming today uses synchronous algorithms, in which processor synchronization and workload balance are essential for correct computation. If the workload is not well balanced or heterogeneous clusters are used, the overall performance of the whole system depends on the performance of the slowest processor. Asynchronous iteration is a way to mitigate such problems, but most of the work done so far targets shared-memory systems. In this paper, we propose and implement a parallel solver for large sparse linear systems that improves performance on distributed-memory systems such as clusters by using asynchronous iterations to reduce processor idle time as much as possible.
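
The core idea, iterating without synchronization points and consuming neighbor updates whenever they happen to arrive, can be sketched with non-blocking MPI calls. The toy relaxation below is illustrative only; the update rule and message pattern are stand-ins, not the paper's solver.

```python
# Asynchronous iteration sketch: no barriers inside the loop; each rank
# uses the freshest neighbor values that have arrived so far.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

neighbors = [r for r in (rank - 1, rank + 1) if 0 <= r < size]
latest = {r: 0.0 for r in neighbors}   # last value seen from each neighbor
x = 0.0
reqs = []

for it in range(200):
    # Drain pending neighbor updates without blocking.
    status = MPI.Status()
    while comm.Iprobe(source=MPI.ANY_SOURCE, tag=0, status=status):
        latest[status.Get_source()] = comm.recv(source=status.Get_source(), tag=0)
        status = MPI.Status()
    # Toy relaxation using possibly stale data (converges to x = 1).
    avg = sum(latest.values()) / max(len(latest), 1)
    x = 0.5 * (avg + 1.0)
    # Fire-and-forget sends; no rank ever waits for the slowest processor.
    reqs += [comm.isend(x, dest=r, tag=0) for r in neighbors]

MPI.Request.Waitall(reqs)   # small pickled messages complete eagerly in practice
comm.Barrier()              # single barrier, only to terminate cleanly
# Drain any straggler messages left in the queue (tidy shutdown for a sketch).
status = MPI.Status()
while comm.Iprobe(source=MPI.ANY_SOURCE, tag=0, status=status):
    comm.recv(source=status.Get_source(), tag=0)
    status = MPI.Status()
print(f"rank {rank}: x = {x:.6f}")
```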

A Synchronous/Asynchronous Hybrid Parallel Power Iteration for Large Eigenvalue Problems by the MPMD Methodology (MPMD 방식의 동기/비동기 병렬 혼합 멱승법에 의한 거대 고유치 문제의 해법)

  • Park, Pil-Seong
    • The KIPS Transactions: Part A / v.11A no.1 / pp.67-74 / 2004
  • Most of today's parallel numerical schemes use synchronous algorithms, where processors that finish their tasks earlier than others must wait at synchronization points for correct computation. Hence the overall performance of the system depends on the speed of the slowest processor. In this paper, we devise a synchronous/asynchronous hybrid algorithm that accelerates convergence when computing the dominant eigenpair of a large matrix, by reducing the idle times of the faster processors using the MPMD programming methodology.
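
For reference, here is the synchronous baseline that such a hybrid scheme improves on: a distributed power iteration in which every rank must wait at the gather before the next step. This is a generic sketch with arbitrary matrix, sizes, and iteration count, not the paper's MPMD implementation.

```python
# Synchronous distributed power iteration (baseline sketch; the paper's
# contribution is a hybrid that relaxes the per-iteration synchronization).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 64 * size                       # global matrix dimension (assumed)
rows = slice(rank * 64, (rank + 1) * 64)
rng = np.random.default_rng(0)      # same seed: every rank builds the same matrix
A = rng.random((n, n))
A_local = A[rows]                   # this rank's row block
x = np.ones(n) / np.sqrt(n)

for it in range(100):
    y_local = A_local @ x           # local mat-vec on the row block
    # Synchronization point: gather all pieces of y, then renormalize.
    y = np.concatenate(comm.allgather(y_local))
    x = y / np.linalg.norm(y)

# Rayleigh quotient as the dominant-eigenvalue estimate.
if rank == 0:
    print("lambda_max ~", float(x @ (A @ x)))
```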

Indivisible load scheduling applied to Linear Programming (선형계획법을 적용한 임의 분할 불가능한 부하 분배계획)

  • Son, Kyung-Ho; Lee, Dal-Ho; Kim, Hyoung-Joog
    • 한국정보통신설비학회 학술대회논문집 (Korea Institute of Information and Communication Facilities Engineering Conference Proceedings) / 2005.08a / pp.382-387 / 2005
  • There have been many studies on the arbitrarily divisible load scheduling problem in a distributed computing network consisting of processors interconnected through communication links, but it is not always efficient to divide the load entering the system arbitrarily. In this paper, we study how to schedule loads that cannot be divided arbitrarily. We also deal with cases where divisible and indivisible loads enter the network together, obtaining an optimal load distribution for the parallel processing system by formulating the scheduling as a linear program.
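
A classic divisible-load formulation makes the linear-programming connection concrete: assign load shares to processors of different speeds so that the makespan is minimal. The sketch below, with made-up speeds and load and not the paper's exact model, uses scipy.optimize.linprog; truly indivisible loads would add integer assignment variables, turning this into a mixed-integer program.

```python
# Divisible-load scheduling as a linear program (illustrative formulation).
import numpy as np
from scipy.optimize import linprog

speeds = np.array([1.0, 2.0, 4.0])   # processing rates of 3 processors (assumed)
L = 100.0                            # total divisible load

n = len(speeds)
# Variables: [alpha_1, ..., alpha_n, T]; minimize the makespan T.
c = np.zeros(n + 1)
c[-1] = 1.0
# Finish-time constraints: alpha_i / speed_i - T <= 0.
A_ub = np.hstack([np.diag(1.0 / speeds), -np.ones((n, 1))])
b_ub = np.zeros(n)
# Conservation: all load must be assigned.
A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
b_eq = np.array([L])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + 1), method="highs")
alpha, T = res.x[:-1], res.x[-1]
print("shares:", alpha.round(2), "makespan:", round(T, 3))
```

As expected, the optimum assigns shares proportional to processor speed, so every processor finishes at the same time T.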

Molecular Docking System using Parallel GPU (병렬 GPU를 이용한 분자 도킹 시스템)

  • Park, Sung-Jun
    • The Journal of the Korea Contents Association / v.8 no.12 / pp.441-448 / 2008
  • A molecular docking system needs a large amount of computation and requires supercomputing power, so experiments that take a long time are usually conducted in a distributed or grid environment. Recently, research on using parallel GPUs, whose performance for scientific computing is far higher than that of CPUs, has been very active. CUDA is a technology that makes parallel GPU programming possible. This study proposes a molecular docking system using CUDA, along with an algorithm that parallelizes the energy-minimization computation. To verify the approach, this study compares the time required for molecular docking on a general CPU with the time and performance of the proposed parallel GPU-based molecular docking.
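
The parallelization pattern, one GPU thread scoring one ligand atom against all receptor atoms, can be sketched as follows. This toy Lennard-Jones-style evaluation is written with Numba's CUDA interface for brevity (the paper uses CUDA directly), and the coordinates and the energy model are placeholders, not the paper's docking score.

```python
# Toy GPU energy evaluation with Numba CUDA (illustrative only).
# One thread scores one ligand atom against all receptor atoms.
import math
import numpy as np
from numba import cuda

@cuda.jit
def pair_energy(lig, rec, out):
    i = cuda.grid(1)
    if i < lig.shape[0]:
        e = 0.0
        for j in range(rec.shape[0]):
            dx = lig[i, 0] - rec[j, 0]
            dy = lig[i, 1] - rec[j, 1]
            dz = lig[i, 2] - rec[j, 2]
            r = math.sqrt(dx * dx + dy * dy + dz * dz) + 1e-9
            e += 1.0 / r**12 - 2.0 / r**6   # Lennard-Jones-like term
        out[i] = e

lig = np.random.random((256, 3)).astype(np.float32)
rec = np.random.random((4096, 3)).astype(np.float32)
out = np.zeros(256, dtype=np.float32)

threads = 128
blocks = (lig.shape[0] + threads - 1) // threads
pair_energy[blocks, threads](lig, rec, out)   # Numba copies the arrays to the GPU
print("total interaction energy:", float(out.sum()))
```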

Thread-Level Parallelism using Java Thread and Network Resources (자바 스레드와 네트워크 자원을 이용한 병렬처리)

  • Kim, Tae-Yong
    • Journal of Advanced Navigation Technology / v.14 no.6 / pp.984-989 / 2010
  • In this paper, a parallel programming technique using Java threads is introduced to develop a parallel design tool for analyzing a small micro flow sensor. To estimate the computing time of thread-level parallelism, the performance of two experimental models for a potential problem governed by the heat-transfer equation is examined. The results show that as the number of networked PCs increases to n, the computing time in the network environment improves by a factor of almost n. The micro sensor design tool based on distributed computing can be used to analyze large-scale problems.
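
The decomposition is strip-wise: each thread relaxes one band of the temperature field and synchronizes at a barrier between sweeps. The paper uses Java threads; the sketch below shows the same structure in Python's threading module for consistency with the other examples here (CPython threads illustrate the pattern rather than reproducing the paper's speedup), with all sizes assumed.

```python
# Strip-wise thread-level decomposition of a heat-equation relaxation.
import threading
import numpy as np

n, steps, n_threads = 256, 100, 4
T = np.zeros((n, n))
T[0, :] = 100.0                       # hot boundary

def relax_strip(lo, hi, barrier):
    for _ in range(steps):
        # Jacobi sweep over this thread's rows, computed from the old field.
        new = 0.25 * (T[lo-1:hi-1, 1:-1] + T[lo+1:hi+1, 1:-1] +
                      T[lo:hi, :-2] + T[lo:hi, 2:])
        barrier.wait()                # all threads finished reading T
        T[lo:hi, 1:-1] = new
        barrier.wait()                # all threads finished writing T

barrier = threading.Barrier(n_threads)
bounds = np.linspace(1, n - 1, n_threads + 1).astype(int)
threads = [threading.Thread(target=relax_strip,
                            args=(bounds[k], bounds[k + 1], barrier))
           for k in range(n_threads)]
for t in threads: t.start()
for t in threads: t.join()
print("mean temperature:", round(float(T.mean()), 4))
```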

Implementation of a Wi-Fi Based Cluster System using Raspberry Pi for Multidisciplinary Education

  • Koo, Geum-Seo; Sim, Gab-Sig
    • Journal of the Korea Society of Computer and Information / v.24 no.1 / pp.1-7 / 2019
  • In this paper, we implement a Wi-Fi based cluster system using the Raspberry Pi for multidisciplinary education. A cluster built from desktop machines becomes harder to maintain as the number of nodes grows because of its complexity, size, price, and power consumption. We therefore build the cluster from Raspberry Pi boards, which were developed for educational purposes, to reduce the cost of adding nodes, and we replace the wired connections between nodes with Wi-Fi, which reduces the complexity of system construction and the inconvenience of extending the cluster. The implemented cluster should be a good alternative for educational settings that teach distributed and parallel processing in an embedded environment, and we confirm that it can be applied to multidisciplinary education.
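
A minimal smoke test for a cluster like this (the file names and hostnames are hypothetical): every Raspberry Pi node reports in over MPI, confirming that the Wi-Fi interconnect and the MPI runtime are working together.

```python
# Save as hello.py and run from the head node with, e.g.:
#   mpiexec -n 8 -hostfile hosts.txt python3 hello.py
# where hosts.txt lists the Raspberry Pi nodes reachable over Wi-Fi.
from mpi4py import MPI

comm = MPI.COMM_WORLD
name = MPI.Get_processor_name()
print(f"rank {comm.Get_rank()} of {comm.Get_size()} running on {name}")
```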

Implementation of Parallel Local Alignment Method for DNA Sequence using Apache Spark (Apache Spark을 이용한 병렬 DNA 시퀀스 지역 정렬 기법 구현)

  • Kim, Bosung; Kim, Jinsu; Choi, Dojin; Kim, Sangsoo; Song, Seokil
    • The Journal of the Korea Contents Association / v.16 no.10 / pp.608-616 / 2016
  • The Smith-Waterman (SW) algorithm is a local alignment algorithm and one of the important operations in DNA sequence analysis. The SW algorithm finds the optimal local alignment with respect to the scoring system being used, but it demands a long execution time. To address this, several methods for performing SW in a distributed and parallel manner have been proposed. ADAM, a distributed and parallel processing framework for DNA sequences, provides a parallel SW, but it does not take into account that SW is a dynamic programming method, which limits its performance. In this paper, we propose a method to enhance the parallel SW of ADAM. The proposed parallel SW (PSW) runs in two phases. In the first phase, the PSW splits a DNA sequence into partitions and assigns them to multiple nodes; the original Smith-Waterman algorithm is then performed in parallel at each node. In the second phase, the PSW estimates the portions of the sequence that must be recalculated, and the recalculation is performed on those portions in parallel at each node. In experiments, we compare the proposed PSW with the parallel SW of ADAM to show its superiority.
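
The first phase maps directly onto Spark: split the target sequence into partitions, run ordinary Smith-Waterman locally on each, and reduce. The sketch below uses toy sequences and scoring and is not the paper's PSW code; in particular it approximates the boundary problem with a simple overlap instead of the paper's second recalculation phase.

```python
# Phase-one sketch of partitioned Smith-Waterman on Spark (illustrative).
from pyspark.sql import SparkSession

def sw_score(q, s, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between q and s (linear-space SW)."""
    prev = [0] * (len(s) + 1)
    best = 0
    for qc in q:
        curr = [0]
        for j, sc in enumerate(s, 1):
            diag = prev[j - 1] + (match if qc == sc else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best

spark = SparkSession.builder.appName("parallel-sw-sketch").getOrCreate()
sc = spark.sparkContext

query = "ACACACTA"
target = "AGCACACA" * 500                  # toy target sequence
chunk, overlap = 400, len(query)           # overlap limits boundary loss
chunks = [target[i:i + chunk + overlap]
          for i in range(0, len(target), chunk)]

# Each partition is scored independently and in parallel on the cluster.
best = sc.parallelize(chunks).map(lambda c: sw_score(query, c)).max()
print("best local score:", best)
spark.stop()
```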

Causal Replay for Cyclic Debugging of MPI Parallel Programs (MPI 병렬 프로그램의 순환 디버깅을 위한 인과관계 재실행)

  • Hong, Cheol-Eui; Kim, Yeong-Joon
    • Journal of KIISE: Computer Systems and Theory / v.28 no.9 / pp.424-433 / 2001
  • The cyclic debugging approach often fails for message-passing parallel programs because of their non-deterministic behavior caused by message race conditions. This paper identifies the MPI events that affect non-deterministic execution, and then converts the concurrent execution into a controlled sequential one that is made equivalent to a reference execution by keeping the order of those events identical in both executions. This paper also presents an efficient algorithm for the causal distributed breakpoint, which is initiated by a sequential breakpoint in one process and restores each process to the earliest state that reflects all events that happened causally before that breakpoint. Thus the cyclic debugging approach can be used for MPI parallel programs just as in sequential programming environments.
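
The event that makes MPI executions race-prone is the wildcard receive: recording which sender each one matched, and forcing the same matches on replay, is the core mechanism behind any record/replay scheme. The sketch below illustrates just that mechanism; the log file name, message contents, and two-mode driver are hypothetical, and the paper's algorithm additionally computes causal distributed breakpoints.

```python
# Record/replay of wildcard receives (illustrative sketch).
# Record run:  mpiexec -n 4 python replay.py record
# Replay run:  mpiexec -n 4 python replay.py replay
from mpi4py import MPI
import json, sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
mode = sys.argv[1] if len(sys.argv) > 1 else "record"

if rank == 0:
    order = []
    if mode == "record":
        for _ in range(comm.Get_size() - 1):
            st = MPI.Status()
            comm.recv(source=MPI.ANY_SOURCE, tag=0, status=st)  # racy match
            order.append(st.Get_source())                       # log the winner
        with open("replay.log", "w") as f:
            json.dump(order, f)
    else:
        with open("replay.log") as f:
            order = json.load(f)
        for src in order:
            comm.recv(source=src, tag=0)   # forced, deterministic match order
    print("rank 0 matched senders in order:", order)
else:
    comm.send(f"hello from rank {rank}", dest=0, tag=0)
```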
