• Title/Summary/Keyword: Shared-memory parallel programs

An Improving Method of Restructuring Parallel Programs for Data Race Detection

  • Ha, Keum-Sook;Lee, Sung woo;Yoo, Kee-Young
    • Proceedings of the IEEK Conference / 2000.07b / pp.715-718 / 2000
  • Although shared-memory parallel programs are designed to be deterministic both in their final results and in their intermediate states, the races that occur when different processes access a common memory location in an order not guaranteed by synchronization can result in unintended non-deterministic executions of the program. Detecting races, particularly first data races, is therefore important for debugging explicit shared-memory parallel programs: it is possible that all data races reported by other on-the-fly algorithms would disappear once the first races were removed. To detect races in parallel programs with nested loops and inter-thread coordination, the order of synchronization operations in an execution instance must be guaranteed. In this paper, we propose an improved restructuring method that guarantees the ordering of an execution instance and preserves the semantics of the original program. The method requires O(np) time and O(s + up) space, where n is the number of total operations, s is the number of synchronization operations, and p is the degree of parallelism in the execution. The method also makes on-the-fly detection of parallel programs with nested loops and inter-thread coordination easier in both space and time complexity.
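
A data race of the kind these papers target can be reproduced in a few lines of OpenMP C. The following is a minimal illustrative sketch of our own, not code from the paper:

```c
/* Minimal illustrative data race: two threads update a shared counter
 * without synchronization, so the final value is non-deterministic. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int shared = 0;
    #pragma omp parallel num_threads(2)
    {
        for (int i = 0; i < 100000; i++)
            shared++;                 /* racy read-modify-write */
    }
    printf("shared = %d\n", shared);  /* often != 200000 */
    return 0;
}
```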

On-the-fly Detection of the First Races for Shared-Memory Parallel Programs with Ordered Synchronization (순서적 동기화를 포함하는 공유 메모리 병렬프로그램에서의 수행중 최초경합 탐지 기법)

  • Park, Hui-Dong;Jeon, Yong-Gi
    • Journal of KIISE:Computer Systems and Theory / v.26 no.8 / pp.884-894 / 1999
  • Detecting races is important in debugging shared-memory parallel programs that have ordered synchronization and nested parallelism, because races result in unintended non-deterministic executions of the programs. The first races are especially important in debugging, because removing them may make other races disappear; it is even possible that all reported races would disappear once the first races are removed. This paper presents a new two-pass on-the-fly algorithm that detects the races that "occur first" in a particular execution of such shared-memory parallel programs. Using an HPF compiler, the detection protocol was applied to two certified benchmark programs executable under High Performance Fortran environments, and experiments on the parameters to be considered when debugging parallel programs showed the efficiency of the technique.
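
A two-pass on-the-fly detector of this general shape can be sketched as follows. The sketch is ours and purely illustrative: the data layout, the hook names, and the placeholder concurrency test all stand in for the paper's actual protocol and thread-labeling scheme:

```c
#include <stdio.h>

/* Pass 1 collects candidate accesses while the program runs; pass 2
 * re-checks the candidates and reports the conflicting concurrent
 * pairs. All names here are illustrative assumptions. */
typedef struct { int label; const void *addr; int is_write; } access_t;

#define MAX_CAND 1024
static access_t cand[MAX_CAND];
static int n_cand;

/* Pass 1 hook: record an access that may participate in a first race. */
void monitor_access(access_t a) {
    if (n_cand < MAX_CAND) cand[n_cand++] = a;
}

/* Placeholder concurrency test; a real detector compares thread labels
 * that encode the ordered synchronization and nesting structure. */
static int is_concurrent(int a, int b) { return a != b; }

/* Pass 2: report conflicting concurrent candidate pairs. */
void report_races(void) {
    for (int i = 0; i < n_cand; i++)
        for (int j = i + 1; j < n_cand; j++)
            if (cand[i].addr == cand[j].addr &&
                (cand[i].is_write || cand[j].is_write) &&
                is_concurrent(cand[i].label, cand[j].label))
                printf("race between candidates %d and %d\n", i, j);
}
```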

Loop Splitting for On-the-fly Race Detection of Shared-memory Parallel Programs (공유 메모리 병렬 프로그램의 수행중 오류 탐지를 위한 루프 분리)

  • Song, Tae-Seob
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.3 / pp.391-398 / 2012
  • Detecting races is important for debugging shared-memory parallel programs, because races result in unintended non-deterministic executions of the programs. Previous on-the-fly techniques for detecting races in parallel programs with general inter-thread coordination show serious space overhead, which depends on the maximum parallelism of the program. This paper therefore presents a loop-splitting technique for on-the-fly race detection of parallel programs that is more space-efficient than previous techniques. The loop-split program to be debugged preserves the semantics of the original program, and monitoring it on-the-fly can detect first races.
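
The generic shape of loop splitting (loop fission) can be shown on a toy loop. This sketch is ours; the paper's transformation is driven by the program's synchronization structure, which is not reproduced here:

```c
/* Before: one loop body containing two statements. */
void original(double *a, const double *b, const double *c,
              double *d, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + c[i];
        d[i] = a[i] * 2.0;
    }
}

/* After splitting: same semantics, because the second statement only
 * reads the a[i] produced by the first in the same iteration. Each
 * loop now has a single, simpler body, which is what makes the split
 * program cheaper to monitor. */
void split(double *a, const double *b, const double *c,
           double *d, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
    for (int i = 0; i < n; i++)
        d[i] = a[i] * 2.0;
}
```

The split is only legal when no dependence is violated by separating the statements; the paper's contribution is choosing split points so that the restructured program stays semantically equivalent while simplifying on-the-fly monitoring.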

A Study on Efficient Executions of MPI Parallel Programs in Memory-Centric Computer Architecture

  • Lee, Je-Man;Lee, Seung-Chul;Shin, Dongha
    • Journal of the Korea Society of Computer and Information / v.25 no.1 / pp.1-11 / 2020
  • In this paper, we present a technique that executes MPI parallel programs, developed for processor-centric computer architecture, more efficiently on memory-centric computer architecture without program modification. The technique improves performance by replacing the low-speed network data communication of MPI library functions with high-speed data communication through the fast, large shared memory of memory-centric computer architecture. The technique is implemented in two programs. The first is a modified MPI library called MC-MPI-LIB, which runs MPI parallel programs more efficiently on memory-centric computer architecture while preserving the semantics of the MPI library functions. The second is a simulation program called MC-MPI-SIM, which simulates the performance of memory-centric computer architecture on processor-centric computer architecture. We developed and tested the programs in a distributed-systems environment deployed on Docker-based virtualization. We analyzed the performance of several MPI parallel programs and showed that memory-centric computer architecture achieves better performance, especially for MPI parallel programs with high communication overhead.
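
The substitution the abstract describes, network transfer replaced by a copy through shared memory, might look roughly like the following wrapper. MC_MPI_Send, shm_base, and the slot scheme are our illustrative assumptions, not the actual MC-MPI-LIB API:

```c
#include <mpi.h>
#include <string.h>

/* Hypothetical pointer into the "fast large shared memory" that every
 * rank is assumed to have mapped; how MC-MPI-LIB really manages this
 * region is not shown here. */
char *shm_base;
#define SLOT_SIZE (1 << 20)   /* one illustrative slot per rank */

/* Illustrative wrapper: instead of pushing bytes through the network,
 * copy them into the receiver's shared-memory slot. A real library
 * must also signal the receiver and preserve MPI matching semantics
 * (tag, communicator, message ordering), which is elided here. */
int MC_MPI_Send(const void *buf, int count, MPI_Datatype dt,
                int dest, int tag, MPI_Comm comm) {
    (void)tag; (void)comm;
    int size;
    MPI_Type_size(dt, &size);
    memcpy(shm_base + (size_t)dest * SLOT_SIZE, buf,
           (size_t)count * (size_t)size);
    return MPI_SUCCESS;
}
```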

Performance Comparison of Synchronization Methods for CC-NUMA Systems (CC-NUMA 시스템에서의 동기화 기법에 대한 성능 비교)

  • Moon, Eui-Sun;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE:Computer Systems and Theory / v.27 no.4 / pp.394-400 / 2000
  • The main goal of synchronization is to guarantee exclusive access to shared data and critical sections, which makes parallel programs work correctly and reliably. Because exclusive access restricts the parallelism of parallel programs, efficient synchronization is essential to achieve high performance in shared-memory parallel programs. Many techniques have been devised for efficient synchronization, exploiting features of systems and applications. This paper presents simulation results showing that existing synchronization methods are inefficient on CC-NUMA (Cache Coherent Non-Uniform Memory Access) systems, and compares them with the performance of Freeze&Melt synchronization, which can remove that inefficiency. The simulation results show that Test-and-Test&Set synchronization is inefficient because of broadcast operations, and that the pre-defined order in which Queue-On-Lock-Bit (QOLB) synchronization executes a critical section also causes inefficiency. Freeze&Melt synchronization removes these inefficiencies and gains performance by decreasing both the waiting time to enter a critical section and the execution time of a critical section, and by reducing traffic between clusters.
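
For reference, the Test-and-Test&Set scheme the simulation compares against looks roughly like this in C11 atomics; this is the textbook form, not the paper's simulator code:

```c
#include <stdatomic.h>

/* Textbook Test-and-Test&Set spinlock: waiters spin on a plain read
 * (the "test") so they hit their own cache, and only issue the
 * bus-invalidating atomic exchange (the "test&set") when the lock
 * looks free. On release, every spinning cache is invalidated at
 * once, which is the broadcast cost measured on CC-NUMA. */
typedef struct { atomic_int locked; } ttas_lock_t;  /* init to {0} */

void ttas_acquire(ttas_lock_t *l) {
    for (;;) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;                                     /* local spin */
        if (!atomic_exchange_explicit(&l->locked, 1,
                                      memory_order_acquire))
            return;                               /* lock acquired */
    }
}

void ttas_release(ttas_lock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The local spin is what distinguishes this from plain Test&Set; the remaining broadcast invalidation on release is exactly the inefficiency the paper's Freeze&Melt scheme is designed to avoid.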

A Preprocessor for Detecting Potential Races in Shared Memory Parallel Programs with Internal Nondeterminism (내부적 비결정성을 가진 공유 메모리 병렬 프로그램에서 잠재적 경합탐지를 위한 전처리기)

  • Kim, Young-Joo;Jung, Min-Sub;Jun, Yong-Kee
    • The KIPS Transactions:PartA / v.17A no.1 / pp.9-18 / 2010
  • Races that occur in shared-memory parallel programs such as OpenMP programs must be detected for debugging, because they cause unintended non-deterministic results. Previous work that verifies the existence of such races on-the-fly is limited to programs without internal non-determinism; for programs with internal non-determinism, it needs at least N! execution instances for each critical section to verify the existence of races, where N is the degree of maximum parallelism. This paper presents a preprocessor that statically analyzes the locations of non-deterministic accesses using program slicing and, using the analyzed information, can detect apparent races as well as potential races in a single execution. The suggested tool deterministically monitors the non-deterministic accesses that occur in OpenMP programs, so it can verify the existence of races under any race-detection protocol that applies to programs with critical sections. To validate the tool empirically, we experimented with a set of benchmark programs: synthetic programs involving non-deterministic accesses, the OpenMP Microbenchmark, the NAS Parallel Benchmark, and OpenMP application programs.
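
Internal non-determinism of the kind targeted here arises whenever critical sections may interleave in any order. A minimal OpenMP C illustration of our own, not the paper's benchmark:

```c
#include <omp.h>
#include <stdio.h>

/* With N threads, the critical sections below may execute in any of
 * N! orders, so 'last' differs from run to run even though every
 * access is synchronized: internal non-determinism without a data
 * race. A detector observing only one order can miss races hidden
 * behind the other N!-1 orders, which is what the paper's static
 * preprocessing accounts for. */
int main(void) {
    int last = -1;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp critical
        last = omp_get_thread_num();
    }
    printf("last writer: %d\n", last);  /* varies across runs */
    return 0;
}
```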

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs (GP-GPU의 캐시메모리를 활용하기 위한 병렬 블록 LU 분해 프로그램의 구현)

  • Kim, Youngtae;Kim, Doo-Han;Yu, Myoung-Han
    • Journal of Internet Computing and Services / v.14 no.6 / pp.41-47 / 2013
  • GP-GPUs are general-purpose GPUs for numerical computation, based on the multiple threads originally intended for graphics processing. Unlike typical cache memory, GP-GPUs provide cache memory in the form of shared memory that user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program to utilize the cache memory of GP-GPUs. The parallel blocked LU decomposition program, written in Nvidia CUDA C, runs 7-8 times faster than the non-blocked LU decomposition program in the same GP-GPU computing environment.
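
For context, the unblocked baseline that blocking is applied to can be written in a few lines of plain C. This sketch is ours; the paper's CUDA C version tiles the trailing-submatrix update into blocks staged in GP-GPU shared memory:

```c
/* Unblocked LU decomposition (Doolittle, no pivoting), in-place on an
 * n x n row-major matrix: afterwards the strict lower triangle holds
 * L (unit diagonal implied) and the upper triangle holds U. A blocked
 * variant tiles the trailing update (the i/j loops) into B x B blocks
 * so each block is reused many times from fast shared memory instead
 * of being re-fetched from DRAM. */
void lu_inplace(double *A, int n) {
    for (int k = 0; k < n; k++) {
        for (int i = k + 1; i < n; i++) {
            A[i * n + k] /= A[k * n + k];        /* multiplier l_ik */
            for (int j = k + 1; j < n; j++)      /* trailing update */
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
}
```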

Analyzing Access Histories for Detecting First Races in Shared-memory Programs (공유메모리 프로그램의 최초경합 탐지를 위한 접근역사 분석)

  • Kang, Mun-Hye;Kim, Young-Joo;Jun, Yong-Kee
    • Journal of KIISE:Computer Systems and Theory / v.31 no.1_2 / pp.41-50 / 2004
  • Detecting races is important for debugging shared-memory parallel programs, because races result in unintended non-deterministic executions of the programs. In particular, the first races to occur in an execution of a program must be detected, because they can potentially affect other races that occur later. Previous on-the-fly techniques that detect such first races, based on candidate events that are likely to participate in them, monitor access events to collect the candidates during a program execution and report races solely by determining the concurrency relationships of the candidates. Races reported in this way, however, are not guaranteed to be first races, because how they affect each other is not taken into account. This paper presents a new post-mortem technique that analyzes, at each nesting level, candidate events collected from an execution of a shared-memory program with nested parallelism, in order to report only first races. The technique is efficient because it guarantees that the first races reported by analyzing a nesting level are the races that occur first at that level, and it requires no analyses of nesting levels higher than the current one. The proposed technique enables more practical and effective debugging than previous techniques, because it is guaranteed to detect first races whenever candidate events are collected from an execution instance of a program with nested parallelism.
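
The per-nesting-level analysis can be pictured as a scan over the candidate accesses logged at one level. The structures and the ordered() stub below are our assumptions; the paper's actual labeling scheme is not reproduced here:

```c
#include <stdio.h>

/* Hedged post-mortem sketch for one nesting level: among the logged
 * candidate accesses (assumed stored in occurrence order), report only
 * races in which neither access is preceded by an earlier conflicting
 * access, i.e. "unaffected" first races. */
typedef struct { int label; const void *addr; int is_write; } ev_t;

/* Placeholder: nonzero if the two labels are ordered by happens-
 * before; a real analysis compares nested-parallelism thread labels. */
static int ordered(int a, int b) { return a == b; }

static int conflict(const ev_t *x, const ev_t *y) {
    return x->addr == y->addr && (x->is_write || y->is_write) &&
           !ordered(x->label, y->label);
}

void first_races_at_level(const ev_t *ev, int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (conflict(&ev[i], &ev[j])) {
                int affected = 0;  /* earlier access affecting i or j? */
                for (int k = 0; k < i; k++)
                    if (conflict(&ev[k], &ev[i]) ||
                        conflict(&ev[k], &ev[j]))
                        affected = 1;
                if (!affected)
                    printf("first race at this level: %d, %d\n", i, j);
            }
}
```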

A Post-mortem Detection Tool of First Races to Occur in Shared-Memory Programs with Nested Parallelism (내포병렬성을 가진 공유메모리 프로그램에서 최초경합의 수행후 탐지도구)

  • Kang, Mun-Hye;Sim, Gab-Sig
    • Journal of the Korea Society of Computer and Information / v.19 no.4 / pp.17-24 / 2014
  • Detecting data races is important for debugging shared-memory programs with nested parallelism, because races result in unintended non-deterministic executions of the program. Detecting the first data races to occur is especially important for effective debugging, because removing such races may make other affected races disappear or appear. Previous dynamic detection tools for first races cannot guarantee that the detected races are unaffected races; moreover, these tools either do not consider nesting levels or require the support of other techniques. This paper suggests a post-mortem tool that collects candidate accesses during program execution and then detects the first races to occur in the program after execution. The technique is efficient because it guarantees that the first races reported by analyzing a nesting level are the races that occur first at that level, and it requires no analyses of nesting levels higher than the current one.
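
The collection side, logging candidate accesses during the monitored run for later analysis, might look like the following. The hook name and record layout are our invention, not the tool's real instrumentation interface:

```c
#include <stdio.h>

/* Illustrative collection hook for a post-mortem detector: during the
 * monitored run, every shared access is appended to a trace file, and
 * the per-level analysis (as in the previous entry's sketch) runs
 * after the program ends. */
static FILE *trace;

int trace_open(const char *path) {
    trace = fopen(path, "w");
    return trace != NULL;
}

void on_shared_access(int nest_level, int thread_label,
                      const void *addr, int is_write) {
    if (trace)
        fprintf(trace, "%d %d %p %d\n",
                nest_level, thread_label, addr, is_write);
}

void trace_close(void) {
    if (trace) fclose(trace);
}
```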

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal / v.43 no.5 / pp.825-834 / 2021
  • Thanks to its simplicity and user-friendliness, OpenMP has become the standard model for programming shared-memory architectures. Checkpointing-Aided Parallel Execution (CAPE) is an approach that uses the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by using DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after these distributed jobs execute. Although this model has proven effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism: the proposed model adds a level of parallelism in the form of multithreaded processes on slave machines, with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model significantly enhances CAPE performance.
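
The added level of parallelism, multithreaded processes on each slave, is the familiar hybrid pattern. In this sketch of ours, a plain static chunk split stands in for CAPE's DICKPT-based distribution:

```c
#include <omp.h>

/* Hedged sketch of two-level parallelism: the outer split across slave
 * machines stands in for CAPE's checkpoint-based job distribution
 * (simplified here to a static chunk), and the inner OpenMP loop is
 * the second level that exploits each slave's multicore CPU. */
void slave_work(double *a, long n, int slave_id, int n_slaves) {
    long chunk = n / n_slaves;
    long lo = slave_id * chunk;
    long hi = (slave_id == n_slaves - 1) ? n : lo + chunk;

    /* Second level: threads within one slave process. */
    #pragma omp parallel for
    for (long i = lo; i < hi; i++)
        a[i] = a[i] * 2.0 + 1.0;
}
```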