• Title/Summary/Keyword: message-passing program

A Java-based Performance Monitor for Networked Computer (네트워크 컴퓨터를 위한 자바 기반의 성능감시기)

  • Kim, Bong-Jun;Kim, Dong-Ho;Hwang, Seog-Chan;Kim, Myung-Ho;Choi, Jae-Young
    • Journal of KIISE: Computer Systems and Theory / v.27 no.2 / pp.160-168 / 2000
  • In this paper, we present a performance monitor to trace and evaluate the performance of programs running on networked computers. The performance monitor of JaNeC is both online/batch and event/time driven. Since it is implemented in the Java programming language, it provides high portability among heterogeneous computer systems and a friendly graphical user interface. The monitor consists of various views, such as 'Task/Event Filter', 'TimeLine', 'Task View', 'Task History', 'Message Passing View', and 'Host Cpu View', which allow the user to easily analyze events and timing during program execution.
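
A minimal sketch of the kind of timestamped event record such a tracing monitor collects, written here in C for illustration (JaNeC itself is a Java system, and these names are hypothetical, not its API):

```c
#include <stdio.h>
#include <time.h>

/* Illustrative trace record: one entry per observed event (task
 * start/stop, message send/receive), timestamped so a timeline-style
 * viewer can order and replay the events later. */
typedef struct {
    double timestamp;   /* seconds since an arbitrary start */
    int    task_id;     /* which task produced the event */
    char   kind[16];    /* e.g. "SEND", "RECV", "START" */
} TraceEvent;

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void log_event(FILE *trace, int task_id, const char *kind) {
    TraceEvent ev = { now_seconds(), task_id, "" };
    snprintf(ev.kind, sizeof ev.kind, "%s", kind);
    fprintf(trace, "%.6f %d %s\n", ev.timestamp, ev.task_id, ev.kind);
}

int main(void) {
    FILE *trace = fopen("trace.log", "w");
    if (!trace) return 1;
    log_event(trace, 0, "START");
    log_event(trace, 0, "SEND");   /* a 'Message Passing View' row */
    log_event(trace, 0, "STOP");
    fclose(trace);
    return 0;
}
```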

Implementation and Performance Evaluation of Parallel Programming Translator for High Performance Fortran (High Performance Fortran 병렬 프로그래밍 변환기의 구현 및 성능 평가)

  • Kim, Jung-Gwon;Hong, Man-Pyo;Kim, Dong-Gyu
    • The Transactions of the Korea Information Processing Society / v.6 no.4 / pp.901-915 / 1999
  • Parallel computers offer excellent performance per cost while satisfying scalability and high performance. However, parallel machines have enjoyed limited success because of the difficulty of parallel programming and the lack of portability between machines. Recently, researchers have sought to develop data parallel languages that provide machine-independent programming systems. A data parallel language such as High Performance Fortran provides a basis for writing a parallel program with a global name space by partitioning data and computation and generating message-passing functions. In this paper, we describe the Parallel Programming Translator (PPTran), a source-to-source data parallel compiler that generates an MPI SPMD parallel program from an HPF input program through four phases: data dependence analysis, data partitioning, computation partitioning, and code generation with explicit message passing. We also verify the performance of PPTran.
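
A hedged sketch, in C with MPI, of the shape of code such a translator emits (not PPTran's actual output): the array and loop are block-partitioned, each rank computes only the iterations it owns, and translator-generated message passing recombines the distributed blocks.

```c
/* Minimal SPMD sketch of translator output. Compile with mpicc. */
#include <mpi.h>
#include <stdio.h>

#define N 16

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block partitioning: this rank owns indices [lo, hi). */
    int chunk = (N + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    double local[N] = {0};
    for (int i = lo; i < hi; i++)          /* owner-computes rule */
        local[i] = 2.0 * i;

    /* Generated message passing: non-owned entries are zero, so a
     * sum-reduction reassembles the global array on every rank. */
    double global[N];
    MPI_Allreduce(local, global, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < N; i++) printf("%g ", global[i]);
        printf("\n");
    }
    MPI_Finalize();
    return 0;
}
```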

A Design of An Optimizer For Conversion of Parallel Constructs of Data Parallel Language Programs (자료 병렬 언어 프로그램의 병렬 구조 변환을 위한 최적화기 설계)

  • Gu, Mi-Sun;Park, Myeong-Sun
    • The Transactions of the Korea Information Processing Society / v.6 no.3 / pp.792-803 / 1999
  • Most data parallel language compilers are source-to-source translators. Most compilers for HPF, which is recognized as a standard data parallel language, convert a parallel program in HPF into a Fortran 77 program with inserted message-passing primitives. However, they currently generate a significant amount of ineffective code in the course of the conversion. In particular, a FORALL construct is converted into several DO loops, so the loop overhead of the generated code increases considerably. In this paper, we define and use a relation distance vector to keep the necessary dependence information. We then measure and analyze the execution times of the code converted by our method and by the PARADIGM method for various array sizes.
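
The overhead the paper attacks comes from FORALL semantics: every right-hand-side read must see the old array values, so the naive translation materializes a temporary array in an extra loop. A small worked example (in C rather than Fortran, for brevity):

```c
#include <stdio.h>
#define N 8

int main(void) {
    double a[N] = {1,0,0,0,0,0,0,0}, tmp[N];

    /* FORALL (i=1:N-1) a(i) = a(i-1) + 1 must read the OLD a(i-1).
     * Naive scalarization first evaluates all right-hand sides into
     * a temporary ... */
    for (int i = 1; i < N; i++) tmp[i] = a[i-1] + 1;
    /* ... and only then assigns, costing an extra loop and array.
     * (A single direct forward loop would wrongly cascade: it would
     * read the freshly written a[1] when computing a[2], and so on.) */
    for (int i = 1; i < N; i++) a[i] = tmp[i];

    for (int i = 0; i < N; i++) printf("%g ", a[i]); /* 1 2 1 1 1 1 1 1 */
    printf("\n");
    return 0;
}
```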

Implementation of Asymmetric Communication for Asynchronous Iteration by the MPMD Method on Distributed Memory Systems (분산 메모리 시스템에서의 MPMD 방식의 비동기 반복 알고리즘을 위한 비대칭 전송의 구현)

Park, Pil-Seong
    • Journal of Internet Computing and Services / v.4 no.5 / pp.51-60 / 2003
  • Asynchronous iteration is a way to reduce the performance degradation of some parallel algorithms caused by load imbalance or transmission delay between computing nodes; it requires asymmetric communication between nodes of different speeds. To implement such asynchronous communication on distributed memory systems, we suggest an MPMD method that creates an additional separate server process on each computing node, and we compare it with an SPMD method that creates a single process per node.
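
The MPMD idea of a dedicated server process per node can be sketched with MPI's dynamic process creation. The following C sketch is illustrative only, not the paper's protocol: each compute process spawns one extra process of the same binary, which then acts as its communication server.

```c
/* Hedged MPMD sketch: each compute process spawns a separate server
 * process (same binary, detected via the parent communicator) that
 * can handle communication while the compute process works.
 * Build with mpicc and run as ./a.out so argv[0] resolves. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    if (parent != MPI_COMM_NULL) {
        /* Server role: receive work/updates on behalf of the node. */
        double v;
        MPI_Recv(&v, 1, MPI_DOUBLE, 0, 0, parent, MPI_STATUS_IGNORE);
        printf("server process got %g\n", v);
    } else {
        /* Compute role: spawn one server process for this node. */
        MPI_Comm inter;
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
        double v = 42.0;                 /* hand data to the server */
        MPI_Send(&v, 1, MPI_DOUBLE, 0, 0, inter);
    }
    MPI_Finalize();
    return 0;
}
```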

Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs (NVIDIA GPU 상에서의 난수 생성을 위한 CUDA 병렬프로그램)

  • Kim, Youngtae;Hwang, Gyuhyeon
    • Journal of KIISE / v.42 no.12 / pp.1467-1473 / 2015
  • In this paper, we implemented a parallel random number generation program on GPUs, which are known for high performance computing, using an LCG (Linear Congruential Generator). Random numbers are important in all fields requiring randomness, and the LCG is one of the most widely used methods for generating pseudo-random numbers. We describe the parallel program, which uses the NVIDIA CUDA model and MPI (Message Passing Interface), and show uniformity and performance results. We also use a Monte Carlo algorithm to calculate pi (π), comparing our parallel random number generator with cuRAND, a CUDA library function, and show that our program is much more efficient. Finally, we compare performance on multiple GPUs with ideal speedups.
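
A small sequential C sketch of the two ingredients the paper parallelizes with CUDA and MPI: an LCG pseudo-random stream and a Monte Carlo estimate of pi. The multiplier and increment are the classic Numerical Recipes LCG constants, not necessarily the paper's parameters.

```c
#include <stdio.h>
#include <stdint.h>

static uint32_t lcg_state = 12345u;

/* LCG recurrence: x_{n+1} = (a * x_n + c) mod 2^32 */
static uint32_t lcg_next(void) {
    lcg_state = 1664525u * lcg_state + 1013904223u;
    return lcg_state;
}

static double lcg_uniform(void) {          /* uniform in [0,1) */
    return lcg_next() / 4294967296.0;      /* divide by 2^32 */
}

int main(void) {
    const long trials = 10000000L;
    long hits = 0;
    for (long i = 0; i < trials; i++) {
        double x = lcg_uniform(), y = lcg_uniform();
        if (x * x + y * y < 1.0) hits++;   /* inside quarter circle */
    }
    /* Area ratio of quarter circle to unit square is pi/4. */
    printf("pi ~= %f\n", 4.0 * hits / (double)trials);
    return 0;
}
```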

Numerical Simulation of Natural Convection in Annuli with Internal Fins

  • Ha, Man-Yeong;Kim, Joo-Goo
    • Journal of Mechanical Science and Technology / v.18 no.4 / pp.718-730 / 2004
  • The solution for natural convection in internally finned horizontal annuli is obtained by numerical simulation of the time-dependent, two-dimensional governing equations. The fins in the annuli influence the flow pattern, temperature distribution, and heat transfer rate. Variations of the fin configuration suppress or accelerate the free convective effects compared with smooth tubes. The effects of fin configuration, number of fins, and ratio of annulus gap width to inner cylinder radius on the fluid flow and heat transfer in the annuli are demonstrated by the distributions of velocity vectors, isotherms, and streamlines. The governing equations are solved efficiently using a parallel implementation, adopted to reduce the computation cost. The parallelization is performed with a domain decomposition technique and message passing between sub-domains on the basis of the MPI library. The results from the parallel computation are consistent with those of the sequential program. Moreover, the speed-up ratio scales linearly with the number of processors.
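
A hedged C/MPI sketch of the parallelization pattern the abstract names: 1-D domain decomposition with ghost-cell (halo) exchange between neighbouring sub-domains. The array sizes and values are illustrative, not the paper's solver.

```c
/* Halo exchange between neighbouring sub-domains. Compile with mpicc. */
#include <mpi.h>
#include <stdio.h>

#define LOCAL 4   /* interior points per rank */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* u[0] and u[LOCAL+1] are ghost cells holding neighbour values. */
    double u[LOCAL + 2];
    for (int i = 0; i <= LOCAL + 1; i++) u[i] = rank;

    /* MPI_PROC_NULL turns the physical-boundary exchanges into no-ops. */
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my first interior point left; receive my right neighbour's
     * first interior point into the right ghost cell. */
    MPI_Sendrecv(&u[1],         1, MPI_DOUBLE, left,  0,
                 &u[LOCAL + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* ...and the mirror image for the other direction. */
    MPI_Sendrecv(&u[LOCAL], 1, MPI_DOUBLE, right, 0,
                 &u[0],     1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d ghosts: left=%g right=%g\n", rank, u[0], u[LOCAL + 1]);
    MPI_Finalize();
    return 0;
}
```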

An Application-Level Fault Tolerant Linear System Solver Using an MPMD Type Asynchronous Iteration (MPMD 방식의 비동기 연산을 이용한 응용 수준의 무정지 선형 시스템의 해법)

  • Park, Pil-Seong
    • The KIPS Transactions: Part A / v.12A no.5 s.95 / pp.421-426 / 2005
  • In a large-scale parallel computation, a processor or communication link failure results in the waste of a huge number of CPU hours. However, MPI in its current specification gives the user no way to handle such a problem. In this paper, we propose an application-level fault-tolerant linear system solver that uses an MPMD-type asynchronous iteration, built purely on the MPI standard without any non-standard fault-tolerant MPI library.
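
A rough C/MPI sketch of the application-level idea: because standard MPI blocks forever on a failed peer, the iteration avoids blocking receives; it probes, takes an update if one has arrived, and otherwise continues with the last known value. The relaxation step and message layout are illustrative, not the paper's scheme.

```c
#include <mpi.h>
#include <stdio.h>

#define ITERS 5

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int peer = (rank + 1) % size;          /* I send to peer ...    */
    int from = (rank + size - 1) % size;   /* ... and hear from from */
    double x = rank + 1.0, latest = 0.0;
    int received = 0;

    for (int iter = 0; iter < ITERS; iter++) {
        double out = x;
        MPI_Request req;
        MPI_Isend(&out, 1, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);

        int flag;
        MPI_Iprobe(from, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
        if (flag) {                        /* a fresh update arrived */
            MPI_Recv(&latest, 1, MPI_DOUBLE, from, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            received++;
        }                                  /* else: keep stale value */
        MPI_Wait(&req, MPI_STATUS_IGNORE); /* tiny msg, completes fast */
        x = 0.5 * (x + latest);            /* toy relaxation step */
    }
    /* Match the leftover sends so nothing is pending at finalize; a
     * real fault-tolerant solver would bound this wait as well. */
    while (received < ITERS) {
        MPI_Recv(&latest, 1, MPI_DOUBLE, from, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        received++;
    }
    printf("rank %d: x = %g\n", rank, x);
    MPI_Finalize();
    return 0;
}
```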

The Design and Implementation of On-Line Performance Monitor for JaNeC (JaNeC을 위한 온라인 성능감시기의 설계 및 구현)

  • Kim, Myung-Ho;Kim, Nam-Hoon;Choi, Jae-Young
    • The KIPS Transactions: Part A / v.9A no.4 / pp.563-572 / 2002
  • A performance monitor is indispensable for tracing and evaluating the performance of a program in a distributed processing environment. A performance monitor is classified as off-line or on-line according to its output method: an off-line performance monitor analyzes performance after a program terminates, while an on-line performance monitor analyzes it while the program runs. On-line operation is therefore essential for analyzing and debugging a program quickly. JaNeC, a distributed processing environment implemented in Java, contains an off-line performance monitor, but that monitor cannot efficiently analyze a program while it is running on JaNeC. Consequently, this paper describes the design and implementation of an on-line performance monitor for fast analysis and debugging of programs running on JaNeC. The on-line performance monitor is designed to minimize its effect on the program under analysis and provides various forms of graphical output for effective analysis. In addition, even after a program terminates, it interfaces with the off-line performance monitor so the run can be analyzed again.
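
A toy C illustration of the on-line vs. off-line distinction the abstract draws: off-line tracing buffers events to a file read after the run, while on-line tracing makes each event visible immediately so a live viewer can consume it while the program still runs. A real monitor such as the one described would push events over the network; the flush-per-event scheme here is only an assumption for the sketch.

```c
#include <stdio.h>
#include <unistd.h>

static void emit(FILE *trace, const char *event, int online) {
    fprintf(trace, "%ld %s\n", (long)getpid(), event);
    if (online) fflush(trace);   /* visible to an analyzer right now */
}

int main(void) {
    FILE *trace = fopen("events.log", "w");
    if (!trace) return 1;
    for (int step = 0; step < 3; step++) {
        emit(trace, "compute-step", 1); /* on-line: flushed per event */
        sleep(1);                       /* the program keeps running  */
    }
    fclose(trace);  /* off-line analysis reads the file afterwards */
    return 0;
}
```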

Scalarization of HPF FORALL Construct (HPF FORALL 구조의 스칼라화(Scalarization))

  • Koo, Mi-Soon
    • Journal of the Korea Society of Computer and Information / v.12 no.5 / pp.121-129 / 2007
  • Scalarization is the process by which a parallel construct, such as an array statement of Fortran 90 or a FORALL of HPF, is converted into sequential loops that maintain the correct semantics. Most compilers for HPF, recognized as a standard data parallel language, convert an HPF program into a Fortran 77 program with inserted message-passing primitives. During scalarization, a parallel FORALL construct must be translated into Fortran 77 DO loops that preserve the semantics of FORALL. In this paper, we propose a scalarization algorithm that converts a FORALL construct into a DO loop with improved performance. For this, we define and use a relation distance vector to keep the necessary dependence information. We then evaluate the execution times of the code generated by our method and by the PARADIGM compiler method for various array sizes.
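
A worked example (in C, for brevity) of the kind of optimization such an algorithm enables. FORALL semantics read old values, so the naive translation needs a temporary array; when dependence analysis, such as the paper's relation distance vector, shows a uniform distance, reversing the loop preserves old-value reads with no temporary at all.

```c
#include <stdio.h>
#define N 8

int main(void) {
    double a[N] = {1,0,0,0,0,0,0,0};

    /* FORALL (i=1:N-1) a(i) = a(i-1): every read must see OLD a.
     * Running the DO loop backwards reads a(i-1) before iteration
     * i-1 overwrites it, so the semantics hold with no temporary.
     * (A forward loop would cascade the value of a[0] everywhere.) */
    for (int i = N - 1; i >= 1; i--)
        a[i] = a[i-1];

    for (int i = 0; i < N; i++) printf("%g ", a[i]); /* 1 1 0 0 0 0 0 0 */
    printf("\n");
    return 0;
}
```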

Modeling and Performance Evaluation of Multiprocessor Operating Systems Using the DEVS Formalism (DEVS 형식론을 이용한 다중프로세서 운영체제의 모델링 및 성능평가)

  • 홍준성
    • Proceedings of the Korea Society for Simulation Conference / 1994.10a / pp.32-32 / 1994
  • In this example, a message-passing based multicomputer system with a general interconnection network is considered. Since multicomputer systems are now built with wormhole routing networks, the topology of the interconnection network is no longer a major consideration for process management and resource sharing. There is an independent operating system kernel on each node, which communicates with the other kernels using a message passing mechanism. Based on this architecture, the problem is how much performance degradation will occur when processors are shared on multicomputer systems. Processor sharing between application programs is a very important decision for system performance. In most cases, application programs running on massively parallel computer systems are not very user-interactive, so the main performance index is system throughput. Each application program has its own communication patterns, and the sharing of processors causes serious performance degradation in the worst case, for instance when one processor is shared by two processes while other processes are waiting for messages from them. Considering this problem is therefore important, since it indicates whether the system should allow processor sharing or not. The input data for this simulation has many parameters: the number of threads per task, the communication patterns between threads, data generation, and also defects in the random input data. Many parallel application programs have specific communication patterns with distinct computation and communication phases, and this phase information cannot be obtained from random input data. If we could obtain trace data from real applications, we could simulate the problem more realistically; on the other hand, simulation results will be wasteful unless sufficient trace data with various communication patterns is gathered. In this project, random input data are used for the simulation, and the only controllable parameters are the number of threads per task and the mapping strategy. First, each task runs independently; after that, each task shares one or more processors with other tasks. As more processors are shared, performance degradation increases, and from this degradation rate we can determine the overhead of processor sharing. The process scheduling policy can affect the simulation results; for process scheduling, a priority queue and a FIFO queue are implemented to support round-robin and priority scheduling.
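
A toy next-event simulation loop in C, in the spirit of the study: two tasks share one processor round-robin, and the loop accumulates how long each task waits while the other holds the processor. This is purely illustrative of the simulation mechanics and processor-sharing overhead, not the paper's DEVS model.

```c
#include <stdio.h>

#define TASKS   2
#define QUANTUM 10.0   /* time slice per scheduling decision */
#define DEMAND  50.0   /* computation each task needs */

int main(void) {
    double remaining[TASKS] = { DEMAND, DEMAND };
    double wait[TASKS] = { 0.0, 0.0 };
    double clock = 0.0;
    int current = 0;

    /* Advance simulated time slice by slice under a round-robin
     * policy; the task not holding the processor accumulates wait. */
    while (remaining[0] > 0.0 || remaining[1] > 0.0) {
        int other = 1 - current;
        if (remaining[current] > 0.0) {
            double slice = remaining[current] < QUANTUM
                         ? remaining[current] : QUANTUM;
            remaining[current] -= slice;
            clock += slice;
            if (remaining[other] > 0.0) wait[other] += slice;
        }
        current = other;   /* round-robin hand-off */
    }
    printf("finished at t=%.1f; task waits: %.1f %.1f\n",
           clock, wait[0], wait[1]);
    return 0;
}
```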
