• Title/Summary/Keyword: parallel communication

Parallel Implementation of Radon Transform on TMS320C80-based System (TMS320C80시스템에서 Radon 변환의 병렬 구현)

  • 송정호;성효경;최흥문
    • Proceedings of the IEEK Conference / 1998.10a / pp.727-730 / 1998
  • In this paper, we propose an efficient parallel implementation of the Radon transform on a TMS320C80-based system. For an N$\times$N SAR image, we improve on the conventional parallel Radon transform, obtaining O(NM/p) complexity, by representing the projection patterns in Radon space variables instead of image space variables and by pipelining the algorithm, where p is the number of processors and M is the number of projection angles. We also reduce the time for dynamic load distribution among the nodes and the communication overhead of accessing the global memories by pipelining the memory and processing operations with a triple-buffer structure. Experimental results show a speedup of Sp=3.9 and an efficiency of E=97.5% for a 256$\times$256 image when implemented on a TMS320C80 with four parallel slave processors and three memory blocks.

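
The reported speedup and efficiency are consistent with the usual definitions, which are assumed here because the abstract does not state them. With $T_1$ the single-processor time and $T_p$ the time on $p$ processors (symbols introduced for this note), the four-processor figures work out as:

```latex
S_p = \frac{T_1}{T_p}, \qquad
E = \frac{S_p}{p} = \frac{3.9}{4} = 0.975 = 97.5\%
```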

Parallel Performance of Preconditioned Navier-Stokes Code on Myrinet Environment (Myrinet 환경에서 예조건화 Navier-Stokes 코드의 병렬처리 성능)

  • Kim M.-H.;Lee G. S.;Choi J.-Y.;Kim K. S.;Kim S.-L.;Jeung I.-S.
    • 한국전산유체공학회:학술대회논문집 / 2001.05a / pp.149-154 / 2001
  • The parallel performance of a Myrinet-based PC cluster was tested and compared with that of a conventional Fast-Ethernet system. A preconditioned Navier-Stokes code was parallelized with the domain decomposition technique and used for the parallel performance test. The speed-up ratio was examined as the major performance parameter, as a function of the number of processors and the network topology. As expected, the Myrinet system shows parallel performance superior to that of the Fast-Ethernet system, even with a single network adapter for a dual-processor SMP machine. A test of the dependency on problem size also shows that network communication speed is a crucial factor for parallelized computational fluid dynamics analysis and that the Myrinet system is a plausible candidate for a high-performance parallel computing system.

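
One way to see why network speed and problem size both matter for the speed-up ratio is a simple cost model. The model below is an assumption made for illustration and is not taken from the paper: the computation is taken to divide evenly over $p$ processors, and $T_{\mathrm{comm}}$ is a hypothetical per-run communication cost.

```latex
T_p = \frac{T_1}{p} + T_{\mathrm{comm}}, \qquad
S_p = \frac{T_1}{T_p} = \frac{p}{1 + p\,T_{\mathrm{comm}}/T_1}
```

Under this model, a faster network (smaller $T_{\mathrm{comm}}$) or a larger problem (larger $T_1$ relative to $T_{\mathrm{comm}}$) moves $S_p$ toward the ideal value $p$.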

Efficient m-step Generalization of Iterative Methods

  • Kim, Sun-Kyung
    • Journal of Korea Society of Industrial Information Systems / v.11 no.5 / pp.163-169 / 2006
  • To use parallel computers in specific applications, algorithms need to be developed and mapped onto parallel computer architectures. Main memory access in shared-memory systems and global communication in message-passing systems degrade computation speed. This paper finds that the m-step generalization of the block Lanczos method enhances parallel properties by forming blocks of search direction vectors simultaneously. QR factorization, which slows execution on parallel computers, is not necessary in the m-step block Lanczos method. The m-step method minimizes synchronization points, which in turn minimizes global communication and main memory access compared with the standard methods.


A Parallel Control Scheme for ABR Services in ATM Networks

  • Ding, Q.L.;Liew, S.C.
    • Journal of Communications and Networks / v.4 no.2 / pp.118-127 / 2002
  • This paper proposes a new scheme, the parallel control scheme with feedback control (PCFC), for ABR services in ATM networks. The information from a source is split into a number of streams for delivery over separate parallel connections with particular coding. At the receiver, the original information is reconstructed from the packets received over the parallel connections. The effects of PCFC on network performance are due to two factors: traffic splitting and load balancing. Through a combination of analysis and simulation, this paper studies the implications of PCFC for how the ABR parameters should be scaled and the advantages of PCFC over other existing schemes.

Parallel Procedure and Evaluation of Parallel Performance of Impact Simulation Based on Two-Step Eulerian Scheme (Two-Step Eulerian 기법에 기반 한 충돌 해석의 병렬처리 및 병렬효율 평가)

  • Kim Seung-Jo;Lee Min-Hyung;Paik Seung-Hoon
    • Transactions of the Korean Society of Mechanical Engineers A / v.30 no.10 s.253 / pp.1320-1327 / 2006
  • The parallel procedure and performance of two-step Eulerian codes have not been sufficiently reported, even though such codes have been developed and widely used in impact simulation. In this study, a parallel strategy for a two-step Eulerian code is proposed and described in detail. The performance was evaluated on a self-built Linux cluster. Compared with a commercial code, relatively good performance is achieved. A performance evaluation of each computation stage shows that the remap stage is the most time-consuming part, ahead of other parts such as FE processing, communication, and time marching.

Parallel Processing of Pre-conditioned Navier-Stokes Code on the Myrinet and Fast-Ethernet PC Cluster (Myrinet과 Fast-Ethernet PC Cluster에서 예조건화 Navier-Stokes코드의 병렬처리)

  • Lee, G.S.;Kim, M.H.;Choi, J.Y.;Kim, K.S.;Kim, S.L.;Jeung, I.S.
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.30 no.6 / pp.21-30 / 2002
  • A preconditioned Navier-Stokes code was parallelized with the domain decomposition technique, and the accuracy of the parallelized code was verified by comparison with the results of a sequential code and with experimental data. The parallel performance of the code was examined on a Myrinet-based PC cluster and a Fast-Ethernet system. The speed-up ratio was examined as the major performance parameter, as a function of the number of processors and the network communication topology. In this test, the Myrinet system shows parallel performance superior to that of the Fast-Ethernet system, as expected. A test of the dependency on problem size also shows that network communication speed is a crucial factor for parallel performance and that the Myrinet-based PC cluster is a plausible candidate for a high-performance parallel computing system.

A Performance Comparison between Coarray and MPI for Parallel Wave Propagation Modeling and Reverse-time Migration (코어레이와 MPI를 이용한 병렬 파동 전파 모델링과 거꿀 참반사 보정 성능 비교)

  • Ryu, Donghyun;Kim, Ahreum;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.19 no.3 / pp.131-135 / 2016
  • Coarray is a parallel processing technique introduced in the Fortran 2008 standard that implements parallel processing with simple syntax. In this research, we examined the applicability of Coarray to seismic parallel processing by comparing the performance of seismic data processing programs using Coarray and MPI. We compared calculation time using seismic wave propagation modeling and one-to-one communication time using the domain decomposition technique. We also compared the performance of parallel reverse-time migration programs using Coarray and MPI. Test results show that the computing speed of the Coarray method is similar to that of MPI. On the other hand, MPI has communication speed superior to that of Coarray.
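
The domain decomposition with one-to-one communication mentioned in this abstract can be illustrated with a small sketch. It is plain Python, not the authors' Coarray or MPI code: a 1D wave equation is split into subdomains with one ghost cell on each side, and the list copies in `exchange_halos` stand in for point-to-point messages between neighbouring processes. All function names, the grid size, and the coefficient `c2` are assumptions made for this illustration.

```python
def step(u_prev, u, c2):
    """One explicit time step of the 1D wave equation; both end cells are set to zero."""
    n = len(u)
    nxt = [0.0] * n
    for i in range(1, n - 1):
        nxt[i] = 2 * u[i] - u_prev[i] + c2 * (u[i + 1] - 2 * u[i] + u[i - 1])
    return nxt


def run_serial(u0, steps, c2=0.25):
    """Reference solution on the whole grid (zero initial velocity)."""
    u_prev, u = list(u0), list(u0)
    for _ in range(steps):
        u_prev, u = u, step(u_prev, u, c2)
    return u


def split_interior(n, p):
    """Split interior indices 1..n-2 into p contiguous [lo, hi) ranges (needs n - 2 >= p)."""
    m = n - 2
    bounds = [1 + (m * k) // p for k in range(p + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def exchange_halos(parts):
    """Copy edge values between neighbours; stands in for point-to-point messages."""
    for k in range(len(parts) - 1):
        left, right = parts[k], parts[k + 1]
        left[-1] = right[1]   # ghost cell <- right neighbour's first owned cell
        right[0] = left[-2]   # ghost cell <- left neighbour's last owned cell


def run_decomposed(u0, steps, p, c2=0.25):
    """Same computation on p subdomains, each stored with one ghost cell per side."""
    ranges = split_interior(len(u0), p)
    cur = [list(u0[lo - 1:hi + 1]) for lo, hi in ranges]
    prev = [list(part) for part in cur]
    for _ in range(steps):
        nxt = [step(a, b, c2) for a, b in zip(prev, cur)]
        exchange_halos(nxt)  # the only "communication" in each time step
        prev, cur = cur, nxt
    # Stitch the owned cells back together and re-add the two fixed end points.
    out = [0.0]
    for part in cur:
        out.extend(part[1:-1])
    out.append(0.0)
    return out


u0 = [0.0] * 21
u0[10] = 1.0  # initial pulse in the middle of the grid
assert run_decomposed(u0, 15, p=4) == run_serial(u0, 15)
```

Each owned cell is updated from the same neighbour values in both versions, so the decomposed run reproduces the serial result. In a real Coarray or MPI program, only the edge values copied in `exchange_halos` would cross the network at each time step, which is where the communication time compared in the abstract would arise.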

Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

  • Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems / v.5 no.12 / pp.439-446 / 2016
  • In this paper, an efficient prediction accuracy enhancement strategy is proposed for improving the performance of prediction-based parallel event-driven gate-level timing simulation. The proposed strategy adopts static double prediction and dynamic prediction for the input and output values of local simulations. Double prediction uses a second set of static prediction data for a secondary prediction once the first prediction fails, and dynamic prediction uses the ongoing simulation results, accumulated dynamically during the actual parallel simulation run, as prediction data. As a result, the communication and synchronization overheads, which are the main bottlenecks of parallel simulation, are reduced as far as possible. With the two proposed prediction enhancement techniques, we observed about a 5x simulation performance improvement over commercial parallel multi-core simulation for six test designs.

The Survey of Parallel Programming Techniques for Developing Optimized Software in Multi-core System (멀티코어 시스템에서 최적화된 소프트웨어 개발을 위한 병렬처리 프로그래밍 기법 조사)

  • Lee, Ki-Hong;Kim, Jee-Hong;Eom, Young-Ik
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.36-38 / 2012
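  • Multi-core CPUs are now commonplace, but because most programming languages evolved with a single core in mind, parallelization remains difficult. Parallel processing techniques are being studied to address this, yet developers can find themselves confused by the number of techniques available. To help developers choose a technique suited to their situation, this paper compares and evaluates the major parallel processing techniques: OpenMP, Threading Building Blocks, Cilk Plus, and Parallel Patterns Library. The techniques differ in the characteristics a developer must consider when building a program, such as supported features, the way support is provided, and scheduling methods, and each has its own strengths and weaknesses. Therefore, when selecting and implementing a parallel processing technique, understanding the characteristics of several techniques and choosing the one that fits the situation, rather than relying on a single technique, allows parallel processing to be implemented more efficiently and more easily.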

A Study on Modular MIN (Modular MIN에 관한 연구)

  • 장창수;최창훈;유창하
    • The Journal of the Korea Contents Association / v.2 no.2 / pp.103-111 / 2002
  • In parallel application programs with localized communication, overall system performance with MINs degrades compared with the hypercube and tree structures, even though MINs have low diameters. The reason is that MINs cannot provide a clustering mechanism to exploit locality of reference. The proposed MIN, however, can be constructed to suit localized communication by providing a shortcut path and multiple paths inside a processor-memory cluster that has frequent data communication. The proposed MIN therefore achieves enhanced performance in parallel application programs with localized communication.
