• Title/Abstract/Keywords: parallel performance

Search results: 2,865 items (processing time: 0.028 s)

Performance Optimization of Parallel Algorithms

  • Hudik, Martin;Hodon, Michal
    • Journal of Communications and Networks / Vol. 16, No. 4 / pp.436-446 / 2014
  • The high intensity of research and modeling in mathematics, physics, biology, and chemistry requires new computing resources. Because of the high computational complexity of such tasks, computing time is long and costly. The most effective way to increase efficiency is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing, with emphasis on the analysis of parallel systems and the impact of communication delays on their efficiency and overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation (Gauss elimination method, GEM). Algorithms are designed for shared-memory architectures (open multiprocessing, OpenMP), distributed-memory architectures (message passing interface, MPI), and their combination (MPI + OpenMP). The properties of the algorithms were determined analytically and verified experimentally. Conclusions are drawn for theory and practice.
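As a point of reference for the Gauss elimination method named in this abstract, here is a minimal sequential sketch in Python. It is not the authors' OpenMP or MPI implementation; the function name `gauss_solve` and the choice of partial pivoting are assumptions made for illustration. The comment in the elimination loop marks the row updates that are independent of one another, which is the work a parallel version would typically distribute.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (illustrative sketch)."""
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Each row i > k is updated independently of the other rows;
        # a parallel version could split these rows across threads or processes.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

For example, `gauss_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])` yields a solution close to `[2, 3, -1]`, up to floating-point rounding.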

Parallel Computing Environment for R with on Supercomputer Systems

  • 이상열;원중호
    • 한국경영과학회지 / Vol. 39, No. 4 / pp.19-31 / 2014
  • We study parallel processing techniques for the R programming language using high-performance computing technology. In this study, we used a massively parallel computing system with 25,408 CPU cores. We evaluated the performance of a distributed-memory system using MPI and of a shared-memory system using OpenMP. Our findings are summarized as follows. First, for some particular algorithms, parallel processing is about 150 times faster than serial processing in R. Second, the distributed-memory system gets faster as the number of nodes increases, while the shared-memory system is limited in its performance improvement by the number of CPUs in a single system.

Thermal and flow analysis for the optimization of a parallel flow heat exchanger

  • 이관수;정지완;유재흥
    • 대한기계학회논문집B / Vol. 22, No. 2 / pp.229-239 / 1998
  • The present paper examines the thermal and flow characteristics of a parallel flow heat exchanger and, by defining the flow nonuniformity, investigates the effects of the design parameters on thermal performance. The thermal performance of the heat exchanger is maximized by optimization using Newton's searching method, with the flow nonuniformity chosen as the objective function. Parameters such as the locations of the separator, inlet, and outlet are expected to have a large influence on the thermal performance of a parallel flow heat exchanger. The effect of these parameters is quantified by the flow nonuniformity. The results show that the optimal locations of the inlet and outlet are 19.73 mm and 10.9 mm, respectively. Compared to the reference model, the heat transfer increases by 7.6% and the pressure drop decreases by 4.7%.

Analysis of Performance Characteristics in the Counter and Parallel Type Plate Evaporator with Operating Methods

  • 배경진;차동안;권오경
    • 동력기계공학회지 / Vol. 17, No. 3 / pp.50-56 / 2013
  • The performance characteristics of a plate-type evaporator were analyzed for counter flow and parallel flow operation. To investigate evaporator performance, the water inlet temperature and the refrigerant mass flow rate were varied. As a result, when the inlet temperature of water is $8^{\circ}C$, the capacity of the parallel flow evaporator is 0.35% higher than that of the counter flow evaporator. However, as the inlet temperature of water rises from $8^{\circ}C$ to $16^{\circ}C$, the capacity of the counter flow evaporator is higher than that of the parallel flow evaporator by 0.12%, 0.27%, 1.1%, and 1.6%, respectively. The findings show that the counter flow evaporator has a larger capacity than the parallel flow evaporator. As the refrigerant mass flow rate rises, the capacity and pressure drop increase in both the counter flow and parallel flow evaporators.

Improved Disparity Map Computation on Stereoscopic Streaming Video with Multi-core Parallel Implementation

  • Kim, Cheong Ghil;Choi, Yong Soo
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 2 / pp.728-741 / 2015
  • Stereo vision has become an important technical issue in 3D imaging, machine vision, robotics, image analysis, and related fields. Extracting a depth map from stereo video is a key technology for stereoscopic 3D video, and it requires stereo correspondence algorithms. Such an algorithm computes a similarity measure for each disparity value, followed by an aggregation and optimization step. Since this requires a lot of computational power, exploiting the parallel processing available on processors offers significant speed advantages. In this situation, multi-core CPUs allow many parallel programming technologies to be realized in users' computing devices. This paper proposes parallel implementations for calculating a disparity map using shared-memory programming and the streaming SIMD extension technology. By doing so, we can take advantage of both the hardware and software features of multi-core processors. For the performance evaluation, we implemented a parallel SAD algorithm with OpenMP and SSE2. Their processing speeds are compared with a non-parallel version on stereoscopic streaming video. The experimental results show that both technologies have a significant effect on performance and greatly improve processing speed.
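The SAD (sum of absolute differences) matching step mentioned in this abstract can be sketched as follows. This is a plain sequential Python illustration, not the authors' OpenMP/SSE2 code; the function name `sad_disparity`, the window half-width `half`, and the rule that a left-image pixel at column x is matched against right-image column x - d are assumptions for this sketch. The outer loop over rows has no dependencies between iterations, which is the kind of loop a shared-memory implementation would split across cores, and the inner absolute-difference sum is the kind of operation SIMD instructions accelerate.

```python
def sad_disparity(left, right, max_disp, half=1):
    """Per-pixel disparity by SAD block matching on two equal-size grayscale images.

    Images are lists of rows of integers. Border pixels, where the window
    does not fit, are left at disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):          # rows are independent of each other
        for x in range(half, w - half):
            best_cost, best_d = None, 0
            # Only try shifts that keep the right-image window inside the image.
            for d in range(min(max_disp, x - half) + 1):
                cost = 0
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                # Keep the shift with the smallest SAD cost (first one wins on ties).
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

A real implementation would usually add the aggregation and optimization step the abstract refers to; this sketch stops at the raw winner-takes-all choice.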

An Analysis of PVFS Performance Optimization on Small Cluster System

  • 조혜영;차광호;김성호
    • 한국콘텐츠학회: Conference Proceedings / Proceedings of the 한국콘텐츠학회 2007 Fall General Conference / pp.547-549 / 2007
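  • As the application areas of parallel computers and cluster systems connected by high-speed networks diversify and the number of users grows, interest in distributed and parallel file systems is increasing. In particular, much research is under way to optimize the performance of distributed and parallel file systems so that cluster systems built on complex networks can be used more efficiently. This paper analyzes the performance of PVFS (Parallel Virtual File System), a file system widely used on small cluster systems, and compares PVFS performance as FlowBuffer, a means of optimizing performance for a given network environment, is varied.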


Construction and Performance Evaluation of Windows-based Parallel Computing Environment

  • 신재렬;김명호;최정열
    • 한국전산유체공학회: Conference Proceedings / Proceedings of the 한국전산유체공학회 2001 Fall Conference / pp.58-62 / 2001
  • A parallel computing environment was constructed on the Windows 2000 operating system. The cluster was configured using a Fast-Ethernet system to connect the clients within a network domain. For the parallel computation, MPI implementations for Windows such as MPICH.NT.1.2.2 and MP-MPICHNT.1.2 were used with the Compaq Visual Fortran compiler, which produces well-optimized executables for x86 systems. The performance of this cluster was evaluated using a preconditioned Navier-Stokes code for the 2D analysis of compressible, viscous flow around a compressor blade. The parallel performance was compared with that of previously studied Linux clusters by varying the number of processors, the problem size, and the MPI library. The results from the test problems show that the parallel performance of the low-cost Fast-Ethernet Windows cluster is superior to that of a Linux cluster of similar configuration and comparable to that of a Myrinet cluster.


PERFORMANCE ENHANCEMENT OF PARALLEL MULTIFRONTAL SOLVER ON BLOCK LANCZOS METHOD

  • Byun, Wan-Il;Kim, Seung-Jo
    • Journal of the Korean Society for Industrial and Applied Mathematics / Vol. 13, No. 1 / pp.13-20 / 2009
  • IPSAP, a finite element analysis program, has been developed for high-performance parallel computing. The program consists of various analysis modules, including stress, vibration, and thermal analysis modules. The M-orthogonal block Lanczos algorithm with shift-invert transformation is used to solve eigenvalue problems in the vibration module. The multifrontal algorithm, one of the most efficient direct linear equation solvers, is applied to the factorization and triangular system solving phases in this block Lanczos iteration routine. In this study, the performance enhancement procedure for IPSAP consists of the following stages: 1) minimizing the communication volume of the factorization phase by modifying the parallel matrix subroutines; 2) minimizing idle time in the triangular system solving phase by using a partial inverse of the frontal matrix and the LCM (least common multiple) concept.


Performance evaluation of the single-dwell and double-dwell detection schemes in the IS-95 reverse link

  • 강법주;박형래;손정영;강창언
    • 한국통신학회논문지 / Vol. 21, No. 2 / pp.383-393 / 1996
  • This paper evaluates the acquisition performance for an access channel preamble based on a random access procedure in the direct sequence code division multiple access (DS/CDMA) reverse link. A parallel acquisition technique that employs either the single-dwell detection scheme or the multiple-dwell (double-dwell) detection scheme is considered. The acquisition performance of the two detection schemes is compared in terms of acquisition probability and acquisition time. The parallel acquisition is done by a bank of N parallel I/Q noncoherent correlators. Expressions for the detection, false alarm, and miss probabilities of the single-dwell and multiple-dwell (double-dwell) detection schemes are derived for multiple H$_{1}$ cells and a multipath Rayleigh fading channel. Comparing the single-dwell detection scheme with the multiple-dwell (double-dwell) detection scheme when the parallel acquisition technique is employed in the reverse link, the numerical results show that the single-dwell detection scheme demonstrates better performance.


Space-Sharing Scheduling Schemes for NOW with Heterogeneous Computing Power

  • 김진성;심영철
    • 한국정보과학회논문지:시스템및이론 / Vol. 27, No. 7 / pp.650-664 / 2000
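  • A NOW (Network of Workstations) is widely considered as a platform for running parallel programs. One of the basic problems that must be solved for parallel programs to run with good performance on a NOW is making job scheduling decisions efficiently. Most current research on NOWs assumes that all workstations in the NOW have the same processing power. This paper considers the case in which the workstations that make up the NOW have different computing power. We present ten space-sharing scheduling schemes applicable to a NOW composed of workstations with heterogeneous computing power, and compare these scheduling policies using a simulator. The simulator takes a synthetic sequential/parallel workload as input and produces the response time and waiting time of parallel jobs as performance metrics. The experiments show that partitioning a parallel program heterogeneously, in proportion to the computing power of the workstations, outperforms equal partitioning. When the owner returns to a workstation that is running a parallel process, simply lowering the priority of the process performs better than migrating it to a new idle workstation. For heterogeneous partitioning with priority lowering, the adaptive allocation policy performs well over a wide range of parallel program arrival times, but when the load imbalance grows, the modified adaptive allocation policy performs better.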
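The idea of partitioning work in proportion to each workstation's computing power, as opposed to equal partitioning, can be illustrated with a short Python sketch. It is not one of the ten schemes from the paper; the function name `partition_by_power`, the integer work units, and the largest-remainder rounding rule are all assumptions chosen for illustration.

```python
def partition_by_power(total_units, powers):
    """Split total_units of work in proportion to each workstation's relative power.

    powers holds one positive number per workstation (hypothetical relative
    speeds). Returns one integer share per workstation; shares sum to total_units.
    """
    if total_units < 0 or not powers or any(p <= 0 for p in powers):
        raise ValueError("need non-negative work and positive powers")
    total_power = sum(powers)
    exact = [total_units * p / total_power for p in powers]
    shares = [int(e) for e in exact]  # round each exact share down
    # Hand out the units lost to rounding, largest fractional remainder first.
    leftover = total_units - sum(shares)
    order = sorted(range(len(powers)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For example, `partition_by_power(100, [1, 2, 1])` returns `[25, 50, 25]`, whereas equal partitioning would assign about 33 units to each workstation regardless of speed.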
