• Title/Summary/Keyword: Linpack


Performance Analysis of Cluster Network Interfaces for Parallel Computing of Computational Fluid Dynamics (전산유체역학 병렬해석을 위한 클러스터 네트웍 장치 성능분석)

  • Lee, Bo Seong;Hong, Jeong U;Lee, Dong Ho;Lee, Sang San
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.31 no.5 / pp.37-43 / 2003
  • Parallel computing is widely used in computational fluid dynamics for efficient numerical analysis. Nowadays, low-cost Linux clusters using parallel computing schemes are replacing traditional supercomputers. The performance of numerical solvers on a Linux cluster depends heavily not on the performance of the processors but on the performance of the network devices in the cluster. In this paper, we investigated the effects of network devices such as Myrinet2000, Gigabit Ethernet, and Fast Ethernet on cluster performance using benchmark programs such as Netpipe, LINPACK, NAS NPB, and the MPINS2D Navier-Stokes solver. Based on this investigation, we suggest a method for building high-performance, low-cost Linux cluster systems for computational fluid dynamics analysis.
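
Point-to-point measurements such as those produced by Netpipe are often summarized with a simple linear (Hockney) model: sending an n-byte message costs a fixed latency plus n divided by the asymptotic bandwidth. The Python sketch below is illustrative only. The two links and their latency/bandwidth values are hypothetical placeholders, not the Myrinet2000 or Ethernet figures measured in the paper, and the function names are our own. It shows why small messages are dominated by latency and large messages by bandwidth.

```python
"""Linear (Hockney) point-to-point model: t(n) = latency + n / bandwidth."""


def transfer_time(nbytes, latency_s, bytes_per_s):
    """Seconds to send one message of `nbytes` bytes."""
    return latency_s + nbytes / bytes_per_s


def effective_bandwidth(nbytes, latency_s, bytes_per_s):
    """Bytes per second actually achieved for a message of `nbytes` bytes."""
    return nbytes / transfer_time(nbytes, latency_s, bytes_per_s)


def half_bandwidth_size(latency_s, bytes_per_s):
    """Message size n_1/2 at which half of the asymptotic bandwidth is reached."""
    return latency_s * bytes_per_s


if __name__ == "__main__":
    # Hypothetical links: (latency in seconds, asymptotic bandwidth in bytes/s).
    links = {"link A": (10e-6, 250e6), "link B": (60e-6, 12.5e6)}
    for name, (lat, bw) in links.items():
        for n in (64, 64 * 1024, 8 * 1024 * 1024):
            mb_s = effective_bandwidth(n, lat, bw) / 1e6
            print(f"{name}: {n:>8} B -> {mb_s:8.2f} MB/s")
        print(f"{name}: n_1/2 = {half_bandwidth_size(lat, bw):.0f} B")
```

The n_1/2 value is a convenient single number for comparing interconnects: below it, lowering latency matters more than raising bandwidth.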

Performance Improvement of Reorder Buffer in Out-of-order Issue Superscalar Processors (비순차이슈 수퍼스칼라 프로세서에서 리오더버퍼의 성능개선)

  • Jang, Mun-Seok;Lee, Jeong-U;Choe, Sang-Bang
    • Journal of KIISE:Computer Systems and Theory / v.28 no.1_2 / pp.90-102 / 2001
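  • The reorder buffer is used in superscalar pipelines that issue instructions out of order to complete instruction execution in program order. In this paper, we propose a reorder buffer structure with shelter buffers that not only efficiently eliminates the instruction stagnation the reorder buffer can cause but also reduces the size of the reorder buffer. Simulation results show that when the reorder buffer has between 8 and 32 entries, using only one or two shelter buffers yields a clear performance improvement. Using four shelter buffers gave no notable improvement over using two, which shows that two shelter buffers are enough to eliminate most stagnation. With two shelter buffers and no loss in execution rate, the reorder buffer size could be reduced by 44% for the Whetstone benchmark program, 50% for FFT, 60% for FM, and 75% for Linpack. Using shelter buffers also improved execution time by 19.78% for Whetstone, 19.67% for FFT, 23.93% for FM, and 8.65% for Linpack.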
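
For readers unfamiliar with the mechanism, a generic reorder buffer can be modeled as a FIFO: entries are allocated in program order, marked complete in any order, and retired only from the head. The Python sketch below is a textbook-style model, not the shelter-buffer design proposed in the paper; the class and method names are hypothetical. It shows the stagnation the abstract refers to: an unfinished instruction at the head blocks retirement of completed instructions behind it, and once the buffer is full, no new instruction can be allocated.

```python
from collections import OrderedDict


class ReorderBuffer:
    """Generic FIFO reorder buffer: allocate in program order, complete in
    any order, retire only from the head (in program order)."""

    def __init__(self, size):
        self.size = size
        # tag -> completed flag; insertion order is program order.
        self.entries = OrderedDict()

    def allocate(self, tag):
        """Reserve an entry at dispatch; returns False (stall) if full."""
        if len(self.entries) >= self.size:
            return False
        self.entries[tag] = False
        return True

    def complete(self, tag):
        """Mark an allocated instruction as finished (any order is fine)."""
        if tag not in self.entries:
            raise KeyError(f"tag {tag!r} was never allocated")
        self.entries[tag] = True

    def retire(self):
        """Commit finished instructions from the head; stop at the first unfinished one."""
        committed = []
        while self.entries:
            head_tag = next(iter(self.entries))
            if not self.entries[head_tag]:
                break  # head still executing: completed entries behind it wait
            self.entries.popitem(last=False)
            committed.append(head_tag)
        return committed


if __name__ == "__main__":
    rob = ReorderBuffer(4)
    for tag in range(4):
        rob.allocate(tag)
    print(rob.allocate(4))  # False: buffer full, dispatch stalls
    rob.complete(1)
    rob.complete(2)
    print(rob.retire())     # []: head (tag 0) unfinished, nothing commits
    rob.complete(0)
    print(rob.retire())     # [0, 1, 2]: committed in program order
```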

Multi-communication layered HPL model and its application to GPU clusters

  • Kim, Young Woo;Oh, Myeong-Hoon;Park, Chan Yeol
    • ETRI Journal / v.43 no.3 / pp.524-537 / 2021
  • High-performance Linpack (HPL) is among the most popular benchmarks for evaluating the capabilities of computing systems and has been used as a standard to compare the performance of computing systems since the early 1980s. In the initial system-design stage, it is critical to estimate the capabilities of a system quickly and accurately. However, the original HPL mathematical model based on a single core and single communication layer yields varying accuracy for modern processors and accelerators comprising large numbers of cores. To reduce the performance-estimation gap between the HPL model and an actual system, we propose a mathematical model for multi-communication layered HPL. The effectiveness of the proposed model is evaluated by applying it to a GPU cluster and well-known systems. The results reveal performance differences of 1.1% on a single GPU. The GPU cluster and well-known large system show 5.5% and 4.1% differences on average, respectively. Compared to the original HPL model, the proposed multi-communication layered HPL model provides performance estimates within a few seconds and a smaller error range from the processor/accelerator level to the large system level.
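
HPL results are reported as a floating-point rate derived from the problem size N and the measured wall time, and a model's accuracy, like the percentage differences quoted above, is a relative error against the measured value. The sketch below assumes the operation count 2/3 N^3 + 2 N^2 conventionally used for LINPACK results; the function names and all example numbers are hypothetical, and the paper's multi-communication layered model itself is not reproduced here.

```python
def linpack_flops(n):
    """Operation count conventionally used for LINPACK: 2/3 n^3 + 2 n^2."""
    return 2.0 / 3.0 * n ** 3 + 2.0 * n ** 2


def hpl_gflops(n, wall_time_s):
    """Achieved rate (Rmax for this run) in Gflop/s."""
    return linpack_flops(n) / wall_time_s / 1e9


def peak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak (Rpeak) in Gflop/s."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def percent_error(estimated, measured):
    """Relative difference of a model estimate from a measurement, in percent."""
    return abs(estimated - measured) / measured * 100.0


if __name__ == "__main__":
    # Hypothetical run: every number below is a placeholder.
    rmax = hpl_gflops(n=40_000, wall_time_s=600.0)
    rpeak = peak_gflops(nodes=1, cores_per_node=4, clock_ghz=2.5, flops_per_cycle=8)
    print(f"Rmax  = {rmax:.1f} Gflop/s")
    print(f"Rpeak = {rpeak:.1f} Gflop/s, efficiency = {rmax / rpeak:.1%}")
    print(f"model error = {percent_error(estimated=70.0, measured=rmax):.1f}%")
```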

A Multistriped Checkpointing Scheme for the Fault-tolerant Cluster Computers (다중 분할된 구조를 가지는 클러스터 검사점 저장 기법)

  • Chang, Yun-Seok
    • The KIPS Transactions:PartA / v.13A no.7 s.104 / pp.607-614 / 2006
  • To improve the performance of processes running on a cluster system that writes checkpoints to its global stable storage, a checkpointing scheme should reduce process delay by managing each node's checkpoints to fit the network load. For this reason, a cluster system with a single I/O space on a distributed RAID needs a suitable checkpointing scheme to obtain the maximum I/O performance and the best rollback-recovery efficiency. In this paper, we improved the striped checkpointing scheme by making the stripe group size dynamic, adapting it to the variation in network bandwidth at the time of checkpointing. To analyze the performance of the multistriped checkpointing scheme, we applied the Linpack HPC benchmark with MPI on our own cluster system with up to 512 virtual nodes. The benchmark results showed that, under heavy system load, the multistriped checkpointing scheme outperforms the striped checkpointing scheme in both checkpoint-writing efficiency and rollback recovery.
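
As a rough illustration of adapting the stripe group size to the bandwidth measured at checkpoint time, the sketch below uses a hypothetical rule: allow as many disks to write concurrently as the currently available network bandwidth can feed, then partition the nodes into stripe groups of that size. This rule and the function names are assumptions made for illustration, not the multistriped algorithm from the paper.

```python
def choose_group_size(num_nodes, net_bw, disk_bw):
    """HYPOTHETICAL rule: number of disks allowed to write concurrently,
    limited by the network bandwidth currently available (same units for both)."""
    if net_bw <= 0 or disk_bw <= 0:
        raise ValueError("bandwidths must be positive")
    fits = int(net_bw // disk_bw)        # concurrent writers the network can feed
    return max(1, min(num_nodes, fits))  # clamp to [1, num_nodes]


def stripe_groups(num_nodes, group_size):
    """Partition node ids 0..num_nodes-1 into consecutive stripe groups."""
    return [list(range(start, min(start + group_size, num_nodes)))
            for start in range(0, num_nodes, group_size)]


if __name__ == "__main__":
    # Placeholder bandwidths in MB/s, re-measured before each checkpoint.
    for measured_net_bw in (100.0, 60.0, 15.0):
        g = choose_group_size(num_nodes=8, net_bw=measured_net_bw, disk_bw=20.0)
        print(measured_net_bw, g, stripe_groups(8, g))
```

When the measured bandwidth drops, the groups shrink, so fewer disks compete for the network at once.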

A Striped Checkpointing Scheme for the Cluster System with the Distributed RAID (분산 RAID 기반의 클러스터 시스템을 위한 분할된 결함허용정보 저장 기법)

  • Chang, Yun-Seok
    • The KIPS Transactions:PartA / v.10A no.2 / pp.123-130 / 2003
  • This paper presents a new striped checkpointing scheme for serverless cluster computers, where the local disks attached to the cluster nodes collectively form a distributed RAID with a single I/O space. Striping enables parallel I/O on the distributed disks, and staggering avoids network bottlenecks in the distributed RAID. We demonstrate how dynamic striping and staggering reduce the checkpointing overhead and increase availability for communication-intensive applications. The Linpack HPC benchmark and MPI programs were applied to these checkpointing schemes for performance evaluation on a 16-node cluster system. The benchmark results demonstrate the benefits of the striped checkpointing scheme compared to existing schemes, and these results are useful for designing efficient checkpointing schemes for fast rollback recovery from any single-node failure in a cluster system.
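
Striping and staggering can be illustrated independently of any particular RAID implementation. In the Python sketch below, `stripe` splits a checkpoint image into fixed-size blocks assigned round-robin to disks (RAID-0 style), `unstripe` reverses it for recovery, and `stagger` assigns nodes to write slots so that only a few nodes use the network at once. All names and parameters are hypothetical; the paper's actual layout and scheduling may differ.

```python
def stripe(data, num_disks, block_size):
    """Split `data` into blocks and assign block k to disk k % num_disks."""
    disks = [[] for _ in range(num_disks)]
    for k, start in enumerate(range(0, len(data), block_size)):
        disks[k % num_disks].append(data[start:start + block_size])
    return disks


def unstripe(disks):
    """Rebuild the original bytes by reading blocks back in round-robin order."""
    blocks = []
    rows = max(len(d) for d in disks)
    for row in range(rows):
        for d in disks:
            if row < len(d):
                blocks.append(d[row])
    return b"".join(blocks)


def stagger(num_nodes, writers_per_slot):
    """Map each node to a write slot so at most `writers_per_slot` write at once."""
    return {node: node // writers_per_slot for node in range(num_nodes)}


if __name__ == "__main__":
    image = bytes(range(10))  # stand-in for one node's checkpoint image
    layout = stripe(image, num_disks=3, block_size=2)
    assert unstripe(layout) == image
    print(layout)
    print(stagger(num_nodes=6, writers_per_slot=2))
```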

Performance Analysis of Network Devices for High Performance Computing Cluster (HPC 클러스터 구축을 위한 다양한 네트워크 성능 분석)

  • Hong, Jeong-Woo;Lee, Bo-Sung;Park, Hyung-Woo;Lee, Sang-San
    • Proceedings of the Korea Information Processing Society Conference / 2002.04a / pp.319-322 / 2002
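  • High-performance cluster systems, which are expected to be a key element of recently prominent research areas such as grid computing, are used mainly for scientific and engineering application research. Such parallel systems are built from specific components, and among them the components that make up the network have drawn attention as key elements of conventional distributed/parallel computing. In this paper, we ran benchmarks including Netpipe, Linpack, and NPB over Myrinet, Gigabit Ethernet, and Fast Ethernet equipment on a 16-node cluster of Pentium IV 1.7 GHz nodes with 1 GB of memory each (a configuration selected through performance experiments), tested them with two compilers, and analyzed the results. By comparing equipment with markedly different performance, the optimal cluster configuration can be derived for the algorithms used by the application problems feasible as of February 2002.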
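
When running Linpack (HPL) on a cluster like the one described, the problem size N is commonly chosen so that the N x N double-precision matrix fills roughly 80% of aggregate memory, rounded down to a multiple of the block size NB. This heuristic, the 80% fraction, the default NB = 128, and reading "1 GB" as 1 GiB are assumptions for illustration, not settings reported in the paper; the function name is hypothetical.

```python
import math


def hpl_problem_size(nodes, mem_per_node_bytes, fraction=0.8, nb=128):
    """Rule-of-thumb HPL N: the N x N matrix of 8-byte doubles should use
    about `fraction` of aggregate memory; round down to a multiple of NB."""
    total_bytes = nodes * mem_per_node_bytes
    n = math.isqrt(int(fraction * total_bytes) // 8)  # integer square root
    return (n // nb) * nb


if __name__ == "__main__":
    # 16 nodes with 1 GiB each (assumed reading of the memory size above).
    print(hpl_problem_size(nodes=16, mem_per_node_bytes=2**30))
```

Rounding N to a multiple of NB avoids a ragged last block; in practice N is then tuned downward if the operating system and MPI buffers need more headroom.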

Performance Analysis of Cluster Network Interfaces for Parallel Computing of Computational Fluid Dynamics (전산유체역학 병렬해석을 위한 클러스터 네트웍 장치 성능분석)

  • Lee, Bo-sung;Hong, Jeong-Woo;Lee, Sangsan;Lee, Dong Ho
    • Korean Society of Computational Fluids Engineering: Conference Proceedings / 2002.05a / pp.152-157 / 2002
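  • Parallel processing is becoming common for high-speed computation in computational fluid dynamics, and such parallel analyses are mostly carried out at low cost on clusters. The performance of solvers on cluster computers for computational fluid dynamics depends heavily not only on the performance of the processors used in the cluster but also on the performance of the communication devices inside the cluster. In this paper, we compared the performance of network devices widely used to build cluster computers, such as Myrinet2000, Gigabit Ethernet, and Fast Ethernet, using Netpipe, Linpack, NAS NPB, and the MPINS2D Navier-Stokes solver. Based on this comparison, we aim to present a method for achieving the best price/performance ratio when building clusters for computational fluid dynamics in the future.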

Proposal of Container-Based HPC Structures and Performance Analysis

  • Yong, Chanho;Lee, Ga-Won;Huh, Eui-Nam
    • Journal of Information Processing Systems / v.14 no.6 / pp.1398-1404 / 2018
  • High-performance computing (HPC) gives researchers a powerful means of solving computationally intensive problems, such as those in mathematics and medicine. When an HPC platform is provided as a service, users may face unexpected obstacles in developing and running applications because of restricted development environments and dependencies. In this context, operating-system-level virtualization can serve HPC services by providing lightweight virtualization and consistent DevOps environments. This paper therefore proposes three typical HPC structures for container environments built with HPC containers and Docker. The three structures focus on smooth integration with the existing HPC job framework, the message passing interface (MPI). Finally, the performance of the structures is analyzed with the High Performance Linpack benchmark in terms of network-communication performance degradation under Docker.

Performance Optimization of Numerical Ocean Modeling on Cloud Systems (클라우드 시스템에서 해양수치모델 성능 최적화)

  • Jung, Kwangwoog;Cho, Yang-Ki;Tak, Yong-Jin
    • The Sea: Journal of the Korean Society of Oceanography / v.27 no.3 / pp.127-143 / 2022
  • Recently, many attempts have been made to run numerical ocean models in cloud computing environments. A cloud computing environment can be an effective means of running numerical ocean models that require large-scale resources, or of quickly preparing a modeling environment for global or large-scale grids. Many commercial and private cloud computing systems provide technologies for High Performance Computing (HPC) such as virtualization, high-performance CPUs and instances, Ethernet-based high-performance networking, and remote direct memory access. These new features facilitate ocean modeling experiments on commercial cloud computing systems, and many scientists and engineers expect cloud computing to become mainstream in the near future. Analyzing the performance and features of commercial cloud services for numerical modeling is essential for selecting appropriate systems, as it can help minimize execution time and the amount of resources used. Cache memory has a large effect on the processing structure of an ocean numerical model, which reads and writes data in multidimensional arrays, and network speed is important because of the communication pattern through which large amounts of data move. In this study, the performance of the Regional Ocean Modeling System (ROMS), the High Performance Linpack (HPL) benchmarking software package, and the STREAM memory benchmark was evaluated and compared on commercial cloud systems to provide information for migrating other ocean models to cloud computing. By analyzing actual performance data and configuration settings obtained from virtualization-based commercial clouds, we evaluated the efficiency of computing resources for various model grid sizes in virtualization-based cloud systems. We found that cache hierarchy and capacity are crucial to the performance of ROMS when it uses large amounts of memory, and that memory latency also matters. Increasing the number of cores to reduce the running time of numerical modeling is more effective for large grid sizes than for small ones. Our analysis results will be helpful as a reference for building the best computing system in the cloud to minimize the time and cost of numerical ocean modeling.
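
The observation that adding cores helps large grids more than small ones is consistent with a simple surface-to-volume argument. For an n x n grid split into a sqrt(p) x sqrt(p) block decomposition, each subdomain exchanges a halo proportional to its perimeter but computes over its area, so the ratio of halo cells to interior cells grows as 4*sqrt(p)/n. The sketch below assumes a square grid, a perfect-square core count, and a halo one cell wide, and ignores corners and physical boundaries; it is not how ROMS actually partitions its grid, and the function name is hypothetical.

```python
import math


def halo_to_interior_ratio(n, p, halo_width=1):
    """Halo cells exchanged per subdomain divided by interior cells computed,
    for an n x n grid split into sqrt(p) x sqrt(p) equal square blocks.
    Corners and physical boundaries are ignored."""
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square")
    side = n / q                         # cells along one edge of a subdomain
    halo_cells = 4 * halo_width * side
    interior_cells = side * side
    return halo_cells / interior_cells   # equals 4 * halo_width * q / n


if __name__ == "__main__":
    for n in (256, 1024):                # hypothetical small and large grids
        ratios = [halo_to_interior_ratio(n, p) for p in (4, 16, 64)]
        print(n, [f"{r:.3f}" for r in ratios])
```

For a fixed core count the ratio falls as n grows, so a larger grid keeps each core busier computing relative to communicating.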