• Title/Summary/Keyword: high performance computing

Search Results: 1,110

Comparative and Combined Performance Studies of OpenMP and MPI Codes (OpenMP와 MPI 코드의 상대적, 혼합적 성능 고찰)

  • Lee Myung-Ho
    • The KIPS Transactions:PartA / v.13A no.2 s.99 / pp.157-162 / 2006
  • Recent High Performance Computing (HPC) platforms can be classified as Shared-Memory Multiprocessors (SMP), Massively Parallel Processors (MPP), and clusters of computing nodes. These platforms are deployed for many scientific and engineering applications that place very high demands on computing power. To achieve optimal performance for these applications, it is crucial to select suitable computing platforms and programming paradigms. In this paper, we use the SPEC HPC 2002 benchmark suite, implemented in several parallel programming models (MPI, OpenMP, and a hybrid of MPI/OpenMP), to identify optimal computing environments and programming paradigms through performance analysis.
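
As an illustration of the hybrid MPI/OpenMP model compared in this study, the following minimal C sketch (an assumption for illustration, not code from the benchmark suite) distributes a summation across MPI processes and parallelizes each process's share with OpenMP threads; the problem size and reduction are placeholders.

    /* Minimal hybrid MPI/OpenMP sketch: each MPI rank sums its slice of an
     * index range with OpenMP threads, then the partial sums are combined
     * with MPI_Reduce. Build with, e.g., "mpicc -fopenmp hybrid.c". */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000L   /* illustrative problem size */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank owns a contiguous slice of the index range. */
        long begin = N * rank / size;
        long end   = N * (rank + 1) / size;

        double local = 0.0;
        /* OpenMP threads split the work inside one MPI process. */
        #pragma omp parallel for reduction(+:local)
        for (long i = begin; i < end; i++)
            local += (double)i;

        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %.0f\n", total);
        MPI_Finalize();
        return 0;
    }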

HTCaaS(High Throughput Computing as a Service) in Supercomputing Environment (슈퍼컴퓨팅환경에서의 대규모 계산 작업 처리 기술 연구)

  • Kim, Seok-Kyoo;Kim, Jik-Soo;Kim, Sangwan;Rho, Seungwoo;Kim, Seoyoung;Hwang, Soonwook
    • The Journal of the Korea Contents Association / v.14 no.5 / pp.8-17 / 2014
  • Petascale systems (so-called supercomputers) have mainly been used to support communication-intensive, tightly-coupled parallel computations based on message-passing interfaces such as MPI, i.e., High-Performance Computing (HPC). In contrast, computing paradigms such as High-Throughput Computing (HTC) mainly target compute-intensive applications (with relatively low I/O requirements) that consist of many loosely-coupled tasks requiring no communication among them. In Korea, recently emerging applications from scientific fields such as the pharmaceutical domain, high-energy physics, and nuclear physics require an amount of computing power that cannot be supplied by any single type of computing resource. In this paper, we present HTCaaS (High-Throughput Computing as a Service), which can leverage national distributed computing resources in Korea to support these challenging HTC applications, and we describe our system architecture, job execution scenario, and case studies of various scientific applications.
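
The loosely-coupled task model described here can be contrasted with MPI-style computation by a small sketch. The hypothetical C worker below (not part of HTCaaS) handles one independent task given on the command line and never communicates with other tasks; an HTC service would simply launch many such instances across available resources.

    /* Hypothetical bag-of-tasks worker: each instance processes one
     * independent, compute-intensive task with no inter-task communication.
     * The task body is a placeholder. */
    #include <stdio.h>
    #include <stdlib.h>

    static double run_task(int task_id)
    {
        double x = 0.0;
        for (long i = 1; i <= 10000000L; i++)   /* stand-in for real work */
            x += 1.0 / (double)(i + task_id);
        return x;
    }

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <task_id>\n", argv[0]);
            return 1;
        }
        int task_id = atoi(argv[1]);
        printf("task %d result %.6f\n", task_id, run_task(task_id));
        return 0;
    }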

Torus Network Based Distributed Storage System for Massive Multimedia Contents (토러스 연결망 기반의 대용량 멀티미디어용 분산 스토리지 시스템)

  • Kim, Cheiyol;Kim, Dongoh;Kim, Hongyeon;Kim, Youngkyun;Seo, Daewha
    • Journal of Korea Multimedia Society / v.19 no.8 / pp.1487-1497 / 2016
  • The explosive growth of digital multimedia services increases the need for highly scalable, low-cost storage. This paper proposes a new storage architecture that is based on a torus network, which requires no network switches, and that uses erasure coding for high scalability and efficient disk utilization. The proposed model must compensate for the drawbacks of a torus network, namely its long network latency and network processing overhead. The proposed storage model was compared with two of the most popular distributed file systems, GlusterFS and Ceph, through a prototype implementation. The prototype outperforms the erasure-coding policies of both file systems and, in most cases, even their replication policies.
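
To make the switchless character of a torus concrete, the following small C sketch (an illustration, not code from the paper) computes the six directly connected neighbors of a node in a 3D torus; wrap-around links between neighbors take the place of a central switch.

    /* Sketch: neighbor IDs of node (x, y, z) in an X*Y*Z 3D torus.
     * Each node links directly to six neighbors, so no switch is needed. */
    #include <stdio.h>

    #define X 4
    #define Y 4
    #define Z 4

    static int node_id(int x, int y, int z)
    {
        return (z * Y + y) * X + x;               /* linearize coordinates */
    }

    static void neighbors(int x, int y, int z, int out[6])
    {
        out[0] = node_id((x + 1) % X, y, z);      /* +x */
        out[1] = node_id((x - 1 + X) % X, y, z);  /* -x */
        out[2] = node_id(x, (y + 1) % Y, z);      /* +y */
        out[3] = node_id(x, (y - 1 + Y) % Y, z);  /* -y */
        out[4] = node_id(x, y, (z + 1) % Z);      /* +z */
        out[5] = node_id(x, y, (z - 1 + Z) % Z);  /* -z */
    }

    int main(void)
    {
        int n[6];
        neighbors(0, 0, 0, n);
        for (int i = 0; i < 6; i++)
            printf("neighbor %d: node %d\n", i, n[i]);
        return 0;
    }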

The Technology Trend of Interconnection Network for High Performance Computing (고성능 컴퓨팅을 위한 인터커넥션 네트워크 기술 동향)

  • Cho, Hyeyoung;Jun, Tae Joon;Han, Jiyong
    • Journal of the Korea Convergence Society / v.8 no.8 / pp.9-15 / 2017
  • With the development of semiconductor integration technology, central processing units and storage devices have been miniaturized and their performance has improved rapidly; as a result, the interconnection network has become an increasingly important factor in the performance of high performance computing systems. In this paper, we analyze trends in the interconnection network technologies used in high performance computing. The interconnect most widely used in the Supercomputer Top500 (June 2017) is InfiniBand. Ethernet now holds the second-largest share after InfiniBand, owing to the emergence of 40/100 Gbps Gigabit Ethernet technology. Gigabit Ethernet, whose latency is higher than that of InfiniBand, is preferred in cost-effective, medium-sized data centers. In addition, top-end HPC systems that demand the highest performance are moving away from Ethernet and InfiniBand and attempting to maximize system performance by introducing their own proprietary interconnection networks. In the future, high-performance interconnects are expected to adopt silicon-based optical communication technology to exchange data with light.

Efficient Data Pre-fetching Scheme for InfiniBand based High Performance Clusters (인피니밴드 기반 고성능 클러스터를 위한 효율적인 데이터 선반입 기법)

  • Kim, Bongjae;Jung, Jinman;Min, Hong;Heo, Junyoung;Jung, Hyedong
    • KIISE Transactions on Computing Practices / v.23 no.5 / pp.293-298 / 2017
  • Recently, much research has been devoted to implementing and provisioning high-performance computing environments using clusters of multiple computers and high-performance networking technologies. In-memory Key-Value stores, such as Redis and Memcached, are widely used in high performance cluster environments to improve data processing performance. With these in-memory Key-Value stores, data can be distributed across different storage nodes and accessed at high speed by each computing node. InfiniBand is the de facto technology for interconnecting the nodes of a cluster. In this paper, we propose a new data pre-fetching scheme for Key-Value stores on InfiniBand-based high performance clusters. The proposed scheme exploits the data transfer characteristics of InfiniBand. Simulation results show that the proposed scheme can reduce data transfer time by up to about 28%.
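
The abstract does not detail the scheme itself, but the general idea of pre-fetching from a Key-Value store can be sketched as follows. In this self-contained C toy (an assumption for illustration; the "remote" store is just an array standing in for an InfiniBand transfer), the predicted next value is requested before it is needed so that transfer can overlap with processing.

    /* Toy pre-fetching loop for a Key-Value store. fetch_value() stands in
     * for a remote transfer; predict_next_key() assumes sequential access. */
    #include <stdio.h>

    #define NUM_KEYS 8

    static int store[NUM_KEYS] = {10, 20, 30, 40, 50, 60, 70, 80};

    static int fetch_value(int key)      { return store[key]; }
    static int predict_next_key(int key) { return (key + 1) % NUM_KEYS; }

    int main(void)
    {
        int key = 0;
        int prefetched = fetch_value(key);        /* warm the prefetch slot */

        for (int i = 0; i < NUM_KEYS; i++) {
            int value = prefetched;               /* already transferred */
            int next = predict_next_key(key);
            prefetched = fetch_value(next);       /* issue the next fetch early */
            printf("key %d -> %d\n", key, value); /* process the current value */
            key = next;
        }
        return 0;
    }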

A Multi-Class Task Scheduling Strategy for Heterogeneous Distributed Computing Systems

  • El-Zoghdy, S.F.;Ghoneim, Ahmed
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.1 / pp.117-135 / 2016
  • Performance enhancement is one of the most important issues in high performance distributed computing systems. In such systems, online users submit their jobs anytime and anywhere to a set of dynamic resources, and job arrivals and process execution times are stochastic. The performance of a distributed computing system can be improved by using an effective load balancing strategy that redistributes user tasks among the computing resources for efficient utilization. This paper presents a multi-class load balancing strategy that balances different classes of user tasks across multiple heterogeneous computing nodes to minimize the per-class mean response time. For a wide range of system parameters, the performance of the proposed multi-class load balancing strategy is compared by simulation with that of random-distribution and uniform-distribution load balancing strategies. The results show that the proposed strategy outperforms the other two strategies in terms of average task response time and average computing node utilization.
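
The strategy's internals are not given in this abstract; as a generic illustration of multi-class dispatching, the self-contained C sketch below assigns each arriving task to the node with the smallest estimated completion time for that task's class (node count, class count, and service rates are made-up numbers).

    /* Toy multi-class dispatcher: each node advertises a per-class service
     * rate; an arriving task of class c goes to the node whose backlog plus
     * service time for class c is smallest. Numbers are illustrative. */
    #include <stdio.h>

    #define NODES   3
    #define CLASSES 2

    static double service_rate[NODES][CLASSES] = {   /* tasks per second */
        {4.0, 1.0},
        {2.0, 2.0},
        {1.0, 4.0},
    };
    static double backlog[NODES];                    /* queued work, seconds */

    static int dispatch(int cls)
    {
        int best = 0;
        double best_rt = 1e300;
        for (int n = 0; n < NODES; n++) {
            double rt = backlog[n] + 1.0 / service_rate[n][cls];
            if (rt < best_rt) { best_rt = rt; best = n; }
        }
        backlog[best] += 1.0 / service_rate[best][cls];
        return best;
    }

    int main(void)
    {
        int classes[] = {0, 1, 0, 1, 1, 0};
        for (int i = 0; i < 6; i++)
            printf("task %d (class %d) -> node %d\n",
                   i, classes[i], dispatch(classes[i]));
        return 0;
    }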

Implementation and Performance Analysis of High Performance Computing Library for Parallel Processing (병렬처리를 위한 고성능 라이브러리의 구현과 성능 평가)

  • 김영태;이용권
    • Journal of KIISE:Computer Systems and Theory / v.31 no.7 / pp.379-386 / 2004
  • We designed a portable parallel library, HPCL (High Performance Computing Library), with the following objectives: (1) to keep the parallel code closely related to the original sequential code, which helps in maintaining future versions of the sequential code, and (2) to enhance the performance of the parallel code. The library is an interface, written in the C and Fortran programming languages, between MPI (Message Passing Interface) and parallel programs written in Fortran. Performance results were obtained on clusters of PCs and an IBM SP4.
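
HPCL's actual interface is not reproduced in this abstract; the hypothetical C wrapper below merely illustrates how a thin layer over MPI can keep the parallel code close to its sequential original by hiding message-passing details (the hpcl_* names are invented for this sketch and are not the library's real API).

    /* Hypothetical thin wrapper over MPI in the spirit of such a library.
     * The application calls hpcl_* functions and never touches MPI directly. */
    #include <mpi.h>
    #include <stdio.h>

    static int hpcl_rank, hpcl_size;

    void hpcl_init(int *argc, char ***argv)
    {
        MPI_Init(argc, argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &hpcl_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &hpcl_size);
    }

    /* Exchange one boundary value around a ring of processes. */
    void hpcl_exchange(double *send, double *recv)
    {
        int right = (hpcl_rank + 1) % hpcl_size;
        int left  = (hpcl_rank - 1 + hpcl_size) % hpcl_size;
        MPI_Sendrecv(send, 1, MPI_DOUBLE, right, 0,
                     recv, 1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    void hpcl_finalize(void) { MPI_Finalize(); }

    int main(int argc, char **argv)
    {
        hpcl_init(&argc, &argv);
        double out = (double)hpcl_rank, in = -1.0;
        hpcl_exchange(&out, &in);                 /* no MPI visible here */
        printf("rank %d received %.0f\n", hpcl_rank, in);
        hpcl_finalize();
        return 0;
    }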

A Comparative Performance Study for Compute Node Sharing

  • Park, Jeho;Lam, Shui F.
    • Journal of Computing Science and Engineering / v.6 no.4 / pp.287-293 / 2012
  • We introduce a methodology for studying the application-level performance of time-sharing parallel jobs on a set of compute nodes in high performance clusters and report our findings. We assume that parallel jobs arriving at a cluster need to share a set of nodes with the jobs of other users: they must compete for processor time in a time-sharing manner and for other limited resources, such as memory and I/O, in a space-sharing manner. Under this assumption, we developed a methodology to simulate job arrivals to a set of compute nodes and to gather and process performance data in order to calculate the percentage slowdown of parallel jobs. Our goal in this study is to identify combinations of jobs that minimize the performance degradation due to resource sharing and contention. Through our experiments, we found a couple of interesting behaviors of overlapped parallel jobs, which may suggest alternative job allocation schemes aimed at reducing the slowdowns that inevitably result from resource sharing on a high performance computing cluster. We suggest three job allocation strategies based on our empirical results and propose further studies of the results using a supercomputing facility at the San Diego Supercomputer Center.
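
The percentage slowdown metric mentioned here is commonly computed from a job's dedicated and shared runtimes; a minimal C helper under that assumed definition is shown below (the runtimes are illustrative).

    /* Percentage slowdown of a job that took t_shared seconds while sharing
     * nodes versus t_dedicated seconds on dedicated nodes, assuming the
     * definition 100 * (t_shared - t_dedicated) / t_dedicated. */
    #include <stdio.h>

    static double pct_slowdown(double t_dedicated, double t_shared)
    {
        return 100.0 * (t_shared - t_dedicated) / t_dedicated;
    }

    int main(void)
    {
        printf("slowdown = %.1f%%\n", pct_slowdown(120.0, 150.0));  /* 25.0% */
        return 0;
    }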

Exploiting Static Non-Uniform Cache Architectures for Hard Real-Time Computing

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / v.9 no.4 / pp.177-189 / 2015
  • High-performance processors using a Non-Uniform Cache Architecture (NUCA) are increasingly used to cope with the growing wire delays in multicore/manycore processors. Due to the convergence of high-performance computing with embedded computing, NUCA caches are expected to benefit high-end embedded systems as well. However, for real-time systems that use multicore processors with NUCA caches, it is crucial to bound the worst-case execution time (WCET) accurately and safely. In this paper, we developed a WCET analysis approach that accounts for the effect of static NUCA caches on WCET. We compared the WCET of real-time applications under different topologies of static NUCA caches. Our experimental results demonstrated that a static NUCA cache can improve the worst-case performance of real-time applications running on multicore processors compared with a cache with uniform access time.
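
The paper's WCET analysis is not reproduced in this abstract; as a simplified illustration of why a static NUCA matters for WCET, the C sketch below bounds the cache-access contribution to execution time by charging each access the latency of the farthest bank it may map to (bank latencies and access counts are made-up numbers).

    /* Simplified WCET-style bound for a static NUCA: accesses that may hit
     * in banks [lo, hi] are charged the slowest such bank; certain misses
     * are charged the memory latency. Numbers are illustrative. */
    #include <stdio.h>

    #define BANKS 4

    static int bank_latency[BANKS] = {4, 6, 8, 10};  /* cycles, near to far */
    static int mem_latency = 100;                    /* cycles per miss */

    static long wcet_cache_cycles(long hits, int lo, int hi, long misses)
    {
        int worst = bank_latency[lo];
        for (int b = lo; b <= hi; b++)
            if (bank_latency[b] > worst)
                worst = bank_latency[b];
        return hits * (long)worst + misses * (long)mem_latency;
    }

    int main(void)
    {
        /* 10000 accesses that may map to banks 1..3, plus 200 certain misses. */
        printf("bound = %ld cycles\n", wcet_cache_cycles(10000, 1, 3, 200));
        return 0;
    }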