• Title/Summary/Keyword: high-performance computing system (고성능 컴퓨팅 시스템)


Performance Analysis of Cluster Network Interfaces for Parallel Computing of Computational Fluid Dynamics (전산유체역학 병렬해석을 위한 클러스터 네트웍 장치 성능분석)

  • Lee, Bo Seong;Hong, Jeong U;Lee, Dong Ho;Lee, Sang San
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.31 no.5 / pp.37-43 / 2003
  • Parallel computing is widely used in computational fluid dynamics for efficient numerical analysis. Nowadays, low-cost Linux cluster computers running parallel computing schemes are replacing traditional supercomputers. The performance of numerical solvers on a Linux cluster computer depends heavily not on the performance of the processors but on the performance of the network devices in the cluster system. In this paper, we investigate the effects of network devices such as Myrinet2000, gigabit Ethernet, and fast Ethernet on the performance of the cluster system, using benchmark programs such as Netpipe, LINPACK, NAS NPB, and the MPINS2D Navier-Stokes solver. Finally, based on this investigation, we suggest a method for building a high-performance, low-cost Linux cluster system for computational fluid dynamics analysis.

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.1-11 / 2015
  • In recent high-performance computing systems, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since the GPU provides computational resources optimized for massively parallel processing. Unfortunately, GPGPU does not fully exploit the GPU's computational resources when executing general-purpose applications, because the applications cannot be optimized for the GPU architecture. Therefore, we provide a GPU research guideline for improving the performance of computing systems that use GPGPU. To accomplish this, we analyze the factors that degrade GPU performance. To classify their causes clearly, we define five GPU core statuses: fully active, partial active, idle, memory stall, and GPU core stall. Every status except fully active causes performance degradation. We evaluate the ratio of each GPU core status according to the characteristics of the benchmarks to find the specific reasons that degrade GPU performance. According to our simulation results, the partial active, idle, memory stall, and GPU core stall statuses are induced by underutilization of computational resources, low parallelism, a high volume of memory requests, and structural hazards, respectively.

A Benchmark of Micro Parallel Computing Technology for Real-time Control in Smart Farm (MPICH vs OpenMP) (스마트 시설환경 실시간 제어를 위한 마이크로 병렬 컴퓨팅 기술 분석)

  • Min, Jae-Ki;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference / 2017.04a / pp.161-161 / 2017
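  • The control elements of a smart facility environment are a mix of factors directly involved in regulating the environment, such as heaters, window opening and closing, water/nutrient-solution valves, ventilation fans, and dehumidifiers, and elements indirectly related to control, such as communication for information exchange and the user interface. Control based on mathematical logic, such as PID control, can coexist with control by nonlinear learning models based on the knowledge of expert managers. Conventional sequence-based control may be limited in its ability to link these diverse elements together. The conventional approach, which determines the amount and timing of control from sufficient data acquired as a time series, may cope poorly with exceptional situations. Such exceptions can be divided into those that arise unavoidably from changes in natural conditions and those caused by system errors. In this study, we investigated a way to complement control systems built on mathematical, predictable logic by analyzing the various environmental factors in a facility in real time as they change and performing the corresponding control. In the past, high-performance computing (HPC) was an advanced, high-end technology that increased computing power by linking many computers over a high-speed network, and it required large investments in cost and scale. With the spread of mobile phones and mobile devices, small microprocessors have advanced, and application processors (APs) with clock speeds reaching 2 GHz have recently appeared. We studied how to apply APs, whose advantages are low power consumption and a compact platform despite relatively low performance, to real-time control of facility environments. We compared the performance of AP-based micro-clustering on three systems that differ in CPU clock, memory size, and number of cores: 1) 1.5 GHz, 8 processors, 32 cores, 1 GByte/processor, 32-bit Linux (ARMv7l); 2) 2.0 GHz, 4 processors, 32 cores, 2 GByte/processor, 32-bit Linux (ARMv7l); 3) 1.5 GHz, 8 processors, 32 cores, 2 GByte/processor, 64-bit Linux (AArch64). MPICH (www.mpich.org) and OpenMP (www.openmp.org) were used as the development libraries for parallel computing. The time taken to find the primes among the integers up to 2,500,000,000 was 1) 17 s, 2) 13 s, and 3) 3 s, and the time taken for a two-dimensional FFT on a $12800 \times 12800$ matrix was 1) 10 s, 2) 8 s, and 3) 2 s, respectively. Case 3 can be judged faster than a commercial desktop with a clock speed of 3 GHz. The results were approximately the same regardless of the library. When three-dimensional linear interpolation was performed at 1-second intervals on three-dimensional measurement data obtained in a previous study, the results were the same within a small margin with four or fewer cores, whereas with eight or more cores they showed a trend similar to the preceding results. We will continue to study AP-based micro-clustering technology, taking into account field deployability, construction cost, and power consumption.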
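
The prime-counting benchmark mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MPICH/OpenMP code; it is a plain Python segmented sieve, and the function names (`base_primes`, `count_primes_in_segment`, `count_primes`) and the `workers` parameter are hypothetical. The point it shows is that the range splits into independent segments, which is what makes the workload easy to distribute across cores or nodes. Here the segments are processed one after another in a single process.

```python
from math import isqrt


def base_primes(limit):
    """Return all primes <= limit using a plain sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def count_primes_in_segment(lo, hi, primes):
    """Count primes in [lo, hi), given every prime p with p*p < hi."""
    lo = max(lo, 2)
    if hi <= lo:
        return 0
    seg = bytearray([1]) * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break
        # First multiple of p inside the segment, but never p itself.
        start = max(p * p, (lo + p - 1) // p * p)
        seg[start - lo :: p] = bytearray(len(range(start, hi, p)))
    return sum(seg)


def count_primes(n, workers=4):
    """Count primes <= n by splitting [0, n] into independent segments.

    Assumption: each segment stands in for the share of work that one
    MPI rank or OpenMP thread would receive; here they run serially.
    """
    if n < 2:
        return 0
    primes = base_primes(isqrt(n))
    step = (n + workers) // workers  # ceiling of (n + 1) / workers
    bounds = [(lo, min(lo + step, n + 1)) for lo in range(0, n + 1, step)]
    return sum(count_primes_in_segment(lo, hi, primes) for lo, hi in bounds)
```

Because each segment needs only the shared list of small primes, the per-segment counts could be computed by separate workers and combined with a single sum (a reduction), which is the pattern both MPI and OpenMP support directly.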


Design and Implementation of Initial OpenSHMEM Based on PCI Express (PCI Express 기반 OpenSHMEM 초기 설계 및 구현)

  • Joo, Young-Woong;Choi, Min
    • KIPS Transactions on Computer and Communication Systems / v.6 no.3 / pp.105-112 / 2017
  • PCI Express is a bus technology that connects the processor and peripheral I/O devices, and it is widely used as an industry standard because of its high speed and low power consumption. In addition, PCI Express is used as a system interconnect technology, like Ethernet and InfiniBand, in high-performance computing and computer clusters. The PGAS (partitioned global address space) programming model is often used to implement one-sided RDMA (remote direct memory access) in multi-host systems such as computer clusters. In this paper, we design and implement an OpenSHMEM API based on PCI Express that maintains the existing features of OpenSHMEM, in order to implement RDMA over PCI Express. We evaluate the implemented OpenSHMEM API with a matrix multiplication example on a system of PCs connected through the NTB (non-transparent bridge) technology of PCI Express. The PCI Express interconnection network is currently very expensive and is not yet widely available to the general public. Nevertheless, we implemented and evaluated a PCI Express based interconnection network on the RDK evaluation board. In addition, we implemented the OpenSHMEM software stack, which has recently attracted great interest.

Performance Improvement for PVM by Zero-copy Mechanism (Zero-copy 기술을 이용한 PVM의 성능 개선)

  • 임성택;심재홍;최경희;정기현;김재훈;문성근
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.5B / pp.899-912 / 2000
  • PVM provides users with a single image of a high-performance parallel computing machine by collecting machines distributed over a network. Low communication overhead is essential for running applications effectively on PVM-based platforms. In the original PVM, three memory copies are required for a PVM task to send a message to a remote task, which degrades performance. We propose a zero-copy model using global shared memory that can be accessed by PVM tasks, the PVM daemon, and the network interface card (NIC). In this scheme, a task packs data into global shared memory and notifies the daemon that the data is ready to be sent; the daemon then routes the data to the destination remote task with virtually no data copy overhead. Experimental results reveal that the message round-trip time between two machines is reduced significantly under the proposed zero-copy scheme.


A Study on the Transmission Characteristics and Channel Capacity of Telephone Line Communication System (전화선 통신 시스템의 전송특성 및 채널용량에 관한 연구)

  • Roh, Jae-Sung;Chang, Tae-Hwa
    • Journal of Digital Contents Society / v.10 no.2 / pp.233-238 / 2009
  • Advances in digital communication and network technology, Internet technology, and the proliferation of smart appliances in the home have dramatically increased the need for a high-speed, high-quality home network. As the number of consumer electronic devices and computing devices in the home network grows, its data traffic grows as well. Various home network devices access Internet servers to obtain multimedia content. Therefore, we introduce a TLC (Telephone Line Carrier) system for networked digital consumer electronic appliances within a house that use Ethernet or wired/wireless technology. In the future home network environment, the primary purpose of the TLC-based smart home network is to provide low-cost, easily deployable, high-performance networking with wide coverage throughout the home. In this paper, the channel capacity of the telephone line communication system is evaluated and compared as a function of transmission power, number of OFDM carriers, channel loss, and noise loss for the smart home network.
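
As a rough illustration of how channel capacity depends on the quantities listed in the abstract, the sketch below sums the Shannon capacity over OFDM subcarriers. It is not the paper's model: equal power allocation across carriers, a per-carrier loss given in dB, and a flat noise power spectral density are all assumptions, and the function name `ofdm_capacity_bps` and its parameters are hypothetical.

```python
from math import log2


def ofdm_capacity_bps(tx_power_w, subcarrier_bw_hz, loss_db_per_carrier,
                      noise_psd_w_per_hz):
    """Sum of per-subcarrier Shannon capacities, in bits per second.

    Assumptions (not taken from the paper): transmit power is split
    equally across carriers, each carrier sees its own flat loss in dB,
    and the noise has a constant power spectral density.
    """
    n_carriers = len(loss_db_per_carrier)
    if n_carriers == 0:
        return 0.0
    power_per_carrier = tx_power_w / n_carriers
    noise_per_carrier = noise_psd_w_per_hz * subcarrier_bw_hz
    capacity = 0.0
    for loss_db in loss_db_per_carrier:
        gain = 10 ** (-loss_db / 10)  # convert dB loss to a linear power gain
        snr = power_per_carrier * gain / noise_per_carrier
        capacity += subcarrier_bw_hz * log2(1 + snr)
    return capacity
```

With illustrative numbers such as `ofdm_capacity_bps(4.0, 1000.0, [0.0] * 4, 0.001)`, every carrier has an SNR of 1, so each contributes one bit per second per hertz. Raising any carrier's loss or the noise density lowers the total, while raising the transmit power increases it.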


Image Classification of Damaged Bolts using Convolution Neural Networks (합성곱 신경망을 이용한 손상된 볼트의 이미지 분류)

  • Lee, Soo-Byoung;Lee, Seok-Soon
    • Journal of Aerospace System Engineering / v.16 no.4 / pp.109-115 / 2022
  • The CNN (Convolution Neural Network) algorithm, which combines deep learning with computer vision technology, makes image classification feasible on high-performance computing systems. In this thesis, the CNN algorithm is applied to a classification problem using TensorFlow, a typical deep learning framework, and machine learning techniques. The data set required for supervised learning is generated from bolts of the same type, some with undamaged threads and others with damaged threads. The model trained on a small quantity of data showed good classification performance in detecting damage in bolt images. Additionally, the model performance is reviewed by altering the number of convolution layers and by selectively applying algorithms that alleviate overfitting and underfitting.

Performance Analysis of Real-Time Big Data Search Platform Based on High-Capacity Persistent Memory (대용량 영구 메모리 기반 실시간 빅데이터 검색 플랫폼 성능 분석)

  • Eunseo Lee;Dongchul Park
    • Journal of Platform Technology / v.11 no.4 / pp.50-61 / 2023
  • The advancement of various big data technologies has had a tremendous impact on many industries. Diverse big data research studies have been conducted to process and analyze massive data quickly. Under these circumstances, emerging technologies such as high-capacity persistent memory (PMEM) and Compute Express Link (CXL) have recently attracted significant attention. However, big data "search" platforms have received little investigation. Moreover, most big data software platforms are still optimized for traditional DRAM-based computing systems. This paper first evaluates the basic performance of Intel Optane PMEM, and then investigates both the indexing and searching performance of Elasticsearch, a widely known enterprise big data search platform, on a PMEM-based computing system to explore its effectiveness and potential. Extensive experiments show that the proposed Optane PMEM-based Elasticsearch improves indexing and searching performance by an average of 1.45 times and 3.2 times, respectively, compared to a DRAM-based system. Consequently, this paper demonstrates that high-I/O, high-capacity, nonvolatile PMEM-based computing systems are very promising for big data search platforms.


An Efficient Implementation of MPI over VMMC for Myrinet (Myrinet 상에서 VMMC를 기반으로 하는 효율적인 MPI 구현)

  • Kim, Ho-Joong;Maeng, Seung-Ryoul
    • Journal of KIISE:Computing Practices and Letters / v.7 no.5 / pp.539-547 / 2001
  • Cluster systems employ high-speed interconnection networks and use efficient communication layers to gain high performance and scalability. However, the diversity of implementation mechanisms among these communication layers causes a lack of portability. One solution is to provide standard communication APIs such as MPI. This paper introduces MPI-VMMC, an MPI implementation on VMMC. Although the direct deposit transfer mechanism used in VMMC is not well suited to the Send/Recv mechanism used in MPI, the proposed sub-layer placed between MPI and VMMC efficiently translates from one mechanism to the other. We also use lazy pointer and selective zero-copy transfer techniques to gain high performance. The peak performance of MPI-VMMC is 90.7 Mbytes/sec, which is about 95% of that of the base communication layer.


Economic Impact of HEMOS-Cloud Services for M&S Support (M&S 지원을 위한 HEMOS-Cloud 서비스의 경제적 효과)

  • Jung, Dae Yong;Seo, Dong Woo;Hwang, Jae Soon;Park, Sung Uk;Kim, Myung Il
    • KIPS Transactions on Computer and Communication Systems / v.10 no.10 / pp.261-268 / 2021
  • Cloud computing is a computing paradigm in which users can utilize computing resources in a pay-as-you-go manner. In a cloud system, resources can be dynamically scaled up and down on demand to match the user's needs, so that the total cost of ownership can be reduced. Modeling and Simulation (M&S) technology is a well-known simulation-based method for obtaining engineering analyses and results through CAE software without physical experiments. In general, M&S technology is used in Finite Element Analysis (FEA), Computational Fluid Dynamics (CFD), Multibody Dynamics (MBD), and optimization. The M&S work procedure is divided into pre-processing, analysis, and post-processing steps. Pre-processing and post-processing are GPU-intensive jobs that consist of 3D modeling via CAE software, whereas analysis is CPU- or GPU-intensive. Because a general-purpose desktop needs a long time to analyze complicated 3D models, CAE software requires a workstation with a high-end CPU and GPU on which it can run smoothly. In other words, executing M&S requires high-performance computing resources. To mitigate the cost of acquiring such computing resources, we propose the HEMOS-Cloud service, an integrated cloud and cluster computing environment. The HEMOS-Cloud service provides CAE software and computing resources to users in business or academia who want to use M&S. In this paper, the economic ripple effect of the HEMOS-Cloud service is analyzed using industry-related analysis. Using expert-guided coefficients, the estimated results are a production inducement effect of KRW 7.4 billion, a value-added effect of KRW 4.1 billion, and an employment-inducing effect of 50 persons per KRW 1 billion.