• Title/Summary/Keyword: Parallel Computing(병렬컴퓨팅)

Search Result 229, Processing Time 0.027 seconds

A Methodology for Performance Modeling and Prediction of Large-Scale Cluster Servers (대규모 클러스터 서버의 성능 모델링 및 예측 방법론)

  • Jang, Hye-Churn;Jin, Hyun-Wook;Kim, Hag-Young
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.11
    • /
    • pp.1041-1045
    • /
    • 2010
  • Clusters can provide scalable and flexible architectures for parallel computing servers and data centers. Their performance prediction has been a very challenging issue. Existing performance measurement methodologies are able to measure the performance of servers already constructed. Thus they cannot provide a way to predict the overall system performance in advance when designing the system at the initial phase or adding more nodes for more capacity. Therefore, the performance modeling and prediction methodology for large-scale clusters is highly required. In this paper, we suggest a methodology to predict the performance of large-scale clusters, which consists of measurement, modeling and prediction steps. We apply the methodology to a real cluster server and show its usefulness.

Evaluation of Alignment Methods for Genomic Analysis in HPC Environment (HPC 환경의 대용량 유전체 분석을 위한 염기서열정렬 성능평가)

  • Lim, Myungeun;Jung, Ho-Youl;Kim, Minho;Choi, Jae-Hun;Park, Soojun;Choi, Wan;Lee, Kyu-Chul
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.2
    • /
    • pp.107-112
    • /
    • 2013
  • With the progress of NGS technologies, large genome data have been exploded recently. To analyze such data effectively, the assistance of HPC technique is necessary. In this paper, we organized a genome analysis pipeline to call SNP from NGS data. To organize the pipeline efficiently under HPC environment, we analyzed the CPU utilization pattern of each pipeline steps. We found that sequence alignment is computing centric and suitable for parallelization. We also analyzed the performance of parallel open source alignment tools and found that alignment method utilizing many-core processor can improve the performance of genome analysis pipeline.

A Benchmark of Micro Parallel Computing Technology for Real-time Control in Smart Farm (MPICH vs OpenMP) (제목을스마트 시설환경 실시간 제어를 위한 마이크로 병렬 컴퓨팅 기술 분석)

  • Min, Jae-Ki;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.161-161
    • /
    • 2017
  • 스마트 시설환경의 제어 요소는 난방기, 창 개폐, 수분/양액 밸브 개폐, 환풍기, 제습기 등 직접적으로 시설환경의 조절에 관여하는 인자와 정보 교환을 위한 통신, 사용자 인터페이스 등 간접적으로 제어에 관련된 요소들이 복합적으로 존재한다. PID 제어와 같이 하는 수학적 논리를 바탕으로 한 제어와 전문 관리자의 지식을 기반으로 한 비선형 학습 모델에 의한 제어 등이 공존할 수 있다. 이러한 다양한 요소들을 복합적으로 연동시키기 위해선 기존의 시퀀스 기반 제어 방식에는 한계가 있을 수 있다. 관행의 방식과 같이 시계열 상에서 획득한 충분한 데이터를 이용하여 제어의 양과 시점을 결정하는 방식은 예외 상황에 충분히 대처하기 어려운 단점이 있을 수 있다. 이러한 예외 상황은 자연적인 조건의 변화에 따라 불가피하게 발생하는 경우와 시스템의 오류에 기인하는 경우로 나뉠 수 있다. 본 연구에서는 실시간으로 변하는 시설환경 내의 다양한 환경요소를 실시간으로 분석하고 상응하는 제어를 수행하여 수학적이며 예측 가능한 논리에 의해 준비된 제어시스템을 보완할 방법을 연구하였다. 과거의 고성능 컴퓨팅(HPC; High Performance Computing)은 다수의 컴퓨터를 고속 네트워크로 연동하여 집적적으로 연산능력을 향상시킨 기술로 비용과 규모의 측면에서 많은 투자를 필요로 하는 첨단 고급 기술이었다. 핸드폰과 모바일 장비의 발달로 인해 소형 마이크로프로세서가 발달하여 근래 2 Ghz의 클럭 속도에 이르는 어플리케이션 프로세서(AP: Application Processor)가 등장하기도 하였다. 상대적으로 낮은 성능에도 불구하고 저전력 소모와 플랫폼의 소형화를 장점으로 한 AP를 시설환경의 실시간 제어에 응용하기 위한 방안을 연구하였다. CPU의 클럭, 메모리의 양, 코어의 수량을 다음과 같이 달리한 3가지 시스템을 비교하여 AP를 이용한 마이크로 클러스터링 기술의 성능을 비교하였다.1) 1.5 Ghz, 8 Processors, 32 Cores, 1GByte/Processor, 32Bit Linux(ARMv71). 2) 2.0 Ghz, 4 Processors, 32 Cores, 2GByte/Processor, 32Bit Linux(ARMv71). 3) 1.5 Ghz, 8 Processors, 32 Cores, 2GByte/Processor, 64Bit Linux(Arch64). 병렬 컴퓨팅을 위한 개발 라이브러리로 MPICH(www.mpich.org)와 Open-MP(www.openmp.org)를 이용하였다. 2,500,000,000에 이르는 정수 중 소수를 구하는 연산에 소요된 시간은 1)17초, 2)13초, 3)3초 이었으며, $12800{\times}12800$ 크기의 행렬에 대한 2차원 FFT 연산 소요시간은 각각 1)10초, 2)8초, 3)2초 이었다. 3번 경우는 클럭속도가 3Gh에 이르는 상용 데스크탑의 연산 속도보다 빠르다고 평가할 수 있다. 라이브러리의 따른 결과는 근사적으로 동일하였다. 선행 연구에서 획득한 3차원 계측 데이터를 1초 단위로 3차원 선형 보간법을 수행한 경우 코어의 수를 4개 이하로 한 경우 근소한 차이로 동일한 결과를 보였으나, 코어의 수를 8개 이상으로 한 경우 앞선 결과와 유사한 경향을 보였다. 현장 보급 가능성, 구축비용 및 전력 소모 등을 종합적으로 고려한 AP 활용 마이크로 클러스터링 기술을 지속적으로 연구할 것이다.

  • PDF

Design and Implementation of a Scalable Framework for Parallel Program Performance Visualization (병렬 프로그램 성능가시화를 위한 확장성 있는 프레임워크 설계 및 구현)

  • Moon, Sang-Su;Moon, Young-Shik;Kim, Jung-Sun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.7 no.2
    • /
    • pp.109-120
    • /
    • 2001
  • In this paper, we propose the design and implementation of a portable, extensible, and efficient performance visualization framework for high performance parallel program development. The framework adopts a layered architecture:consists of three independent layers instrumentation layer, trace interface layer and visualization layer. The instrumentation layer was constructed as an ECL which captures generated events, and the EDL/JPAL constitutes the trace interface layer to provide problem-oriented interfaces between visualization layer and instrumentation layer. Finally, the visualization layer was designed as plug-and-play style for easy elimination, addition and composition of various filters, views and view groups, The proposed performance visualization framework is expected to be used as an independent performance debugging and analysis tool and as a core component in an integrated parallel programming environment.

  • PDF

An Iterative Algorithm for the Bottom Up Computation of the Data Cube using MapReduce (맵리듀스를 이용한 데이터 큐브의 상향식 계산을 위한 반복적 알고리즘)

  • Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture
    • /
    • v.9 no.4
    • /
    • pp.455-464
    • /
    • 2012
  • Due to the recent data explosion, methods which can meet the requirement of large data analysis has been studying. This paper proposes MRIterativeBUC algorithm which enables efficient computation of large data cube by distributed parallel processing with MapReduce framework. MRIterativeBUC algorithm is developed for efficient iterative operation of the BUC method with MapReduce, and overcomes the limitations about the storage size and processing ability caused by large data cube computation. It employs the idea from the iceberg cube which computes only the interesting aspect of analysts and the distributed parallel process of cube computation by partitioning and sorting. Thus, it reduces data emission so that it can reduce network overload, processing amount on each node, and eventually the cube computation cost. The bottom-up cube computation and iterative algorithm using MapReduce, proposed in this paper, can be expanded in various way, and will make full use of many applications.

Livestock Disease Forecasting and Smart Livestock Farm Integrated Control System based on Cloud Computing (클라우드 컴퓨팅기반 가축 질병 예찰 및 스마트 축사 통합 관제 시스템)

  • Jung, Ji-sung;Lee, Meong-hun;Park, Jong-kweon
    • Smart Media Journal
    • /
    • v.8 no.3
    • /
    • pp.88-94
    • /
    • 2019
  • Livestock disease is a very important issue in the livestock industry because if livestock disease is not responded quickly enough, its damage can be devastating. To solve the issues involving the occurrence of livestock disease, it is necessary to diagnose in advance the status of livestock disease and develop systematic and scientific livestock feeding technologies. However, there is a lack of domestic studies on such technologies in Korea. This paper, therefore, proposes Livestock Disease Forecasting and Livestock Farm Integrated Control System using Cloud Computing to quickly manage livestock disease. The proposed system collects a variety of livestock data from wireless sensor networks and application. Moreover, it saves and manages the data with the use of the column-oriented database Hadoop HBase, a column-oriented database management system. This provides livestock disease forecasting and livestock farm integrated controlling service through MapReduce Model-based parallel data processing. Lastly, it also provides REST-based web service so that users can receive the service on various platforms, such as PCs or mobile devices.

A Scheme on High-Performance Caching and High-Capacity File Transmission for Cloud Storage Optimization (클라우드 스토리지 최적화를 위한 고속 캐싱 및 대용량 파일 전송 기법)

  • Kim, Tae-Hun;Kim, Jung-Han;Eom, Young-Ik
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.8C
    • /
    • pp.670-679
    • /
    • 2012
  • The recent dissemination of cloud computing makes the amount of data storage to be increased and the cost of storing the data grow rapidly. Accordingly, data and service requests from users also increases the load on the cloud storage. There have been many works that tries to provide low-cost and high-performance schemes on distributed file systems. However, most of them have some weaknesses on performing parallel and random data accesses as well as data accesses of frequent small workloads. Recently, improving the performance of distributed file system based on caching technology is getting much attention. In this paper, we propose a CHPC(Cloud storage High-Performance Caching) framework, providing parallel caching, distributed caching, and proxy caching in distributed file systems. This study compares the proposed framework with existing cloud systems in regard to the reduction of the server's disk I/O, prevention of the server-side bottleneck, deduplication of the page caches in each client, and improvement of overall IOPS. As a results, we show some optimization possibilities on the cloud storage systems based on some evaluations and comparisons with other conventional methods.

Parallelization and Performance Optimization of the Boyer-Moore Algorithm on GPU (Boyer-Moore 알고리즘을 위한 GPU상에서의 병렬 최적화)

  • Jeong, Yosang;Tran, Nhat-Phuong;Lee, Myungho;Nam, Dukyun;Kim, Jik-Soo;Hwang, Soonwook
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.2
    • /
    • pp.138-143
    • /
    • 2015
  • The Boyer-Moore algorithm is a single pattern string matching algorithm that is widely used in various applications such as computer and internet security, and bioinformatics. This algorithm is computationally demanding and requires high-performance parallel processing. In this paper, we propose a parallelization and performance optimization methodology for the BM algorithm on a GPU. Our methodology adopts an algorithmic cascading technique. This results in significant reductions in the mapping overheads for the threads participating in the parallel string matching. It also results in the efficient utilization of the multithreading capability of the GPU which improves the load balancing among threads. Our experimental results show that this approach achieves a 45-times speedup at maximum, in comparison with a serial execution.

Odysseus/Parallel-OOSQL: A Parallel Search Engine using the Odysseus DBMS Tightly-Coupled with IR Capability (오디세우스/Parallel-OOSQL: 오디세우스 정보검색용 밀결합 DBMS를 사용한 병렬 정보 검색 엔진)

  • Ryu, Jae-Joon;Whang, Kyu-Young;Lee, Jae-Gil;Kwon, Hyuk-Yoon;Kim, Yi-Reun;Heo, Jun-Suk;Lee, Ki-Hoon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.4
    • /
    • pp.412-429
    • /
    • 2008
  • As the amount of electronic documents increases rapidly with the growth of the Internet, a parallel search engine capable of handling a large number of documents are becoming ever important. To implement a parallel search engine, we need to partition the inverted index and search through the partitioned index in parallel. There are two methods of partitioning the inverted index: 1) document-identifier based partitioning and 2) keyword-identifier based partitioning. However, each method alone has the following drawbacks. The former is convenient in inserting documents and has high throughput, but has poor performance for top h query processing. The latter has good performance for top-k query processing, but is inconvenient in inserting documents and has low throughput. In this paper, we propose a hybrid partitioning method to compensate for the drawback of each method. We design and implement a parallel search engine that supports the hybrid partitioning method using the Odysseus DBMS tightly coupled with information retrieval capability. We first introduce the architecture of the parallel search engine-Odysseus/parallel-OOSQL. We then show the effectiveness of the proposed system through systematic experiments. The experimental results show that the query processing time of the document-identifier based partitioning method is approximately inversely proportional to the number of blocks in the partition of the inverted index. The results also show that the keyword-identifier based partitioning method has good performance in top-k query processing. The proposed parallel search engine can be optimized for performance by customizing the methods of partitioning the inverted index according to the application environment. The Odysseus/parallel OOSQL parallel search engine is capable of indexing, storing, and querying 100 million web documents per node or tens of billions of web documents for the entire system.

A Methodolgy to Evaluate Program Tuning Alternatives (프로그램 성능조율 대안을 평가하는 방법론)

  • Eom, Hyeon-Sang
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.12 no.6
    • /
    • pp.349-357
    • /
    • 2006
  • We introduce a new performance evaluation methodology that helps programmers evaluate different tuning alternatives in order to improve program performance. This methodology permits measuring performance implications of using tuning alternatives. Specifically, the methodology predicts performance after workload migration for a distributed or parallel program in contrast to traditional performance methodlogies that quantify time spent in program components for bottleneck identification. The methodology thus provides guidance on workload migration. The methodology also permits predicting the performance impact of changing the underlying network. The methodology may evaluate performance incrementally and online during the execution of the program to be tuned. We show that our methodology, when it is implemented and used, permits accurately predicting the performance of different tuning alternatives for a test suite of six programs.