• Title/Summary/Keyword: parallel file system


Table Comparison Prefetching Algorithm Adopted in a Parallel File System (병렬 파일 시스템에 적용한 테이블 비교 선반입 기법)

  • Lee, Yoon-Young; Kim, Chei-Yol; Seo, Dae-Wha
    • Proceedings of the Korea Information Processing Society Conference / 2001.04a / pp.243-246 / 2001
  • Caching and prefetching are key factors that determine the performance of a parallel file system under heavy file I/O. This paper proposes a table-comparison method as the algorithm for deciding which data to prefetch, a decision that strongly affects system performance when the requested files are large relative to the cache size, and additionally proposes a technique that considers the currently available I/O bandwidth when deciding whether and when to prefetch the predicted data. The proposed prefetching algorithm was implemented in an actual parallel file system and contributed significantly to improving file system performance.
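
The abstract above describes two decisions: what to prefetch (chosen by comparing recent requests against stored access-pattern tables) and whether/when to prefetch (gated on the currently available I/O bandwidth). The Python sketch below illustrates those two decisions in a simplified form; the class, thresholds, and table layout are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: choose what to prefetch by comparing the recent request
# history against stored access-pattern tables, and only issue the prefetch when
# enough I/O bandwidth is currently available. Names and thresholds are illustrative.

from collections import deque

class TableComparisonPrefetcher:
    def __init__(self, pattern_tables, bandwidth_probe, min_free_bandwidth=0.3):
        self.pattern_tables = pattern_tables    # {name: [block offsets]} of known patterns
        self.bandwidth_probe = bandwidth_probe  # callable returning free bandwidth fraction 0..1
        self.min_free_bandwidth = min_free_bandwidth
        self.history = deque(maxlen=8)          # recent block offsets actually requested

    def record_access(self, offset):
        self.history.append(offset)

    def predict_next(self):
        """Compare the history window against each table; return the block that
        follows the best-matching prefix, or None if nothing matches."""
        best_match, best_len = None, 0
        hist = list(self.history)
        for pattern in self.pattern_tables.values():
            for i in range(len(pattern) - len(hist)):
                if pattern[i:i + len(hist)] == hist and len(hist) > best_len:
                    best_match, best_len = pattern[i + len(hist)], len(hist)
        return best_match

    def maybe_prefetch(self, read_block):
        """Issue the prefetch only if a block was predicted and the file system
        currently has spare I/O bandwidth."""
        nxt = self.predict_next()
        if nxt is not None and self.bandwidth_probe() >= self.min_free_bandwidth:
            read_block(nxt)   # an asynchronous read into the cache in a real system
```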


A Novel Method of Improving Cache Hit-rate in Hadoop MapReduce using SSD Cache

  • Kim, Jong-Chan; An, Jae-Hoon; Kim, Young-Hwan; Jeon, Ki-Man
    • Journal of the Korea Society of Computer and Information / v.20 no.8 / pp.1-6 / 2015
  • A MapReduce program on the Hadoop Distributed File System runs on unspecified nodes because of distributed-parallel processing and block replication for data stability. Since cache locality is difficult to guarantee when a Solid State Drive is used as a cache in Hadoop, the cache hit rate decreases. In this paper, we suggest a method to improve the cache hit rate by pre-loading the input data of a MapReduce job onto the SSD cache. To perform this method, we estimate the blocks that will be used on each node by using the capacity scheduler and block metadata. As a result, we can increase the performance of the SSD cache by loading the blocks onto it before the Map tasks run.
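
As a rough illustration of the pre-loading step described above, the sketch below builds a per-node staging plan from queued jobs and block-replica locations, then copies the planned blocks from the HDD data directory into the SSD cache. The data structures and function names are assumptions for illustration; the paper works against the Hadoop capacity scheduler and HDFS block metadata.

```python
# Hypothetical sketch of pre-loading MapReduce input blocks onto an SSD cache
# before the Map tasks start. Field and directory names are illustrative only.

import os
import shutil

def plan_preload(pending_jobs, block_locations):
    """Return {node: [block ids]} for blocks the queued MapReduce jobs will read."""
    plan = {}
    for job in pending_jobs:                     # e.g. taken from the scheduler queue
        for block in job["input_blocks"]:        # block ids of the job's input splits
            for node in block_locations[block]:  # nodes holding a replica (block metadata)
                plan.setdefault(node, []).append(block)
    return plan

def preload_to_ssd(blocks, hdd_dir, ssd_cache_dir):
    """Copy the planned blocks from the HDD data directory into the SSD cache."""
    os.makedirs(ssd_cache_dir, exist_ok=True)
    for block in blocks:
        src = os.path.join(hdd_dir, block)
        dst = os.path.join(ssd_cache_dir, block)
        if not os.path.exists(dst):              # skip blocks that are already cached
            shutil.copyfile(src, dst)
```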

A Genetic-Based Optimization Model for Clustered Node Allocation System in a Distributed Environment (분산 환경에서 클러스터 노드 할당 시스템을 위한 유전자 기반 최적화 모델)

  • Park, Kyeong-mo
    • The KIPS Transactions: Part A / v.10A no.1 / pp.15-24 / 2003
  • In this paper, an optimization model for clustered node allocation systems in a distributed computing environment is presented. In the presented model, built on a distributed file system framework, the dynamics of system behavior over time across the nodes are carefully considered, and the cluster monitor node is given the role of checking the feasibility of the current set of clustered node allocations. The cluster monitor node of the node allocation system, which distributes the parallel modules to clustered nodes, provides a good allocation solution using Genetic Algorithms (GA). As part of the experimental studies, the effects of varying GA parameters, such as the encoding scheme, the genetic operators (crossover and mutation), the population size, and the number of node modules, on solution quality and computation time are examined, and comparative findings are presented.
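
A genetic algorithm for this kind of allocation can be sketched compactly: a chromosome assigns each parallel module to a node, and fitness favors allocations with balanced per-node load. The toy example below is a generic GA over a made-up workload with illustrative parameters; it is not the paper's encoding or fitness model.

```python
# Toy GA sketch: chromosome[i] = node assigned to module i; fitness rewards a
# balanced load. Workloads, operators, and parameters are illustrative assumptions.

import random

MODULES = [4, 2, 7, 3, 5, 1, 6, 2]   # hypothetical module workloads
NUM_NODES = 4
POP_SIZE, GENERATIONS, MUTATION_RATE = 30, 200, 0.05

def fitness(chrom):
    load = [0] * NUM_NODES
    for work, node in zip(MODULES, chrom):
        load[node] += work
    return -max(load)                 # a lower maximum node load is better

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chrom):
    return [random.randrange(NUM_NODES) if random.random() < MUTATION_RATE else gene
            for gene in chrom]

population = [[random.randrange(NUM_NODES) for _ in MODULES] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)      # best allocations first
    parents = population[:POP_SIZE // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print("best allocation:", max(population, key=fitness))
```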

An Implementation and Verification of Performance Monitor for Parallel Signal Processing System (병렬신호처리시스템을 위한 성능 모니터의 구현 및 검증)

  • Lee, Won-Joo; Kim, Hyo-Nam
    • Journal of the Korea Society of Computer and Information / v.10 no.5 s.37 / pp.313-322 / 2005
  • In this paper, we implement and verify a performance monitor for a parallel signal processing system, using a DSP Starter Kit (DSK) whose base processor is the TMS320C6711 chip. The key idea of this performance monitor is to use Real Time Data Exchange (RTDX) for real-time data transfer, together with DSP/BIOS functions, to measure performance metrics such as DSP workload, memory usage, and bridge traffic. In the simulation, FFT, 2D FFT, matrix multiplication, and FIR filtering, which are widely used DSP algorithms, were employed. Using the performance monitor and Code Composer Studio from Texas Instruments (TI), results were recorded for different frequencies, data sizes, and buffer sizes for a single wave file. The accuracy of our performance monitor was verified by comparing these recorded results.
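
The paper's monitor runs on the DSP target and streams measurements to the host over RTDX; the sketch below is only a host-side analogue showing the general pattern of periodically sampling workload and memory counters and logging them for later comparison. The read_dsp_counters function is a hypothetical stand-in for the RTDX data channel.

```python
# Host-side analogue of a performance monitor: sample counters at a fixed interval
# and append them to a CSV log. All counter names and values are placeholders.

import csv
import time

def read_dsp_counters():
    """Hypothetical placeholder for values streamed from the target over RTDX."""
    return {"workload_pct": 0.0, "memory_bytes": 0, "bridge_traffic": 0}

def monitor(duration_s=10.0, interval_s=0.5, out_path="perf_log.csv"):
    fields = ["t", "workload_pct", "memory_bytes", "bridge_traffic"]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        start = time.time()
        while time.time() - start < duration_s:
            sample = read_dsp_counters()
            writer.writerow({"t": round(time.time() - start, 3), **sample})
            time.sleep(interval_s)

monitor()
```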


Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호; 양재완; 정성원; 류광렬; 권혁철; 정상화
    • Journal of KIISE: Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data appropriately to the local hard disks of the PCs, in such a way that the disk I/O and the subsequent computation are spread as evenly as possible across all the PCs. If the terms in the inverted index file can be classified into closely related clusters, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One of the goals of this research is the development of methods for automatically clustering the terms based on the likelihood of their co-occurrence in the same query. We also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus show the efficiency and effectiveness of our method.
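
Assuming the term clusters are already computed, the interleaved placement with one duplicate copy per record can be sketched as below; the round-robin scheme and the choice of the next PC as the backup location are illustrative simplifications of the distribution strategy described above.

```python
# Simplified sketch: spread each cluster's terms across the PCs in an interleaved
# (round-robin) way and duplicate every term's index record on the next PC for
# fault tolerance and load balancing. The clustering step itself is assumed given.

def distribute_terms(term_clusters, num_pcs):
    """Return {pc_id: set(terms)} with interleaved placement plus one duplicate."""
    placement = {pc: set() for pc in range(num_pcs)}
    for cluster in term_clusters:
        for i, term in enumerate(cluster):
            primary = i % num_pcs                 # interleave within the cluster
            backup = (primary + 1) % num_pcs      # duplicate on the next PC
            placement[primary].add(term)
            placement[backup].add(term)
    return placement

# Example: two clusters of query-co-occurring terms distributed over three PCs.
clusters = [["parallel", "file", "system"], ["cache", "prefetch"]]
print(distribute_terms(clusters, 3))
```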

An Efficient Disk Sharing Technique supporting Single Disk I/O Space in Linux Cluster Systems (리눅스 클러스터 시스템에서 단일 디스크 입출력 공간을 지원하는 효율적 디스크 공유 기법)

  • 김태호; 이종우; 이재원; 김성동; 채진석
    • Journal of KIISE: Computing Practices and Letters / v.9 no.6 / pp.635-645 / 2003
  • One of the important features that clustered parallel computer systems must support is a single I/O system image, in which users can access both local and remote I/O resources transparently. In this paper, we propose an efficient disk sharing technique supporting a single disk I/O system image architecture. The design separates the I/O subsystem of a cluster into the file system and a set of virtual hard disk drivers. A virtual hard disk driver treats a hard disk in a remote node as a local hard disk. All of its services are performed at the device driver level, without any modification to file systems, so users can access all the disks in the cluster regardless of their location. Our virtual hard disk driver is implemented under Linux and tested on a Linux cluster system. Experiments show that it successfully supports a single disk I/O space and, at the same time, performs better than NFS. We believe this paper can serve as a guideline for easily constructing a single I/O space for other devices.
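
The paper's technique lives at the Linux block device driver level; the user-space Python sketch below only illustrates the underlying idea that a remote disk can be exposed through the same block interface as a local one, so the layers above need no changes. The network protocol and block size here are invented purely for illustration.

```python
# User-space analogue of a "virtual hard disk": the same read/write-block interface
# is offered whether the disk is local or reached over the network.

import socket
import struct

BLOCK_SIZE = 4096

class LocalDisk:
    def __init__(self, path):
        self.f = open(path, "r+b")
    def read_block(self, n):
        self.f.seek(n * BLOCK_SIZE)
        return self.f.read(BLOCK_SIZE)
    def write_block(self, n, data):
        self.f.seek(n * BLOCK_SIZE)
        self.f.write(data[:BLOCK_SIZE])

class RemoteDisk:
    """Same interface as LocalDisk, but forwards requests to another node."""
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
    def read_block(self, n):
        self.sock.sendall(struct.pack("!cI", b"R", n))
        return self.sock.recv(BLOCK_SIZE)   # a real driver would loop until a full block arrives
    def write_block(self, n, data):
        self.sock.sendall(struct.pack("!cI", b"W", n) + data[:BLOCK_SIZE])

def open_disk(spec):
    """Callers see one disk I/O space regardless of where the disk actually lives."""
    if spec.startswith("remote://"):
        host, port = spec[len("remote://"):].split(":")
        return RemoteDisk(host, int(port))
    return LocalDisk(spec)
```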

Parallelization of Genome Sequence Data Pre-Processing on Big Data and HPC Framework (빅데이터 및 고성능컴퓨팅 프레임워크를 활용한 유전체 데이터 전처리 과정의 병렬화)

  • Byun, Eun-Kyu; Kwak, Jae-Hyuck; Mun, Jihyeob
    • KIPS Transactions on Computer and Communication Systems / v.8 no.10 / pp.231-238 / 2019
  • Analyzing next-generation genome sequencing data in the conventional way on a single server may take several tens of hours depending on the data size. To cope with urgent situations where the results must be known within a few hours, the performance of single-genome analysis needs to be improved. In this paper, we propose a parallelized method for pre-processing genome sequence data that reduces analysis time by utilizing big data technology and a high-performance computing cluster connected by a high-speed network and sharing a parallel file system. For the reliability of the analytical results, we chose the strategy of parallelizing the existing analysis tools and algorithms in the new environment. Parallelized processing, data distribution, and parallel merging techniques were developed, and performance improvements were confirmed through experiments.
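
The split / process-in-parallel / merge pattern described above can be sketched with standard Python multiprocessing standing in for the paper's big data and HPC frameworks; the file names, chunk size, and the placeholder per-chunk transformation are all assumptions for illustration.

```python
# Sketch of the pattern: split the input into chunks on the shared file system,
# preprocess the chunks in parallel, then merge the per-chunk outputs in order.

from multiprocessing import Pool

def write_chunk(lines, idx):
    out = f"chunk_{idx:04d}.fastq"
    with open(out, "w") as dst:
        dst.writelines(lines)
    return out

def split_input(path, lines_per_chunk=4_000_000):
    """Split the raw sequence file into chunk files on the shared parallel file system."""
    chunk_paths, buf, idx = [], [], 0
    with open(path) as src:
        for line in src:
            buf.append(line)
            if len(buf) >= lines_per_chunk:
                chunk_paths.append(write_chunk(buf, idx))
                buf, idx = [], idx + 1
    if buf:
        chunk_paths.append(write_chunk(buf, idx))
    return chunk_paths

def preprocess_chunk(chunk_path):
    """Stand-in for the existing per-chunk preprocessing tool."""
    out = chunk_path + ".out"
    with open(chunk_path) as src, open(out, "w") as dst:
        dst.writelines(line.upper() for line in src)   # placeholder transformation
    return out

def merge(outputs, merged_path="merged.out"):
    with open(merged_path, "w") as dst:
        for part in sorted(outputs):                   # keep the chunk order deterministic
            with open(part) as src:
                dst.writelines(src)

if __name__ == "__main__":
    chunks = split_input("sample.fastq")
    with Pool() as pool:                               # one worker per core in this sketch
        results = pool.map(preprocess_chunk, chunks)
    merge(results)
```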

Performance Analysis of Global Shared Filesystem for the PLSI (국가 슈퍼컴퓨팅 공동활용체제 구축을 위한 글로벌공유파일시스템 성능 분석)

  • Woo, Joon; Park, SeokJung; Lee, SangDong; Kim, HyongShik
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.509-512 / 2007
  • The purpose of the PLSI (Partnership & Leadership for the national Supercomputing Infrastructure) is to maximize the utilization of public supercomputing resources by linking them with each other. When simulations and visualizations of an application are performed using resources at multiple sites, an infrastructure that affords global access to the data must be constructed. In this research, we implemented a global shared filesystem based on GPFS, a parallel file system, to share remote filesystem data between the supercomputing centers of KISTI and Pusan National University, and analyzed the performance of the network and the filesystem over a 1 Gbps WAN.
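
A very simple way to time filesystem throughput over such a shared mount is sketched below; the mount point, transfer size, and write-only measurement are illustrative assumptions, whereas the paper measures GPFS across a 1 Gbps WAN between the two sites.

```python
# Generic write-throughput timing against a shared mount point; paths and sizes
# are illustrative only.

import os
import time

def measure_write_throughput(mount_point, size_mb=512, block_mb=4):
    path = os.path.join(mount_point, "throughput_test.bin")
    block = os.urandom(block_mb * 1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb // block_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                    # make sure the data really reaches the filesystem
    elapsed = time.time() - start
    os.remove(path)
    return size_mb / elapsed                    # MB/s

print(f"{measure_write_throughput('/gpfs/shared'):.1f} MB/s")
```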


A Parallel Implementation of Purge Process for Lustre File System (Lustre 파일 시스템을 위한 Purge 기능의 병렬화 구현)

  • Kwon, Min-Woo; Yoon, Jun-Weon; Hong, Tae-Young; Park, Chan-Yeol
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.64-65 / 2016
  • Supercomputers use high-performance parallel file systems such as the Lustre file system to manage large volumes of data efficiently. On supercomputers accessed by many users, such as the second phase of Tachyon, KISTI's fourth supercomputer, user data accumulates without bound and degrades the performance of the Lustre file system. In this paper, we implement a purge function that automatically deletes data that has not been used for a long period, preventing user data from piling up. In particular, to cope with the exponentially growing capacity of the parallel file system, we implement a high-speed purge function using parallel computing techniques. Comparing a single compute node with the parallel implementation, deleting 1,517 GB took 221.2 seconds on a single compute node and 49.9 seconds in a parallel environment with 16 compute nodes, a performance improvement of about 4.4 times over the single-node implementation.
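
A hypothetical sketch of the purge idea follows: walk user directories, find files whose last access time is older than a retention cutoff, and delete them, with the directory list divided among parallel workers. The retention period, directory layout, and use of local multiprocessing are illustrative; the paper's tool targets Lustre across multiple compute nodes.

```python
# Sketch of a parallel purge: each worker scans a subset of user directories and
# removes files not accessed within the retention period. Paths are illustrative.

from multiprocessing import Pool
import os
import time

RETENTION_DAYS = 90

def purge_directory(root):
    cutoff = time.time() - RETENTION_DAYS * 86400
    removed = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:   # not accessed within the retention period
                    os.remove(path)
                    removed += 1
            except FileNotFoundError:
                pass                                   # file vanished between scan and delete
    return removed

if __name__ == "__main__":
    user_dirs = [os.path.join("/scratch", d) for d in os.listdir("/scratch")]
    with Pool(processes=16) as pool:                   # the paper compares 1 node vs. 16 nodes
        print("files purged:", sum(pool.map(purge_directory, user_dirs)))
```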

Research on Big Data Integration Method

  • Kim, Jee-Hyun; Cho, Young-Im
    • Journal of the Korea Society of Computer and Information / v.22 no.1 / pp.49-56 / 2017
  • In this paper we propose an approach to big data integration for analyzing, visualizing, and predicting market trends: an integrated data model using the R language, which is widely used for statistics, and Hadoop, a parallel data-processing framework. Four approaches combining R and Hadoop are examined: the ff package in R, R with the Hadoop Streaming utility, and Rhipe and RHadoop as R-to-Hadoop interface packages. The strengths and weaknesses of the four methods are described and analyzed, and Rhipe and RHadoop are proposed as a complete data integration model. Integrating R, which is popular for implementing statistical algorithms, with Hadoop, which contains a distributed file system and a resource management platform and implements the MapReduce programming model, provides an environment in which R code can be written and deployed in Hadoop without any data movement. This model allows predictive analysis with high performance and a deep understanding of big data.
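
Of the four approaches mentioned, Hadoop Streaming is the one that lets any executable act as mapper or reducer by reading stdin and writing tab-separated key/value lines to stdout; Rhipe and RHadoop hide similar plumbing behind R functions. The Python word-count sketch below illustrates only that streaming pattern and is not code from the paper.

```python
#!/usr/bin/env python3
# Hadoop Streaming style mapper/reducer: Hadoop pipes input through stdin and
# collects tab-separated key/value lines from stdout, sorting keys before the
# reduce phase so groupby sees consecutive identical keys.

import sys
from itertools import groupby

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{key}\t{sum(int(value) for _, value in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

A job like this would typically be submitted with the Hadoop Streaming jar, passing the script (with its "map" or "reduce" argument) as the -mapper and -reducer commands along with -input and -output paths.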