• Title/Summary/Keyword: distributed file system

251 search results

A study on high availability of the linux clustering web server (리눅스 클러스터링 웹 서버의 고가용성에 대한 연구)

  • 박지현;이상문;홍태화;김학배
    • 제어로봇시스템학회:학술대회논문집 / 2000.10a / pp.88-88 / 2000
  • As more and more critical commercial applications move onto the Internet, providing highly available servers becomes increasingly important. One of the advantages of a clustered system is its hardware and software redundancy: high availability can be provided by detecting node or daemon failures and reconfiguring the system appropriately, so that the workload is taken over by the remaining nodes in the cluster. This paper presents how to guarantee the high availability of a clustered web server. The load balancer is a single point of failure for the whole system, so to prevent its failure we set up a backup server using heartbeat, fake, mon, and a checkpointing fault-tolerance method. For high availability of the file servers in the cluster, we set up the Coda file system, an advanced fault-tolerant distributed network file system. (A minimal heartbeat-failover sketch follows this entry.)

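The failover mechanism described in the abstract (a backup watching the primary load balancer via heartbeats and taking over on silence) can be illustrated with a minimal sketch. This is not the paper's actual heartbeat/fake/mon configuration; the address, port, timeout, and take_over() body are hypothetical placeholders.

```python
import socket

HEARTBEAT_ADDR = ("0.0.0.0", 9694)  # hypothetical port the backup listens on
TIMEOUT_SECS = 3.0                  # silence threshold before declaring the primary dead

def run_backup() -> None:
    """Backup load balancer: wait for the primary's heartbeats, take over on silence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(HEARTBEAT_ADDR)
    sock.settimeout(TIMEOUT_SECS)
    while True:
        try:
            sock.recvfrom(64)       # any datagram from the primary counts as "alive"
        except socket.timeout:
            take_over()             # primary is presumed dead
            return

def take_over() -> None:
    # In the paper's setup this is roughly where 'fake' would claim the
    # virtual IP via ARP takeover; a print stands in for that here.
    print("primary silent: assuming load-balancer role")
```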

Dynamic Cluster Management of Hadoop Distributed Filesystem (하둡 분산 파일시스템의 동적 클러스터 관리 기법)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.435-437 / 2016
  • The Hadoop Distributed File System (HDFS) is a file system for distributed processing of big data that replicates data across distributed data nodes. An HDFS cluster scales well up to thousands of nodes, but it assumes an exclusive cluster of numerous nodes dedicated to big data processing; worker systems used in offices for various operational purposes are rarely considered part of the cluster. This paper discusses this problem and proposes a dynamic cluster management technique to increase the storage capacity and analytic performance of a Hadoop cluster. The proposed technique can add legacy systems to the cluster and remove them dynamically depending on their availability. (A sketch of the standard add/decommission mechanism follows this entry.)

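As a concrete illustration of dynamically shrinking or growing an HDFS cluster, the sketch below wraps HDFS's standard decommissioning mechanism (the exclude file plus `hdfs dfsadmin -refreshNodes`). The exclude-file path is deployment-specific, and this is a generic illustration rather than the paper's own management technique.

```python
import subprocess

EXCLUDE_FILE = "/etc/hadoop/conf/dfs.exclude"  # deployment-specific path

def decommission(hostname: str) -> None:
    """Retire a worker node safely: HDFS re-replicates its blocks elsewhere first."""
    with open(EXCLUDE_FILE, "a") as f:
        f.write(hostname + "\n")               # mark the node for decommissioning
    subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)

def recommission(hostname: str) -> None:
    """Bring a previously excluded worker back into the cluster."""
    with open(EXCLUDE_FILE) as f:
        hosts = [h for h in f.read().splitlines() if h and h != hostname]
    with open(EXCLUDE_FILE, "w") as f:
        f.write("\n".join(hosts) + "\n")
    subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)
```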

A Study on Security Improvement in Hadoop Distributed File System Based on Kerberos (Kerberos 기반 하둡 분산 파일 시스템의 안전성 향상방안)

  • Park, So Hyeon;Jeong, Ik Rae
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.5 / pp.803-813 / 2013
  • With the development of smart devices and social network services, the amount of data has been exploding; the world is facing the Big Data era. For this reason, big data processing technology, which can handle such data, has attracted much attention, and one of the most representative technologies is Hadoop. The Hadoop Distributed File System (HDFS), designed to run on commodity Linux servers, is an open-source framework that can store many terabytes of data. The initial version of Hadoop did not consider security because it focused only on efficient big data processing. As the number of users rapidly increased, a lot of sensitive data, including personal information, came to be stored on HDFS, so in 2009 Hadoop announced a new version that introduced Kerberos and a token system. However, this system is vulnerable to replay attacks, impersonation attacks, and others. In this paper, we analyze these vulnerabilities of HDFS security and propose a new protocol that remedies them while maintaining the performance of Hadoop. (A generic replay-protection sketch follows this entry.)
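
The replay vulnerability mentioned above is conventionally closed by binding a nonce and a timestamp into each authenticated request. The sketch below is a generic illustration of that idea with a shared-key HMAC; it is not the authors' proposed protocol or Hadoop's actual token format.

```python
import hashlib
import hmac
import os
import time

SECRET = b"shared-secret"  # stands in for a Kerberos-derived session key

def make_request(payload: bytes):
    """Bind a fresh nonce and a timestamp into the MAC so a captured
    request cannot simply be resent later (the replay attack above)."""
    nonce = os.urandom(16)
    ts = str(int(time.time())).encode()
    mac = hmac.new(SECRET, nonce + ts + payload, hashlib.sha256).digest()
    return nonce, ts, payload, mac

seen_nonces = set()  # a real server would expire entries after the window

def verify(nonce, ts, payload, mac, window=30) -> bool:
    if abs(time.time() - int(ts)) > window:  # stale request: reject
        return False
    if nonce in seen_nonces:                 # replayed request: reject
        return False
    seen_nonces.add(nonce)
    expected = hmac.new(SECRET, nonce + ts + payload, hashlib.sha256).digest()
    return hmac.compare_digest(mac, expected)
```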

P2P DICOM System using Multiagent Systems Communicating with XML Encoded ACL (XML 기반 ACL로 통신하는 멀티에이전트 시스템을 이용한 P2P DICOM 시스템)

  • Kwon, Gi-Beom;Kim, Il-Kon
    • Journal of KIISE:Computing Practices and Letters / v.8 no.5 / pp.598-606 / 2002
  • We suggest a distributed communication and management methodology using PC-to-PC query multicasting for the efficient management of medical images produced by DICOM (Digital Imaging and Communications in Medicine) modalities. It is essential to reduce the severe degradation of PACS systems caused by the large size of medical images and their very high transfer rates. The DICOM PC-to-PC component is composed of a Service Manager that executes requested queries, a Communication Manager that takes charge of file transmission, and a DICOM Manager that manages stored data and system behavior. Each manager is itself a component that searches for a requested file through interaction or transmits the file to other PCs. Distributed management and transfer of medical information based on PC-to-PC query multicasting enhances central server performance and network capacity by reducing the load on both. We organize three major components for system operation, each implemented as an agent. Communication between agents uses an XML-encoded Agent Communication Language. (An XML-encoded ACL sketch follows this entry.)
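
As an illustration of agent messages encoded in XML, the sketch below builds a FIPA-ACL-style query that could be multicast to peer PCs. The element and attribute names are hypothetical; the abstract does not specify the system's exact message schema.

```python
import xml.etree.ElementTree as ET

def make_query_message(sender: str, study_uid: str) -> bytes:
    """Build an ACL 'query-ref' performative as XML (names are illustrative)."""
    msg = ET.Element("acl-message", {"performative": "query-ref"})
    ET.SubElement(msg, "sender").text = sender
    ET.SubElement(msg, "receiver").text = "all"  # multicast to all peer PCs
    content = ET.SubElement(msg, "content")
    ET.SubElement(content, "dicom-study-uid").text = study_uid
    return ET.tostring(msg, encoding="utf-8")

# Example: ask every peer which of them holds a given DICOM study.
print(make_query_message("pc-17", "1.2.840.113619.2.55.3").decode())
```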

File Block Management for Energy-Efficient Distributed Storages (파일 분산 저장 시스템의 에너지 효율성 증대를 위한 파일 블록 관리 기술)

  • Suh, Min-Kook;Kim, Seong-Woo;Seo, Seung-Woo
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.1 / pp.97-104 / 2016
  • Because of the rapid growth of data, the number of data storages has increased. When multiple data storages are used, a distributed file system is essential to ensure the availability of files, and power consumption becomes a major problem when such a system spans many data storages. Previous works have aimed at reducing energy consumption through efficient file block layouts that allow some data servers to be switched into stand-by mode. File block migration has received little attention because it incurs a large cost, but it becomes necessary when a new data server or file is added. This paper formulates the minimization of data block migration as an ILP optimization problem and solves it using the branch-and-bound method. With this technique, the number of stand-by data servers can be maximized with the minimum number of file block movements. However, the computation time of the branch-and-bound method grows exponentially with problem size, so this paper also proposes a data block and data server grouping method that decomposes the problem into many small ILP problems. (An illustrative ILP sketch follows this entry.)
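
A minimal version of the kind of ILP described above can be written with an off-the-shelf solver (PuLP's bundled CBC solver uses branch-and-bound internally). The weights, capacity, and toy layout below are assumptions for illustration; the paper's actual formulation (e.g., replication constraints) is richer.

```python
# pip install pulp -- illustrative model, not the paper's exact formulation
import pulp

blocks = ["b0", "b1", "b2", "b3"]
servers = ["s0", "s1", "s2"]
current = {"b0": "s0", "b1": "s1", "b2": "s2", "b3": "s2"}  # present layout
capacity = 3                                                # blocks per server

x = pulp.LpVariable.dicts("x", (blocks, servers), cat=pulp.LpBinary)  # block b on server s
y = pulp.LpVariable.dicts("y", servers, cat=pulp.LpBinary)            # server s stays active

prob = pulp.LpProblem("block_migration", pulp.LpMinimize)
# Objective: maximize stand-by servers first (heavy weight on active
# servers), then minimize the number of block migrations.
moves = pulp.lpSum(x[b][s] for b in blocks for s in servers if s != current[b])
prob += 10 * pulp.lpSum(y[s] for s in servers) + moves

for b in blocks:  # every block must be stored on exactly one server
    prob += pulp.lpSum(x[b][s] for s in servers) == 1
for s in servers:  # only active servers may hold blocks, up to capacity
    prob += pulp.lpSum(x[b][s] for b in blocks) <= capacity * y[s]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([s for s in servers if y[s].value() == 0], "can go to stand-by")
```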

A Scheme on High-Performance Caching and High-Capacity File Transmission for Cloud Storage Optimization (클라우드 스토리지 최적화를 위한 고속 캐싱 및 대용량 파일 전송 기법)

  • Kim, Tae-Hun;Kim, Jung-Han;Eom, Young-Ik
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.8C / pp.670-679 / 2012
  • The recent spread of cloud computing has increased the amount of stored data and, with it, the cost of storing that data; data and service requests from users likewise increase the load on cloud storage. There have been many works that try to provide low-cost, high-performance distributed file systems, but most of them perform poorly on parallel and random data accesses as well as on frequent small workloads. Recently, improving the performance of distributed file systems through caching has attracted much attention. In this paper, we propose CHPC (Cloud storage High-Performance Caching), a framework providing parallel caching, distributed caching, and proxy caching in distributed file systems. This study compares the proposed framework with existing cloud systems with regard to reducing the server's disk I/O, preventing server-side bottlenecks, deduplicating the page caches in each client, and improving overall IOPS. As a result, we show some optimization possibilities for cloud storage systems based on evaluations and comparisons with other conventional methods. (A content-deduplicating cache sketch follows this entry.)
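
The client-side page-cache deduplication mentioned above can be sketched by keying the cache on a content hash instead of a (file, offset) pair, so that identical pages reached through different files occupy a single cache slot. This is a simplified illustration, not the CHPC implementation.

```python
import hashlib
from collections import OrderedDict

class DedupCache:
    """LRU page cache keyed by content hash: identical pages cached
    under different files share one slot (the deduplication idea above)."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.pages = OrderedDict()  # content hash -> page bytes
        self.index = {}             # (file, page_no) -> content hash

    def put(self, file: str, page_no: int, data: bytes) -> None:
        h = hashlib.sha256(data).hexdigest()
        self.index[(file, page_no)] = h
        self.pages[h] = data
        self.pages.move_to_end(h)           # mark as most recently used
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used

    def get(self, file: str, page_no: int):
        h = self.index.get((file, page_no))
        if h is None or h not in self.pages:
            return None                     # miss: fall through to the server
        self.pages.move_to_end(h)
        return self.pages[h]
```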

Hybrid Channel Model in Parallel File System (병렬 파일 시스템에서의 하이브리드 채널 모델)

  • Lee, Yoon-Young;Hwangbo, Jun-Hyung;Seo, Dae-Wha
    • The KIPS Transactions:PartA / v.10A no.1 / pp.25-34 / 2003
  • A parallel file system relieves the I/O bottleneck by striping a file across multiple computers connected by a high-speed network and reading it in parallel while the computers exchange messages. However, existing systems do not consider message characteristics, and performance suffers as a result. Accordingly, the current study proposes the Hybrid Channel Model (HCM) as a message-management method: the messages of a parallel file system are classified by characteristic into control messages and file data blocks, and the communication channel is divided into a message channel and a data channel. The message channel transfers control messages reliably over TCP/IP, while the data channel, implemented with the Virtual Interface Architecture (VIA), transfers file data blocks at high speed. In tests, a parallel file system implemented with HCM exhibited considerably improved performance. (A two-channel sketch follows this entry.)
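
The channel split can be sketched as follows: control messages travel on a reliable TCP connection, while bulk blocks arrive on a separate channel. Since VIA hardware is not assumed here, a plain UDP socket stands in for the data channel, the one-line request protocol is hypothetical, and a cooperating server is presumed to exist.

```python
import socket

CONTROL = ("127.0.0.1", 7000)  # reliable channel: requests and metadata
DATA = ("127.0.0.1", 7001)     # fast channel: bulk file data blocks

def request_block(path: str, block_no: int) -> bytes:
    # Control message goes over TCP, as in HCM's message channel.
    ctrl = socket.create_connection(CONTROL)
    ctrl.sendall(f"GET {path} {block_no}\n".encode())
    ctrl.recv(16)  # wait for "OK": server will stream on the data channel

    # Bulk data arrives on the separate high-speed channel (VIA in the
    # paper; an ordinary UDP socket here purely for illustration).
    data_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    data_sock.bind(DATA)
    block, _ = data_sock.recvfrom(65536)
    ctrl.close()
    data_sock.close()
    return block
```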

A Mobile Agent Programming System for Efficient Distributed Applications (효율적 분산 응용을 위한 이동 에이전트 프로그래밍 시스템)

  • Jeong, Won-Ho;Kang, Mi-Yeon;Kim, Yun-Su
    • The KIPS Transactions:PartA / v.10A no.5 / pp.439-452 / 2003
  • A mobile agent is a good technology for overcoming network load and latency in distributed applications, and its high adaptability to various network environments makes it a promising base technology for distributed applications. In this paper, a mobile agent programming system called HUMAN is designed and implemented for efficient use in various distributed applications based on mobile agents. HUMAN supports such high-level utilities as file searching, addressing by groups of nodes, storing path information, and storing search information, and thus makes agent-based programming easy. It provides various itinerary modes and flexible reply modes for easy adaptation to a given network environment, as well as a management server for registering and managing active agents. It can therefore be applied efficiently to such distributed applications as searching distributed information, remote control, and file sharing in networks. A simple electronic commerce system is designed and implemented as an illustrative HUMAN-based application. (A toy agent-itinerary sketch follows this entry.)
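
The itinerary idea can be illustrated with a toy agent that carries its remaining route and collected results, migrating by serialization. The host API (search, send, reply_home) and the file name are hypothetical; HUMAN's actual programming interface is not reproduced here.

```python
import pickle

class Agent:
    """Toy mobile agent: carries its itinerary and collected results;
    each host executes run() and forwards the pickled agent onward."""

    def __init__(self, itinerary):
        self.itinerary = list(itinerary)  # nodes still to visit
        self.results = []

    def run(self, host) -> None:
        # Use a host-provided utility (hypothetical) to do local work.
        self.results.append(host.search("report.pdf"))
        if self.itinerary:
            next_node = self.itinerary.pop(0)
            host.send(next_node, pickle.dumps(self))  # migrate to next node
        else:
            host.reply_home(self.results)             # flexible reply mode
```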

A Dynamic Data Replica Deletion Strategy on HDFS using HMM (HMM을 이용한 HDFS 기반 동적 데이터 복제본 삭제 전략)

  • Seo, Young-Ho;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2014.07a / pp.241-244 / 2014
  • This paper proposes a dynamic data-replica deletion strategy using a Hidden Markov Model (HMM) to improve the replication policy, which has been a problem in the Hadoop Distributed File System (HDFS). HDFS is a distributed file system that can process large amounts of data effectively; it provides high fault tolerance and high data-access throughput, and is optimized for applications with large data sets. However, while the replication mechanism in HDFS improves the stability and performance of the system, the additional block replicas occupy a large amount of disk space and increase maintenance costs. This paper proposes a strategy that uses an HMM together with the Viterbi algorithm, which finds the most likely state sequence, to detect unnecessary data replicas, and saves HDFS disk space and maintenance costs by deleting the detected replicas. (A Viterbi sketch follows this entry.)

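The abstract's combination of an HMM with the Viterbi algorithm can be illustrated with a textbook Viterbi decoder over a hypothetical two-state ("hot"/"cold") replica model; a run of "cold" states would flag a replica as a deletion candidate. All probabilities below are invented for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an access-pattern trace."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            p, prev = max(
                (V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
            V[t][s] = (p, prev)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, V[t][path[0]][1])
    return path

states = ("hot", "cold")                # hypothetical replica states
obs = ("read", "read", "idle", "idle")  # observed access events
start_p = {"hot": 0.6, "cold": 0.4}
trans_p = {"hot": {"hot": 0.7, "cold": 0.3}, "cold": {"hot": 0.2, "cold": 0.8}}
emit_p = {"hot": {"read": 0.8, "idle": 0.2}, "cold": {"read": 0.1, "idle": 0.9}}
print(viterbi(obs, states, start_p, trans_p, emit_p))  # ['hot', 'hot', 'cold', 'cold']
```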

Sim-Hadoop : Leveraging Hadoop Distributed File System and Parallel I/O for Reliable and Efficient N-body Simulations (Sim-Hadoop : 신뢰성 있고 효율적인 N-body 시뮬레이션을 위한 Hadoop 분산 파일 시스템과 병렬 I / O)

  • Awan, Ammar Ahmad;Lee, Sungyoung;Chung, Tae Choong
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.476-477 / 2013
  • Gadget-2 is a scientific simulation code that has been used for many different types of simulations, such as colliding galaxies, cluster formation, and the popular Millennium Simulation. The code is written in C and parallelized with the Message Passing Interface (MPI). There is also a Java adaptation of the original code, called Java Gadget, written using MPJ Express. Java Gadget writes a lot of checkpoint data, which may or may not use the HDF5 file format. Since HDF5 is MPI-IO compliant, we can use our MPJ-IO library to perform parallel reading and writing of the checkpoint files and improve I/O performance. Additionally, to add reliability to the code execution, we propose using the Hadoop Distributed File System (HDFS) for writing the intermediate data (checkpoint files) and final data (output files). The current code writes and reads the input, output, and checkpoint files sequentially, which can easily become a bottleneck for large-scale simulations. In this paper, we propose Sim-Hadoop, a framework that leverages HDFS and MPJ-IO to improve the I/O performance of the Java Gadget code. (A checkpoint-to-HDFS sketch follows this entry.)
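
The reliability half of the proposal (checkpoints persisted in HDFS) can be approximated from any client with the standard HDFS shell. The sketch below is a minimal wrapper with a hypothetical target directory; Sim-Hadoop itself is Java/MPJ-based, so this Python is only for illustration.

```python
import subprocess

def checkpoint_to_hdfs(local_path: str, hdfs_dir: str = "/sim/checkpoints") -> None:
    """Copy a freshly written checkpoint into HDFS so that a node failure
    cannot lose it; -f overwrites an earlier attempt for the same step."""
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)

# Example: persist the checkpoint produced by the current simulation step.
checkpoint_to_hdfs("checkpoint_0042.hdf5")
```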