• Title/Summary/Keyword: File Cluster

Delayed Block Replication Scheme of Hadoop Distributed File System for Flexible Management of Distributed Nodes (하둡 분산 파일시스템에서의 유연한 노드 관리를 위한 지연된 블록 복제 기법)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.12 no.2
    • /
    • pp.367-374
    • /
    • 2017
  • This paper discusses the node-management problems of Hadoop, a platform for big data processing, and proposes a novel technique that enables flexible node management in the Hadoop Distributed File System. Hadoop cannot reconfigure its cluster dynamically because it judges temporarily unavailable nodes to have failed. The delayed block replication scheme proposed in this paper postpones the removal of an unavailable node as long as possible so that the node can easily rejoin the cluster. Experimental results show that the proposed scheme increases the flexibility of node management with little impact on distributed processing performance when the cluster size changes.
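The delayed-removal idea lends itself to a compact illustration. Below is a minimal Python sketch, with hypothetical class names and timeout values rather than the paper's actual HDFS modification: a node that misses heartbeats is suspended instead of removed, and re-replication of its blocks begins only after a grace period expires.

```python
import time

# Minimal sketch of a delayed-removal policy (hypothetical names and
# timeouts, not the paper's HDFS patch). A silent node is suspended rather
# than removed; if it rejoins before the grace period ends, nothing is copied.
GRACE_PERIOD = 600.0       # seconds to wait before declaring a node dead
HEARTBEAT_TIMEOUT = 30.0   # seconds of silence before suspending a node

class NodeMonitor:
    def __init__(self):
        self.last_heartbeat = {}  # node id -> timestamp of last heartbeat
        self.suspended = {}       # node id -> time the node went silent

    def heartbeat(self, node: str) -> None:
        self.last_heartbeat[node] = time.time()
        self.suspended.pop(node, None)  # node is back: cancel pending removal

    def nodes_to_replicate(self) -> list:
        """Return nodes whose blocks must be re-replicated now."""
        now = time.time()
        dead = []
        for node, seen in self.last_heartbeat.items():
            if now - seen <= HEARTBEAT_TIMEOUT:
                continue  # still healthy
            went_silent = self.suspended.setdefault(node, now)
            if now - went_silent > GRACE_PERIOD:
                dead.append(node)  # grace period expired: give up on it
        for node in dead:
            del self.last_heartbeat[node]
            del self.suspended[node]
        return dead
```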

Design and Implementation of a Metadata Structure for Large-Scale Shared-Disk File System (대용량 공유디스크 파일 시스템에 적합한 메타 데이타 구조의 설계 및 구현)

  • 이용주;김경배;신범주
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.1
    • /
    • pp.33-49
    • /
    • 2003
  • Recently, there has been huge demand for storage to handle multimedia data. One major line of research addressing this demand is the SAN (Storage Area Network), which serves local file requests directly from shared-disk storage and eliminates the server as a bottleneck for performance and availability. A SAN also improves network latency and bandwidth through new channel interfaces such as FC (Fibre Channel). However, traditional local and distributed file systems are not well suited to a storage network like a SAN, and research is lacking on metadata structures for large-scale inode objects such as files and directories. In this paper, we describe the architecture and design issues of our shared-disk file system, and we present an efficient bitmap for well-formed block allocation on each host, an extent-based semi-flat structure for storing large-scale file data, and a two-phase directory structure using extendible hashing. We also describe a detailed algorithm for implementing the file system's device driver in the Linux kernel and compare our file system with a general-purpose file system (EXT2) and a shared-disk file system (GFS) in terms of file creation, directory creation, and I/O rate.
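For readers unfamiliar with extent-based mapping, the following toy Python sketch (field names are our own, not the paper's on-disk layout) shows the core benefit: an extent maps a contiguous run of logical blocks to a contiguous run of physical blocks, so a large file needs far fewer mapping entries than a per-block table.

```python
from bisect import bisect_right

# Toy extent map, illustrative only. Lookup is a binary search over the
# sorted logical start offsets of the extents.
class Extent:
    def __init__(self, logical_start: int, physical_start: int, length: int):
        self.logical_start = logical_start
        self.physical_start = physical_start
        self.length = length

class ExtentMap:
    def __init__(self):
        self.extents = []  # kept sorted by logical_start
        self.starts = []   # parallel list of logical_start for binary search

    def add(self, e: Extent) -> None:
        i = bisect_right(self.starts, e.logical_start)
        self.extents.insert(i, e)
        self.starts.insert(i, e.logical_start)

    def lookup(self, logical_block: int):
        """Translate a logical block number to a physical block number."""
        i = bisect_right(self.starts, logical_block) - 1
        if i >= 0:
            e = self.extents[i]
            if logical_block < e.logical_start + e.length:
                return e.physical_start + (logical_block - e.logical_start)
        return None  # hole: block not allocated

m = ExtentMap()
m.add(Extent(0, 1000, 64))  # logical blocks 0..63 live at physical 1000..1063
print(m.lookup(10))         # -> 1010
```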

A Content-Aware Load Balancing Technique Based on Histogram Transformation in a Cluster Web Server (클러스터 웹 서버 상에서 히스토그램 변환을 이용한 내용 기반 부하 분산 기법)

  • Hong Gi Ho;Kwon Chun Ja;Choi Hwang Kyu
    • Journal of Internet Computing and Services
    • /
    • v.6 no.2
    • /
    • pp.69-84
    • /
    • 2005
  • As the number of Internet users increases rapidly, cluster web server systems have attracted much attention from researchers and Internet service providers. The cluster web server has been developed to efficiently support a large number of users and to provide a highly scalable and available system. Efficient load distribution is important for high performance in a cluster web server, and recently many content-aware request distribution techniques have been proposed. In this paper, we propose a new content-aware load balancing technique that can evenly distribute the workload to each node in the cluster web server. The proposed technique is based on a hash-histogram transformation, in which each URL entry of the web log file is hashed and the access frequency and file size are accumulated into a histogram. Each user request is assigned to a node by the (hash value, server node) mapping of the transformed histogram. The histogram is updated periodically, so an even distribution of user requests is maintained continuously. In addition to load balancing, our technique exploits the cache effect to improve performance. Simulation results show that our technique performs considerably better than the traditional round-robin method and improves performance by more than 10% compared with the existing workload-aware load balancing (WARD) method.
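The routing idea can be sketched briefly. Here is a hedged Python illustration in which the bucket count, hash function, and greedy assignment step are our assumptions rather than the paper's exact procedure:

```python
import hashlib

# Rough sketch of hash-histogram load balancing. URLs from the access log
# are hashed into a fixed number of buckets; each bucket accumulates a
# weight, and buckets are assigned to server nodes so per-node weights are
# as even as possible. Requests are then routed by bucket lookup.
NUM_BUCKETS = 256

def bucket_of(url: str) -> int:
    return int(hashlib.md5(url.encode()).hexdigest(), 16) % NUM_BUCKETS

def build_assignment(log_entries, nodes):
    """log_entries: iterable of (url, file_size). Returns bucket -> node."""
    weight = [0] * NUM_BUCKETS
    for url, size in log_entries:
        weight[bucket_of(url)] += size  # each log line also adds frequency
    load = {n: 0 for n in nodes}
    assignment = {}
    # Greedy: heaviest bucket goes to the currently lightest node.
    for b in sorted(range(NUM_BUCKETS), key=lambda b: -weight[b]):
        node = min(load, key=load.get)
        assignment[b] = node
        load[node] += weight[b]
    return assignment

def route(url, assignment):
    return assignment[bucket_of(url)]

log = [("/a.html", 1200), ("/b.jpg", 50000), ("/a.html", 1200)]
table = build_assignment(log, nodes=["web1", "web2"])
print(route("/a.html", table))
```

Because every request for the same URL lands on the same node, each node's cache tends to hold a disjoint portion of the working set, which is the cache effect the abstract mentions.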

An Implementation and Evaluation of Large-Scale Dynamic Hashing Directories (대규모 동적 해싱 디렉토리의 구현 및 평가)

  • Kim, Shin-Woo;Lee, Yong-Kyu
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.7
    • /
    • pp.924-942
    • /
    • 2005
  • Recently, large-scale directories have been developed for Linux cluster file systems to store and retrieve huge amounts of data. One of them, the GFS directory, has attracted much attention because it is based on extendible hashing, a dynamic hashing technique, to support fast access to files. One distinctive feature of the GFS directory is its flat structure, in which all leaf nodes are located at the same level of the tree. A disadvantage of this structure is that the height of the node tree must be increased to keep the tree flat after an insertion into a full tree that cannot accommodate it. Thus, a single addition makes the height of the whole node tree grow, and each data block of the new tree needs one more link access than the old one. Another dynamic hashing technique that can be used for directories is linear hashing, and a couple of studies have shown that it can achieve better file access performance than extendible hashing. In this research, we have designed and implemented an extendible hashing directory and a linear hashing directory for large-scale Linux cluster file systems and have compared their performance. We use the semi-flat structure, which is known to have better access performance than the flat structure. According to the performance evaluation, the linear hashing directory performs slightly better at file inserts and accesses in most cases, whereas the extendible hashing directory is somewhat better at space utilization.
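For comparison, a textbook linear hashing table looks like the following Python sketch (this is the generic algorithm, not the directory code evaluated in the paper). Note how buckets split one at a time in a fixed order, so the structure grows smoothly instead of doubling a central table as extendible hashing does:

```python
# Minimal textbook linear hashing. A key hashes with the current round's
# modulus; if its bucket has already been split this round, it rehashes
# with the next round's (doubled) modulus.
class LinearHash:
    def __init__(self, initial_buckets=2, max_load=2.0):
        self.n0 = initial_buckets   # buckets at the start of round 0
        self.level = 0              # completed doubling rounds
        self.split_ptr = 0          # next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load = max_load

    def _bucket(self, key) -> int:
        h = hash(key)
        i = h % (self.n0 * 2 ** self.level)
        if i < self.split_ptr:                       # already split
            i = h % (self.n0 * 2 ** (self.level + 1))
        return i

    def insert(self, key, value) -> None:
        self.buckets[self._bucket(key)].append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split()

    def _split(self) -> None:
        entries = self.buckets[self.split_ptr]
        self.buckets[self.split_ptr] = []
        self.buckets.append([])
        self.split_ptr += 1
        if self.split_ptr == self.n0 * 2 ** self.level:
            self.level += 1          # round complete: table has doubled
            self.split_ptr = 0
        for k, v in entries:         # redistribute the split bucket
            self.buckets[self._bucket(k)].append((k, v))

    def lookup(self, key):
        for k, v in self.buckets[self._bucket(key)]:
            if k == key:
                return v
        return None

lh = LinearHash()
for i in range(10):
    lh.insert(f"file{i}", i)
print(lh.lookup("file7"))  # -> 7
```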

A Non-Shared Metadata Management Scheme for Large Distributed File Systems (대용량 분산파일시스템을 위한 비공유 메타데이타 관리 기법)

  • Yun, Jong-Byeon;Park, Yang-Bun;Lee, Seok-Jae;Jang, Su-Min;Yoo, Jae-Soo;Kim, Hong-Yeon;Kim, Young-Kyun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.36 no.4
    • /
    • pp.259-273
    • /
    • 2009
  • Most large-scale distributed file systems decouple metadata operations from read and write operations on files. In such systems, a dedicated server called a metadata server (MDS) maintains the file system's metadata, such as access information for a file, the position of a file in the repository, and the namespace of the file system. However, existing systems use restrictive metadata management schemes, because most distributed file systems were designed to focus on distributed management and the input/output performance of data rather than on metadata. As a result, the metadata throughput and scalability of the metadata server are limited in existing systems. In this paper, we propose a new non-shared metadata management scheme that provides high metadata throughput and scalability for a cluster of MDSs. First, we derive a dictionary partitioning scheme as a new metadata distribution technique. Then, we present a load balancing technique based on this distribution technique. Various experiments show that our scheme outperforms existing metadata management schemes in terms of scalability and load balancing.
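A dictionary partition can be pictured as a sorted set of boundary keys, with each MDS owning one contiguous key range, so no metadata is shared and a lookup touches exactly one server. The following minimal Python sketch uses made-up boundaries and server names:

```python
import bisect

# Hedged sketch of dictionary (range) partitioning of a namespace across
# metadata servers; boundary keys and server names are invented for
# illustration, not taken from the paper.
class DictionaryPartition:
    def __init__(self, boundaries, servers):
        # boundaries: sorted split keys, len(boundaries) == len(servers) - 1
        self.boundaries = boundaries
        self.servers = servers

    def server_for(self, path: str) -> str:
        i = bisect.bisect_right(self.boundaries, path)
        return self.servers[i]

part = DictionaryPartition(boundaries=["/home/m", "/var"],
                           servers=["mds0", "mds1", "mds2"])
print(part.server_for("/home/alice/data"))  # -> mds0
print(part.server_for("/home/zoe/tmp"))     # -> mds1
print(part.server_for("/var/log/syslog"))   # -> mds2
```

Under this layout, load balancing reduces to moving a boundary key, which migrates only the metadata in the affected range to a neighboring MDS.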

Distributed File Systems Architectures of the Large Data for Cloud Data Services (클라우드 데이터 서비스를 위한 대용량 데이터 처리 분산 파일 아키텍처 설계)

  • Lee, Byoung-Yup;Park, Jun-Ho;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.2
    • /
    • pp.30-39
    • /
    • 2012
  • These days, a number of IT vendors have already entered the cloud computing market, and they are expanding their territory through collaborations between hardware and software vendors, building on their respective hardware and software technologies. The distributed file system is a core technology for cloud computing: it must guarantee performance and safety for demanding service requests as well as for data storage. This paper introduces distributed file systems for cloud computing and related technologies such as in-memory databases, the Hadoop file system, and high-availability database systems, and then defines a very large distributed processing architecture that can serve as a reference, based on the kinds of distributed file systems and technologies currently used in the cloud computing market.

Sim-Hadoop: Leveraging Hadoop Distributed File System and Parallel I/O for Reliable and Efficient N-body Simulations (Sim-Hadoop: 신뢰성 있고 효율적인 N-body 시뮬레이션을 위한 Hadoop 분산 파일 시스템과 병렬 I/O)

  • Awan, Ammar Ahmad;Lee, Sungyoung;Chung, Tae Choong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.05a
    • /
    • pp.476-477
    • /
    • 2013
  • Gadget-2 is a scientific simulation code that has been used for many different types of simulations, such as colliding galaxies, cluster formation, and the popular Millennium Simulation. The code is parallelized with the Message Passing Interface (MPI) and is written in C. There is also a Java adaptation of the original code, written using MPJ Express, called Java Gadget. Java Gadget writes a large amount of checkpoint data, which may or may not use the HDF5 file format. Since HDF5 is MPI-IO compliant, we can use our MPJ-IO library to perform parallel reading and writing of the checkpoint files and improve I/O performance. Additionally, to add reliability to the code's execution, we propose using the Hadoop Distributed File System (HDFS) for writing the intermediate data (checkpoint files) and the final data (output files). The current code writes and reads the input, output, and checkpoint files sequentially, which can easily become a bottleneck for large-scale simulations. In this paper, we propose Sim-Hadoop, a framework that leverages HDFS and MPJ-IO to improve the I/O performance of the Java Gadget code.
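To illustrate the storage side of the proposal, here is a hedged Python sketch of checkpointing to HDFS using the third-party hdfs (hdfscli) package over WebHDFS. The NameNode URL, user name, and paths are assumptions, and Sim-Hadoop itself works through Java Gadget and MPJ-IO rather than this client; the sketch only shows the idea of sending checkpoints to a replicated file system instead of local disk.

```python
from hdfs import InsecureClient

# Assumed NameNode WebHDFS endpoint and user; adjust for a real cluster.
client = InsecureClient("http://namenode:9870", user="simuser")

def write_checkpoint(step: int, data: bytes) -> None:
    # HDFS replicates the file across DataNodes, so a node failure during
    # a long N-body run no longer loses the intermediate state.
    path = f"/simulations/gadget/checkpoint_{step:05d}.hdf5"  # hypothetical
    client.write(path, data=data, overwrite=True)

def read_checkpoint(step: int) -> bytes:
    path = f"/simulations/gadget/checkpoint_{step:05d}.hdf5"
    with client.read(path) as reader:
        return reader.read()
```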

GPU-Accelerated Password Cracking of PDF Files

  • Kim, Keon-Woo;Lee, Sang-Su;Hong, Do-Won;Ryou, Jae-Cheol
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.5 no.11
    • /
    • pp.2235-2253
    • /
    • 2011
  • Digital document files such as Adobe Acrobat or MS Office documents are encrypted by their own ciphering algorithms with a user password. When this password is not known to a user or a forensic inspector, it must be recovered in order to open the encrypted file. Password cracking by brute-force search is guaranteed to discover the password but is a time-consuming process. This paper presents a new method of speeding up password recovery on the Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). PDF files are chosen as the password cracking target, and the Adobe Acrobat password recovery algorithm is examined. Experimental results show that the proposed method gives high performance at low cost, with a cluster of GPU nodes significantly speeding up password recovery by exploiting a number of computing nodes. Password cracking performance increases linearly in proportion to the number of computing nodes and GPUs.
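The linear scaling comes from partitioning the candidate keyspace into disjoint slices, one per worker. The following CPU-only Python sketch illustrates that partitioning idea; SHA-256 here is a stand-in for Adobe's actual key-derivation, and this is not the paper's CUDA kernel:

```python
import hashlib
from itertools import product
from multiprocessing import Pool

# Each worker searches the disjoint slice of candidates starting with one
# letter; one process stands in for one GPU node in the paper's cluster.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LENGTH = 4
TARGET = hashlib.sha256(b"test").hexdigest()  # pretend-encrypted document

def search_slice(first_char: str):
    """Try every candidate password that starts with first_char."""
    for tail in product(ALPHABET, repeat=LENGTH - 1):
        candidate = first_char + "".join(tail)
        if hashlib.sha256(candidate.encode()).hexdigest() == TARGET:
            return candidate
    return None

if __name__ == "__main__":
    with Pool() as pool:
        for found in pool.imap_unordered(search_slice, ALPHABET):
            if found:
                print("password found:", found)
                pool.terminate()
                break
```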

Performance Improvement of Cluster File System Regarding Usage Frequency of File Resources (파일 자원 사용 빈도를 고려한 클러스터 파일시스템의 성능 향상)

  • Choi, Chang-Yeol;Chung, Ji-Yung;Kim, Sung-Soo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2000.10b
    • /
    • pp.949-952
    • /
    • 2000
  • A fault occurring at one node of a cluster file system can be tolerated by multiplexing the paths for accessing shared storage. Multiple access to shared storage is achieved through file migration and file replication. In this paper, to reduce the average response time of the global file system, we propose a method that prioritizes files by their access rates in order to prevent the performance degradation and service interruption of the cluster file system that can occur during file migration and file replication. The access rate of a file resource can be obtained from the usage frequency of resources such as blocks, inodes, and superblocks used during file processing.
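The bookkeeping the abstract describes can be sketched in a few lines of Python (the counter layout and function names are our assumptions, not the paper's implementation):

```python
from collections import Counter

# Count how often low-level resources (blocks, inodes, superblocks) are
# touched per file, and use the resulting access rate as a priority so hot
# files are migrated or replicated first and cold files can wait.
usage = Counter()  # (path, resource_kind) -> access count

def record_access(path: str, kind: str) -> None:
    # kind is one of "block", "inode", "superblock"
    usage[(path, kind)] += 1

def access_rate(path: str) -> int:
    return sum(c for (p, _), c in usage.items() if p == path)

def replication_order(paths):
    # Highest access rate first: these files dominate average response time.
    return sorted(paths, key=access_rate, reverse=True)
```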

An Efficient Disk Sharing Technique supporting Single Disk I/O Space in Linux Cluster Systems (리눅스 클러스터 시스템에서 단일 디스크 입출력 공간을 지원하는 효율적 디스크 공유 기법)

  • 김태호;이종우;이재원;김성동;채진석
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.6
    • /
    • pp.635-645
    • /
    • 2003
  • One of the most important features that a clustered parallel computer system must support is a single I/O system image, in which users can access both local and remote I/O resources transparently. In this paper, we propose an efficient disk sharing technique supporting a single disk I/O system image architecture. The design separates the I/O subsystem of a cluster into the file system and a set of virtual hard disk drivers. A virtual hard disk driver treats a hard disk in a remote node as a local hard disk. All of its services are performed at the device driver level, without any modification to the file systems, so users can access all the disks in the cluster regardless of their locations. Our virtual hard disk driver is implemented under Linux and tested on a Linux cluster system. Experiments show that it successfully supports a single disk I/O space while outperforming NFS. We believe this paper can serve as a guideline for easily constructing a single I/O space for other devices.
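The driver itself lives in the kernel, but the request/response pattern it implements can be shown with a user-space toy in Python. The port, the 8-byte request format, and the 512-byte block size below are assumptions for illustration:

```python
import socket

# Toy model of a "virtual hard disk": a client asks a remote node for a
# block by number and gets raw bytes back, so a remote disk can be read as
# if it were local. The paper does this transparently in a device driver.
BLOCK_SIZE = 512
PORT = 5555

def serve(disk_image_path: str, host: str = "127.0.0.1") -> None:
    """Export a local disk image, one block per request."""
    srv = socket.socket()
    srv.bind((host, PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    with open(disk_image_path, "rb") as disk, conn:
        while True:
            req = conn.recv(8)   # toy code: assumes the whole 8-byte
            if len(req) < 8:     # request arrives in one piece
                break
            block_no = int.from_bytes(req, "big")
            disk.seek(block_no * BLOCK_SIZE)
            conn.sendall(disk.read(BLOCK_SIZE).ljust(BLOCK_SIZE, b"\0"))

def read_remote_block(block_no: int, host: str = "127.0.0.1") -> bytes:
    """Read one block from the remote 'virtual disk'."""
    with socket.create_connection((host, PORT)) as s:
        s.sendall(block_no.to_bytes(8, "big"))
        data = b""
        while len(data) < BLOCK_SIZE:
            chunk = s.recv(BLOCK_SIZE - len(data))
            if not chunk:
                break
            data += chunk
        return data
```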