Title/Summary/Keyword: data deduplication


Performance Analysis of Open Source Based Distributed Deduplication File System (오픈 소스 기반 데이터 분산 중복제거 파일 시스템의 성능 분석)

  • Jung, Sung-Ouk; Choi, Hoon
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.623-631 / 2014
  • A comparison of two representative deduplication file systems, LessFS and SDFS, shows that LessFS is better in execution time and CPU utilization, while SDFS is better in storage usage (around 1/8 that of general file systems). In this paper, a new system is proposed that combines the advantages of SDFS and LessFS. The new system uses multiple DFEs and one DSE to maintain the integrity and consistency of the data. An evaluation comparing a Single DFE configuration with a Dual DFE configuration indicates that the Dual DFE performed better: it reduced CPU usage and provided faster deduplication. This suggests the proposed system can help address the growth of large-scale data storage and its power consumption.
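
As a hedged illustration of the dual-DFE/single-DSE architecture the abstract describes, the Python sketch below routes writes from two deduplication front-ends (DFEs) into one shared deduplication storage engine (DSE). All class names and interfaces are invented for illustration; the paper's actual components are more elaborate.

```python
import hashlib

class DSE:
    """Single storage engine holding unique chunks keyed by fingerprint."""
    def __init__(self):
        self.store = {}          # fingerprint -> chunk bytes

    def put(self, fp, chunk):
        # Store a chunk only once; later writes with the same
        # fingerprint reuse the stored copy.
        self.store.setdefault(fp, chunk)

class DFE:
    """Front end that fingerprints chunks and forwards unique ones."""
    def __init__(self, dse):
        self.dse = dse

    def write(self, chunk: bytes) -> str:
        fp = hashlib.sha256(chunk).hexdigest()
        self.dse.put(fp, chunk)
        return fp                # recipe entry for the written chunk

dse = DSE()
dfes = [DFE(dse), DFE(dse)]      # dual DFE: spread load across two front ends
data = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
recipe = [dfes[i % 2].write(c) for i, c in enumerate(data)]
print(len(dse.store), "unique chunks stored for", len(data), "writes")  # 2 for 3
```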

Data Deduplication Method using Locality-based Chunking policy for SSD-based Server Storages (SSD 기반 서버급 스토리지를 위한 지역성 기반 청킹 정책을 이용한 데이터 중복 제거 기법)

  • Lee, Seung-Kyu; Kim, Ju-Kyeong; Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.2 / pp.143-151 / 2013
  • NAND flash-based SSDs (Solid State Drives) offer fast input/output performance and low power consumption, so they are widely used as storage in tablets, desktop PCs, smartphones, and servers. However, SSDs suffer from wear as the number of writes increases. To improve SSD lifespan, a variety of data deduplication techniques have been introduced. The general fixed-size splitting method allocates fixed-size chunks without considering the locality of the data, so it may perform unnecessary chunking and hash-key generation, while the variable-size splitting method incurs excessive computation because it compares data byte by byte for deduplication. This paper proposes an adaptive chunking method based on the application locality and file-name locality of data written to SSD-based server storage. The proposed method splits data into 4KB or 64KB chunks adaptively according to the application locality and file-name locality of duplicated data, reducing the overhead of chunking and hash-key generation and preventing duplicate writes. Experimental results show that the proposed method improves write performance and reduces power consumption and operation time compared with the existing variable-size splitting method and the fixed-size splitting method using 4KB chunks.
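
The abstract's core idea, choosing a 4KB or 64KB chunk size from the locality of the written data, can be sketched as below. The extension-based classification and all names are assumptions for illustration, not the paper's actual policy.

```python
import hashlib

SMALL, LARGE = 4 * 1024, 64 * 1024
HIGH_DUP_EXTS = {".log", ".txt", ".doc"}   # assumed: files likely to repeat data

def chunk_size_for(filename: str) -> int:
    ext = filename[filename.rfind("."):].lower()
    # Files likely to contain duplicates get fine-grained 4 KB chunks;
    # others get coarse 64 KB chunks to cut chunking/hashing overhead.
    return SMALL if ext in HIGH_DUP_EXTS else LARGE

def dedup_write(filename: str, data: bytes, index: dict) -> int:
    size, written = chunk_size_for(filename), 0
    for off in range(0, len(data), size):
        chunk = data[off:off + size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:       # only unique chunks reach the SSD
            index[fp] = chunk
            written += len(chunk)
    return written                # bytes actually written to flash

index = {}
print(dedup_write("app.log", b"x" * 128 * 1024, index))  # 4096: one unique 4 KB chunk
```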

Improving the Lifetime of NAND Flash-based Storages by Min-hash Assisted Delta Compression Engine (MADE (Minhash-Assisted Delta Compression Engine) : 델타 압축 기반의 낸드 플래시 저장장치 내구성 향상 기법)

  • Kwon, Hyoukjun; Kim, Dohyun; Park, Jisung; Kim, Jihong
    • Journal of KIISE / v.42 no.9 / pp.1078-1089 / 2015
  • In this paper, we propose the Min-hash Assisted Delta-compression Engine (MADE) to improve the lifetime of NAND flash-based storage at the device level. MADE effectively reduces write traffic to NAND flash through a novel delta compression scheme. Delta compression performance is optimized by introducing min-hash-based LSH (Locality-Sensitive Hashing) and efficiently combining it with our delta compression method. We also developed a delta encoding technique whose functionality is equivalent to deduplication plus lossless compression. Our experimental results show that MADE reduces the amount of data written to NAND flash by up to 90%, outperforming a simple combination of deduplication and lossless compression schemes by 12% on average.
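
A rough, assumption-laden sketch of a MADE-style write path: a min-hash signature serves as an LSH key to find a similar stored page, and the incoming page is delta-encoded against it. The shingle width, number of hash seeds, similarity threshold, and the zlib-dictionary stand-in for the paper's delta encoder are all illustrative choices.

```python
import zlib

def minhash(data: bytes, seeds=range(8), w=4):
    # Signature = per-seed minimum over the page's w-byte shingles.
    shingles = {data[i:i + w] for i in range(len(data) - w + 1)}
    return tuple(min(hash((s, sh)) for sh in shingles) for s in seeds)

def similarity(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def delta_encode(base: bytes, new: bytes) -> bytes:
    # Stand-in delta encoder: compress the new page with the similar
    # base page as a preset dictionary.
    c = zlib.compressobj(zdict=base)
    return c.compress(new) + c.flush()

store = {}   # signature -> stored page
def write_page(page: bytes) -> bytes:
    sig = minhash(page)
    best = max(store, key=lambda s: similarity(s, sig), default=None)
    if best and similarity(best, sig) > 0.5:
        return delta_encode(store[best], page)   # small delta, less flash wear
    store[sig] = page
    return page                                  # no similar page: store whole

base = b"hello world " * 300
print(len(write_page(base)), len(write_page(base + b"!")))  # full page, then tiny delta
```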

Performance Analysis and Improvement of WANProxy (WANProxy의 성능 분석 및 개선)

  • Kim, Haneul; Ji, Seungkyu; Chung, Kyusik
    • KIPS Transactions on Computer and Communication Systems / v.9 no.3 / pp.45-58 / 2020
  • With network traffic increasing due to the popularization of cloud services and mobile devices, WAN bandwidth remains very low compared to LAN bandwidth. In a WAN environment, a WAN optimizer is needed to overcome the performance problems caused by transmission protocols, packet loss, and bandwidth limitations. In this paper, we analyze the data deduplication algorithm of WANProxy, an open-source WAN optimizer, and evaluate its performance in terms of network latency and WAN bandwidth. We also evaluate the performance of two-stage compression combining WANProxy and Zstandard. We then propose a new method that improves WANProxy by revising its data deduplication algorithm and evaluate the resulting improvement. Experiments use the 12 data files of Silesia with a data segment size of 2048 bytes. Results show that the average compression rate of WANProxy is 150.6, and that WANProxy reduces average network latency by 95.2% in a 10 Mbps WAN environment and by 60.7% in a 100 Mbps WAN environment. Compared with WANProxy alone, two-stage compression with Zstandard increases the average compression rate by 33%, but increases average network latency by 2.1% in a 10 Mbps WAN environment and by 5.27% in a 100 Mbps WAN environment. Compared with WANProxy, our proposed method increases the average compression rate by 34.8% and reduces average network latency by 13.8% for a 10 Mbps WAN and 12.9% for a 100 Mbps WAN. Overall, WANProxy's improvements in network latency and WAN bandwidth are greatest in WAN environments of 10 Mbps or less, while remaining substantial at 100 Mbps.
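
For orientation, here is a minimal sketch of segment-based WAN deduplication in the spirit of WANProxy, using the 2048-byte segment size from the paper's experiments; the wire format and cache handling are invented for illustration, not WANProxy's actual protocol.

```python
import hashlib

SEG = 2048

def encode(stream: bytes, cache: dict) -> list:
    out = []
    for off in range(0, len(stream), SEG):
        seg = stream[off:off + SEG]
        ref = hashlib.sha1(seg).digest()[:8]   # short reference for the segment
        if ref in cache:
            out.append(("ref", ref))           # send 8 bytes instead of 2048
        else:
            cache[ref] = seg
            out.append(("raw", seg))
    return out

def decode(msgs: list, cache: dict) -> bytes:
    parts = []
    for kind, payload in msgs:
        if kind == "raw":
            cache[hashlib.sha1(payload).digest()[:8]] = payload
            parts.append(payload)
        else:
            parts.append(cache[payload])       # resolve reference from cache
    return b"".join(parts)

tx, rx = {}, {}
msg = b"A" * SEG * 3                  # three identical segments
wire = encode(msg, tx)
assert decode(wire, rx) == msg        # one raw segment + two 8-byte references
```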

An analysis of Data Deduplication techniques (데이터 중복 제거 기술 분석)

  • Jho, Min-Jeong; Lee, Chang-hoon
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.305-308 / 2016
  • As the volume of stored data grows, the need to store data efficiently has increased. Accordingly, many services employ data deduplication as a technique for reducing data volume. This study analyzes the data deduplication techniques used by several services and forecasts trends in the development of data deduplication technology.

Sanitization of Open-Source Based Deduplicated Filesystem (오픈 소스 중복 제거 파일시스템에서의 완전 삭제)

  • Cho, Hyeonwoong; Kim, SeulGi; Kwon, Taekyoung
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.5 / pp.1141-1149 / 2016
  • A deduplicated filesystem can reduce storage usage, but deleted blocks may remain recoverable. We study sanitization of LessFS, a deduplicated filesystem based on FUSE (Filesystem in USErspace). First, we demonstrate a vulnerability that allows deleted data to be recovered from the deduplicated filesystem. We then implement sanitization for the deduplicated filesystem, taking the fingerprint DB into account along with the data blocks. Sanitization takes 60-70 times longer than deletion without sanitization, which means the access time to the fingerprint DB and the overhead from the increased number of chunks have a critical impact on sanitization time. However, for chunk sizes above 65,536 bytes, it is faster than a normal filesystem without deduplication.
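
The sanitization idea can be sketched as follows, under the assumption (consistent with the abstract) that the fingerprint DB tracks how many files reference each chunk and a data block is overwritten only when its last reference disappears. LessFS's real on-disk structures are more involved; this is an illustration only.

```python
import os

class DedupFS:
    def __init__(self):
        self.blocks = {}     # fingerprint -> bytearray (data block)
        self.refs = {}       # fingerprint -> reference count

    def write(self, fp: bytes, data: bytes):
        if fp not in self.blocks:
            self.blocks[fp] = bytearray(data)
        self.refs[fp] = self.refs.get(fp, 0) + 1

    def sanitize_delete(self, fp: bytes):
        self.refs[fp] -= 1
        if self.refs[fp] == 0:
            # Last reference gone: overwrite the block before dropping it,
            # so the deleted data cannot be recovered from the store.
            self.blocks[fp][:] = os.urandom(len(self.blocks[fp]))
            del self.blocks[fp], self.refs[fp]

fs = DedupFS()
fs.write(b"fp1", b"secret")
fs.write(b"fp1", b"secret")      # a second file shares the same block
fs.sanitize_delete(b"fp1")       # still referenced: data must be kept
fs.sanitize_delete(b"fp1")       # last reference: block is overwritten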

Priority-based Hint Management Scheme for Improving Page Sharing Opportunity of Virtual Machines (가상머신의 페이지 공유 기회를 향상시키기 위한 우선순위 큐 기반 힌트 관리 기법)

  • Nam, Yeji; Lee, Minho; Lee, Dongwoo; Eom, Young Ik
    • Journal of KIISE / v.43 no.9 / pp.947-952 / 2016
  • Most data centers attempt to consolidate servers using virtualization technology in order to utilize limited physical resources efficiently. Virtualized systems commonly adopt a contents-based page sharing mechanism for page deduplication among virtual machines (VMs). However, previous page sharing schemes are limited by their inability to effectively manage hints, which indicate sharable pages, as they accumulate in a stack. In this paper, we propose a priority-based hint management scheme that efficiently manages the hints sent from guest to host, improving the page sharing opportunity in virtualized systems. Experimental results show that, compared with previous schemes, our scheme removes pages with low sharing potential by efficiently managing the accumulated hints.
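
A minimal sketch of the priority-queue idea, assuming each hint carries some estimate of sharing potential (the scoring below is invented): low-potential hints are evicted instead of accumulating indefinitely as they would in a stack.

```python
import heapq

class HintQueue:
    def __init__(self, capacity=4):
        self.heap = []                       # (priority, page_id) min-heap
        self.capacity = capacity

    def push(self, page_id: int, duplicates_seen: int):
        # A higher duplicate count means higher sharing potential.
        heapq.heappush(self.heap, (duplicates_seen, page_id))
        if len(self.heap) > self.capacity:
            heapq.heappop(self.heap)         # evict the lowest-potential hint

    def best(self):
        return max(self.heap, default=None)  # next page to try sharing

q = HintQueue()
for pid, dups in [(1, 5), (2, 1), (3, 9), (4, 0), (5, 3)]:
    q.push(pid, dups)
print(q.best())   # (9, 3): page 3 has the best sharing potential
```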

Using Data Deduplication In A Cloud Environment, Efficient Data Synchronization Algorithm Design (클라우드 환경에서 데이터 중복제거를 활용한 효율적인 데이터 동기화 알고리즘 설계)

  • Lim, Kwang-Soo; Park, Suk-chun; Kim, Young-Hee
    • Proceedings of the Korea Information Processing Society Conference / 2015.04a / pp.626-628 / 2015
  • With the arrival of the big data era, the amount of data is growing exponentially, and techniques for processing data efficiently are becoming correspondingly important. Data deduplication, one such technique, not only makes efficient use of storage system space but also dramatically reduces the amount of data transmitted over the network, cutting communication costs. We analyze existing data deduplication and data synchronization techniques and, based on this analysis, propose an efficient data synchronization scheme for cloud environments that exploits data deduplication.
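
As a hedged sketch of deduplication-assisted synchronization, the following two-round exchange sends chunk fingerprints first and uploads only the chunks the server lacks. The protocol, names, and 4KB chunk size are assumptions for illustration, not the paper's design.

```python
import hashlib

CHUNK = 4096

def fingerprints(data: bytes):
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

class Server:
    def __init__(self):
        self.chunks = {}

    def missing(self, fps):                  # round 1: which chunks to send?
        return [fp for fp in dict.fromkeys(fps) if fp not in self.chunks]

    def upload(self, fp, chunk):             # round 2: unique chunks only
        self.chunks[fp] = chunk

def sync(data: bytes, server: Server) -> int:
    fps = fingerprints(data)
    sent = 0
    for fp in server.missing(fps):
        i = fps.index(fp)                    # first chunk with this fingerprint
        server.upload(fp, data[i * CHUNK:(i + 1) * CHUNK])
        sent += CHUNK
    return sent                              # bytes actually transferred

srv = Server()
print(sync(b"x" * CHUNK * 10, srv))   # 4096: ten chunks, only one unique
print(sync(b"x" * CHUNK * 10, srv))   # 0: everything is already on the server
```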

Hybrid Data Deduplication Method for reducing wear-level of SSD (SSD의 마모도 감소를 위한 복합적 데이터 중복 제거 기법)

  • Lee, Seung-Kyu; Yang, Yu-Seok; Kim, Deok-Hwan
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.543-546 / 2011
  • Unlike the commonly used HDD, an SSD stores data in semiconductor memory with no mechanical parts. Flash-based SSDs offer excellent read performance but cannot overwrite data in place; the resulting wear affects an SSD's lifespan. Nevertheless, because their performance surpasses that of HDDs, SSDs are widely used in laptops and in systems that handle important data. This paper proposes a hybrid data deduplication technique that combines the advantages of existing deduplication methods when an SSD is used as server storage, and verifies that the technique is considerably more efficient in terms of SSD wear.
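
To make the wear argument concrete, here is a minimal sketch (not the paper's hybrid scheme) of write-path deduplication in a flash translation layer: a page whose content is already on flash is remapped rather than rewritten, saving program/erase cycles.

```python
import hashlib

PAGE = 4096

class DedupFTL:
    def __init__(self):
        self.fingerprints = {}   # content hash -> physical page number
        self.flash_writes = 0    # proxy for wear

    def write(self, lpn: int, page: bytes) -> int:
        fp = hashlib.sha1(page).digest()
        if fp not in self.fingerprints:
            # New content: allocate a physical page and pay one flash write.
            self.fingerprints[fp] = len(self.fingerprints)
            self.flash_writes += 1
        return self.fingerprints[fp]   # logical page maps to the shared copy

ftl = DedupFTL()
for lpn in range(100):
    ftl.write(lpn, b"\x00" * PAGE)     # 100 identical pages
print(ftl.flash_writes)                # 1: wear for one write, not 100
```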

A Safe and Efficient Secure Data Deduplication for Cloud Storage Environment (클라우드 스토리지 환경을 위한 안전하고 효율적인 암호데이터 중복제거 기술)

  • Kim, Won-Bin; Lee, Im-Yeong
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.714-717 / 2015
  • Conventional encrypted-data deduplication techniques transmit data in various ways and compare it with previously stored data to decide whether it is a duplicate. To raise deduplication efficiency, block-level deduplication has recently come into use. However, applying block-level deduplication exposes various security threats; among them, the poison attack arises in systems that do not verify the integrity of data at storage time. Several encryption-based techniques have been proposed to counter this threat, but they suffer from low efficiency due to excessive communication rounds and computation. This paper therefore proposes an encrypted-data deduplication technique that guarantees the confidentiality and integrity of data stored in cloud storage while being more efficient in computation and communication.
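
The paper's own protocol is not reproduced here; as background, the sketch below shows convergent encryption, the standard building block for encrypted deduplication: since the key is derived from the content, identical plaintexts produce identical ciphertexts that still deduplicate. The XOR keystream is a stand-in for a real block cipher such as AES.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Counter-mode-style keystream built from SHA-256 (illustrative only).
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def convergent_encrypt(plaintext: bytes) -> tuple:
    key = hashlib.sha256(plaintext).digest()   # key = H(content)
    ct = bytes(p ^ k for p, k in
               zip(plaintext, keystream(key, len(plaintext))))
    tag = hashlib.sha256(ct).hexdigest()       # dedup index for the server
    return key, tag, ct

k1, t1, c1 = convergent_encrypt(b"same document")
k2, t2, c2 = convergent_encrypt(b"same document")
assert t1 == t2 and c1 == c2   # identical content -> deduplicable ciphertext
```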