• Title/Summary/Keyword: data de-duplication


Data De-duplication and Recycling Technique in SSD-based Storage System for Increasing De-duplication Rate and I/O Performance (SSD 기반 스토리지 시스템에서 중복률과 입출력 성능 향상을 위한 데이터 중복제거 및 재활용 기법)

  • Kim, Ju-Kyeong; Lee, Seung-Kyu; Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.12 / pp.149-155 / 2012
  • An SSD is a storage device with a high-performance controller and cache buffer, built from many NAND flash memories. Because NAND flash memory does not support in-place updates, valid pages are invalidated when the file system issues update and erase operations, and the invalidated pages are later reclaimed by garbage collection. Garbage collection, however, performs many long-latency erase operations, which reduces I/O performance and accelerates wear in the SSD. In this paper, we propose a new method that de-duplicates valid data and recycles invalid data, thereby improving the de-duplication ratio. By reducing the number of writes and garbage collections, the method increases I/O performance and reduces wear in the SSD. Experimental results show that it reduces the number of garbage collections by up to 20% and I/O latency by up to 9% compared with the conventional case.
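
The idea of pairing de-duplication of valid data with recycling of still-intact invalid pages can be pictured with a minimal sketch. The class name, page-granularity SHA-1 fingerprints, and the two lookup tables below are illustrative assumptions; the paper's actual FTL integration and chunk granularity are not given in the abstract.

```python
import hashlib

class DedupFTL:
    """Illustrative page-level de-duplication with invalid-page recycling.
    A hypothetical sketch, not the paper's actual FTL implementation."""

    def __init__(self):
        self.fingerprints = {}   # digest -> physical page no. (valid data)
        self.invalid_pages = {}  # digest -> physical page no. (invalidated data)
        self.next_page = 0

    def write(self, data: bytes) -> int:
        digest = hashlib.sha1(data).hexdigest()
        if digest in self.fingerprints:
            # Duplicate of live data: remap only, no flash write.
            return self.fingerprints[digest]
        if digest in self.invalid_pages:
            # Recycling: an invalidated page still holds identical content,
            # so revalidate it instead of writing a new copy.
            page = self.invalid_pages.pop(digest)
            self.fingerprints[digest] = page
            return page
        # New content: allocate a fresh page and record its fingerprint.
        page = self.next_page
        self.next_page += 1
        self.fingerprints[digest] = page
        return page

    def invalidate(self, data: bytes) -> None:
        # An overwritten page is remembered rather than discarded,
        # so a later identical write can recycle it.
        digest = hashlib.sha1(data).hexdigest()
        if digest in self.fingerprints:
            self.invalid_pages[digest] = self.fingerprints.pop(digest)
```

Both recycling hits and de-duplication hits avoid a flash write, which is why fewer pages need to be reclaimed by garbage collection in the first place.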

Design and Implementation of SANique Smart Vault Backup System for Massive Data Services (대용량 데이터 서비스를 위한 SANique Smart Vault 백업 시스템의 설계 및 구현)

  • Lee, Kyu Woong
    • The Journal of Korean Association of Computer Education / v.17 no.2 / pp.97-106 / 2014
  • Interest in data storage and backup systems is growing as data-intensive services and the volume of user data increase. Backup overhead in massive storage systems is a critical issue because traditional incremental backup strategies create a time-consuming bottleneck in SAN environments. The SANique Smart Vault system is a high-performance backup solution with data de-duplication technology that addresses these requirements. In this paper, we describe the architecture of the SANique Smart Vault system and present an efficient delta incremental backup method based on journaling files, as well as the record-level data de-duplication method used in the proposed backup system. The proposed forever-incremental backup and data de-duplication algorithms are analyzed through a performance comparison with other commercial backup solutions.
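
A journal-driven incremental backup with record-level de-duplication could look roughly like the sketch below. The journal format (one JSON record per line with a "data" field) and the index structures are assumptions for illustration; the SANique Smart Vault's actual journaling format is not described in the abstract.

```python
import hashlib
import json

def incremental_backup(journal_path, record_index, backup_store):
    """Hypothetical sketch: back up only records listed in a change journal,
    and store each distinct record content exactly once."""
    with open(journal_path) as journal:
        for line in journal:
            record = json.loads(line)              # one changed record per line (assumed)
            payload = record["data"].encode()
            digest = hashlib.sha256(payload).hexdigest()
            if digest in record_index:
                # Record content already stored: keep only a reference.
                record_index[digest]["refs"] += 1
            else:
                backup_store[digest] = payload     # first copy of this record
                record_index[digest] = {"refs": 1}
```

Because only journaled (changed) records are read and only previously unseen content is written, both the scan cost and the backup volume stay proportional to the actual changes rather than to the full data set.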


De-Duplication Performance Test for Massive Data (대용량 데이터의 중복제거(De-Duplication) 성능 실험)

  • Lee, Choelmin; Kim, Jai-Hoon; Kim, Young Gyu
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.271-273 / 2012
  • De-duplication saves storage space by finding identical content, either whole files or block-level chunks, in storage that holds many pieces of data, removing the duplicated content, and keeping only a single data unit for each duplicated part. In this paper, rather than using synthetic experimental data, we test de-duplication techniques under a scenario that assumes large-scale data backup as it would occur in a real business environment, measure the de-duplication ratio and performance, and propose a way to visualize the results so that evaluators and users can understand them easily.
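
The de-duplication ratio measured in such a test can be computed as in the sketch below, assuming fixed-size chunking with SHA-256 fingerprints; the paper does not state which chunking method or hash it used, so these are illustrative choices.

```python
import hashlib

def dedup_ratio(paths, chunk_size=4096):
    """Illustrative measurement of a de-duplication ratio over a file set:
    the fraction of chunks whose content has already been seen."""
    total_chunks = 0
    unique_chunks = set()
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total_chunks += 1
                unique_chunks.add(hashlib.sha256(chunk).hexdigest())
    saved = total_chunks - len(unique_chunks)
    return saved / total_chunks if total_chunks else 0.0
```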

Protection of a Multicast Connection Request in an Elastic Optical Network Using Shared Protection

  • BODJRE, Aka Hugues Felix; ADEPO, Joel; COULIBALY, Adama; BABRI, Michel
    • International Journal of Computer Science & Network Security / v.21 no.1 / pp.119-124 / 2021
  • Elastic Optical Networks (EONs) allow to solve the high demand for bandwidth due to the increase in the number of internet users and the explosion of multicast applications. To support multicast applications, network operator computes a tree-shaped path, which is a set of optical channels. Generally, the demand for bandwidth on an optical channel is enormous so that, if there is a single fiber failure, it could cause a serious interruption in data transmission and a huge loss of data. To avoid serious interruption in data transmission, the tree-shaped path of a multicast connection may be protected. Several works have been proposed methods to do this. But these works may cause the duplication of some resources after recovery due to a link failure. Therefore, this duplication can lead to inefficient use of network resources. Our work consists to propose a method of protection that eliminates the link that causes duplication so that, the final backup path structure after link failure is a tree. Evaluations and analyses have shown that our method uses less backup resources than methods for protection of a multicast connection.
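
As a generic illustration only (not the authors' algorithm), keeping the post-recovery structure a tree amounts to dropping any edge whose endpoints are already connected when the surviving tree is merged with backup segments, for example with a union-find check:

```python
def prune_to_tree(edges):
    """Generic sketch: keep edges only while they connect new components,
    so the merged primary/backup structure stays a tree."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue          # this edge would duplicate connectivity; eliminate it
        parent[ru] = rv
        kept.append((u, v))
    return kept

# Example: the edge ('C', 'A') closes a cycle and is removed.
print(prune_to_tree([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))
```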

Storage System Performance Enhancement Using Duplicated Data Management Scheme (중복 데이터 관리 기법을 통한 저장 시스템 성능 개선)

  • Jung, Ho-Min; Ko, Young-Woong
    • Journal of KIISE: Computer Systems and Theory / v.37 no.1 / pp.8-18 / 2010
  • Traditional storage servers suffer from duplicated data blocks, which waste storage space and network bandwidth. Various de-duplication mechanisms have been proposed to address this problem, but many are limited to backup servers that exploit Content-Defined Chunking (CDC); in a backup server, duplicated blocks can easily be traced using anchors, so the CDC scheme is widely used there. In this paper, we propose a new de-duplication mechanism for improving a storage system, focusing on an efficient algorithm that supports general-purpose de-duplication servers, including backup, P2P, and FTP servers. The key idea is to apply a stride scheme to the traditional fixed-block duplication-checking mechanism. Experimental results show that the proposed mechanism minimizes the computation time for detecting duplicated regions of blocks and manages storage systems efficiently.
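
The abstract does not spell out the stride scheme, so the sketch below shows one plausible reading: fingerprint only every k-th fixed-size block, and examine the skipped neighbours only when a strided block hits the fingerprint index. Function and parameter names are illustrative.

```python
import hashlib

def find_duplicate_blocks(data, index, block_size=4096, stride=4):
    """Plausible stride-based duplicate check over fixed-size blocks:
    `index` is a set of fingerprints of blocks already stored."""
    n_blocks = len(data) // block_size
    duplicates = set()

    def is_dup(i):
        block = data[i * block_size:(i + 1) * block_size]
        return hashlib.sha1(block).hexdigest() in index

    for i in range(0, n_blocks, stride):
        if is_dup(i):
            duplicates.add(i)
            # A hit suggests a duplicated region; check the skipped neighbours.
            for j in range(max(0, i - stride + 1), min(n_blocks, i + stride)):
                if j != i and is_dup(j):
                    duplicates.add(j)
    return duplicates
```

Under this reading, most of the data is fingerprinted at 1/stride of the cost, and full-resolution checking is paid only around regions that are likely duplicated.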

A Clustering File Backup Server Using Multi-level De-duplication (다단계 중복 제거 기법을 이용한 클러스터 기반 파일 백업 서버)

  • Ko, Young-Woong; Jung, Ho-Min; Kim, Jin
    • Journal of KIISE: Computing Practices and Letters / v.14 no.7 / pp.657-668 / 2008
  • A traditional off-the-shelf file server has several drawbacks when storing data blocks. The first is the lack of practical de-duplication, which wastes storage capacity; the second is the need for a high-performance computer system to process large volumes of data blocks. To address these problems, this paper proposes a clustered backup system that exploits a file-fingerprinting mechanism for block-level de-duplication. Our approach differs from traditional file server systems in two ways. First, we avoid data redundancy through multi-level file fingerprinting, which lets us use storage capacity efficiently. Second, we apply clustering technology to the I/O subsystem, which effectively reduces data I/O time and network bandwidth usage. Experimental results show that both the storage capacity requirement and the I/O performance are noticeably improved.
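
A minimal sketch of a multi-level fingerprint check, assuming two levels (a whole-file digest followed by block digests); the paper's exact levels and hash choices are not detailed in the abstract, and the helper names are hypothetical.

```python
import hashlib

def backup_file(path, file_index, block_index, block_size=4096):
    """Match the whole-file fingerprint first; fall back to block-level
    fingerprints only when the file-level check misses. Returns the number
    of new blocks actually written."""
    with open(path, "rb") as f:
        data = f.read()

    file_digest = hashlib.sha1(data).hexdigest()
    if file_digest in file_index:
        return 0                          # whole file already stored: nothing to write

    new_blocks = 0
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in block_index:
            block_index[digest] = block   # store only previously unseen blocks
            new_blocks += 1
    file_index[file_digest] = path
    return new_blocks
```

The cheap file-level check filters out exact duplicates early, so the more expensive block-level pass runs only for files that have actually changed.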

Data Deduplication Method using PRAM Cache in SSD Storage System (SSD 스토리지 시스템에서 PRAM 캐시를 이용한 데이터 중복제거 기법)

  • Kim, Ju-Kyeong; Lee, Seung-Kyu; Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.4 / pp.117-123 / 2013
  • In recent cloud storage environments, SSDs (Solid-State Drives) are increasingly replacing traditional hard disk drives. Managing SSD space efficiency has become important because an SSD provides fast I/O performance with no mechanical movement, yet it wears out and does not support in-place updates. Data de-duplication is frequently used to improve SSD space efficiency, but it incurs substantial overhead because it involves data chunking, hashing, and hash matching. In this paper, we propose a new data de-duplication method that uses a PRAM cache. The proposed method uses hierarchical hash tables and LRU (Least Recently Used) replacement for data in PRAM: the first hash table, kept in DRAM, stores the hash values of data cached in PRAM, and the second hash table, kept in PRAM, stores the hash values of data in SSD storage. The method also improves reliability against power failure by keeping a backup of the first hash table in PRAM. Experimental results over three workloads show that the average write frequency and operation time of the proposed method are 44.2% and 38.8% lower, respectively, than those of an existing data de-duplication method.
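
The two-level lookup described above can be sketched as below. The cache capacity, eviction behaviour, and the stand-in "addresses" are assumptions for illustration; only the DRAM-table/PRAM-table split and the LRU replacement come from the abstract.

```python
import hashlib
from collections import OrderedDict

class PramCacheDedup:
    """Hedged sketch: a DRAM-side table for data cached in PRAM (LRU order)
    and a second table for data already resident on the SSD."""

    def __init__(self, pram_capacity=1024):
        self.dram_table = OrderedDict()   # digest -> PRAM slot (LRU order)
        self.pram_table = {}              # digest -> SSD location
        self.pram_capacity = pram_capacity

    def write(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        if digest in self.dram_table:
            self.dram_table.move_to_end(digest)   # refresh LRU position
            return "dedup-hit (PRAM cache)"
        if digest in self.pram_table:
            return "dedup-hit (SSD)"
        # New data: cache it in PRAM; when full, evict the least recently
        # used entry, which is assumed to have been flushed to the SSD.
        if len(self.dram_table) >= self.pram_capacity:
            old_digest, _ = self.dram_table.popitem(last=False)
            self.pram_table[old_digest] = "ssd-location"
        self.dram_table[digest] = "pram-slot"
        return "new write"
```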

A Lightweight HL7 Message Strategy for Real-Time ECG Monitoring (실시간 심전도 모니터링을 위한 HL7 메시지 간소화 전략)

  • Lee, Kuyeon; Kang, Kyungtae; Lee, Jaemyoun; Park, Juyoung
    • KIISE Transactions on Computing Practices / v.21 no.3 / pp.183-191 / 2015
  • Recent developments in IT have made real-time ECG monitoring possible, and this represents a promising application for the emerging HL7 standard for the exchange of clinical information. However, applying the HL7 standard directly to real-time ECG monitoring causes problems, because the partial duplication of data within an HL7 message increases the amount of data to be transmitted and the time taken to process it. We reduce these overheads through feature scaling, which standardizes the range of independent variables or data features while still generating HL7-compliant messages. We also use a de-duplication algorithm to eliminate the partial repetition of OBX fields in an HL7 ORU message. Our strategy shortens the time required to create messages by 51% and reduces the size of messages by 1/8, compared to naive HL7 coding.
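
As a generic illustration of removing partially repeated OBX segments from a pipe-delimited HL7 v2 ORU message (the paper's exact reduction rules are not given in the abstract), one could drop any OBX segment that repeats the previous one in every field except its set ID:

```python
def dedup_obx(message: str) -> str:
    """Drop OBX segments whose content (everything after OBX-1, the set ID)
    duplicates the immediately preceding OBX segment."""
    kept, prev_body = [], None
    for segment in message.strip().split("\r"):   # HL7 v2 segments are CR-separated
        if segment.startswith("OBX|"):
            body = segment.split("|")[2:]          # compare fields after the set ID
            if body == prev_body:
                continue                           # partial duplicate: eliminate it
            prev_body = body
        kept.append(segment)
    return "\r".join(kept)

# Example: the second OBX repeats the first and is removed.
msg = "MSH|^~\\&|ECG\rOBX|1|NM|HR||72|bpm\rOBX|2|NM|HR||72|bpm\rOBX|3|NM|HR||75|bpm"
print(dedup_obx(msg).replace("\r", "\n"))
```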

A Study on the "Kor-T", a Modified Tapered h-index, by Applying the Ranking According to the Number of Citations of Journals in Evaluating Korean Journals (학술지의 피인용횟수 순위를 적용한 tapered h-지수의 변형지표 "Kor-hT"에 관한 연구)

  • Ko, Young Man; Cho, Soo-Ryun; Park, Ji Young
    • Journal of the Korean Society for Information Management / v.30 no.4 / pp.111-131 / 2013
  • This study describes the meaning of, and the formula for, Kor-hT, a modified index built on the tapered h-index by applying the ranking of journals according to their number of citations. The study evaluates the de-duplication rate of Kor-hT index values and analyzes how the correlation between index values and evaluation elements changes, using Korea Citation Index data from 2008 to 2010. Kor-hT is compared with the h-index, the tapered h-index, and the impact factor (IF). The results show that Kor-hT is superior to the other indexes in de-duplication rate, and that there is a very strong positive correlation between the evaluation elements (the number of citations and the number of articles of a journal) and the Kor-hT index values.
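
For context, the base tapered h-index on which Kor-hT builds can be sketched as below; the Kor-hT modification that applies journal citation-count rankings is not specified in the abstract, so only the standard index is shown, and this is a from-memory rendering of the usual formula rather than the paper's own.

```python
def tapered_h_index(citations):
    """Standard tapered h-index: papers are ranked by citation count, and
    the citation at matrix position (i, j) contributes 1 / (2*max(i, j) - 1)."""
    ranked = sorted(citations, reverse=True)
    total = 0.0
    for i, c in enumerate(ranked, start=1):
        if c <= i:
            total += c / (2 * i - 1)
        else:
            total += i / (2 * i - 1)
            total += sum(1.0 / (2 * j - 1) for j in range(i + 1, c + 1))
    return total

# A journal whose articles have citation counts [10, 5, 3, 1] (illustrative).
print(round(tapered_h_index([10, 5, 3, 1]), 3))
```

Because every citation contributes a distinct fractional amount, the resulting scores are far less likely to collide than integer h-index values, which is the de-duplication property the study measures.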

De-duplication of Parity Disk in SSD-Based RAID System (SSD 기반의 RAID 시스템에서 패리티 디스크의 중복 제거)

  • Yang, Yu-Seok; Lee, Seung-Kyu; Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.1 / pp.105-113 / 2013
  • RAID systems, which connect several disks in a parallel structure, have been widely used to resolve data I/O delays and bottlenecks. Recently, SSD-based RAID systems have been emerging because SSDs offer better I/O performance than HDDs, but the endurance and power consumption problems caused by frequent write operations in an SSD-based RAID system must be resolved. In this paper, we propose a de-duplication method for the parity disk, whose updates are expensive, in an SSD-based RAID system. The proposed method segments each chunk of parity data into small pieces and removes duplicates, which reduces wear and power consumption by decreasing the write operations for duplicated parity data. Experimental results on a RAID-6 system using the EVENODD erasure code show that the bit update rate of the proposed method is 16% lower over all disks and 31% lower on the parity disk than that of the existing method, with 30% lower power consumption; on a RAID-5 system, the corresponding reductions are 12% over all disks and 32% on the parity disk, with 36% lower power consumption.
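
A minimal sketch of the parity-piece de-duplication described above: the parity chunk is segmented into small pieces and only pieces whose fingerprints have not been seen before are written to the parity SSD. The piece size, the index structure, and the function name are assumptions for illustration.

```python
import hashlib

def write_parity(parity_chunk: bytes, piece_index: dict, piece_size=512) -> int:
    """Segment a parity chunk into small pieces and write only previously
    unseen pieces; returns the number of pieces actually written."""
    written = 0
    for off in range(0, len(parity_chunk), piece_size):
        piece = parity_chunk[off:off + piece_size]
        digest = hashlib.sha1(piece).hexdigest()
        if digest in piece_index:
            continue                   # duplicated parity piece: skip the write
        piece_index[digest] = piece    # new piece: store it (one flash write)
        written += 1
    return written
```

Skipping writes for duplicated parity pieces directly lowers the bit update rate on the parity disk, which is where the reported wear and power savings come from.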