• 제목/요약/키워드: Data deduplication

Search Result 47, Processing Time 0.022 seconds

Information Dispersal Algorithm and Proof of Ownership for Data Deduplication in Dispersed Storage Systems (분산 스토리지 시스템에서 데이터 중복제거를 위한 정보분산 알고리즘 및 소유권 증명 기법)

  • Shin, Youngjoo
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.1
    • /
    • pp.155-164
    • /
    • 2015
  • Information dispersal algorithm guarantees high availability and confidentiality for data and is one of the useful solutions for faulty and untrusted dispersed storage systems such as cloud storages. As the amount of data stored in storage systems increases, data deduplication which allows to save IT resources is now being considered as the most promising technology. Hence, it is necessary to study on an information dispersal algorithm that supports data deduplication. In this paper, we propose an information dispersal algorithm and proof of ownership for client-side data deduplication in the dispersed storage systems. The proposed solutions allow to save the network bandwidth as well as the storage space while giving robust security guarantee against untrusted storage servers and malicious clients.

Distributed data deduplication technique using similarity based clustering and multi-layer bloom filter (SDS 환경의 유사도 기반 클러스터링 및 다중 계층 블룸필터를 활용한 분산 중복제거 기법)

  • Yoon, Dabin;Kim, Deok-Hwan
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.14 no.5
    • /
    • pp.60-70
    • /
    • 2018
  • A software defined storage (SDS) is being deployed in cloud environment to allow multiple users to virtualize physical servers, but a solution for optimizing space efficiency with limited physical resources is needed. In the conventional data deduplication system, it is difficult to deduplicate redundant data uploaded to distributed storages. In this paper, we propose a distributed deduplication method using similarity-based clustering and multi-layer bloom filter. Rabin hash is applied to determine the degree of similarity between virtual machine servers and cluster similar virtual machines. Therefore, it improves the performance compared to deduplication efficiency for individual storage nodes. In addition, a multi-layer bloom filter incorporated into the deduplication process to shorten processing time by reducing the number of the false positives. Experimental results show that the proposed method improves the deduplication ratio by 9% compared to deduplication method using IP address based clusters without any difference in processing time.

Privacy Preserving Source Based Deduplication In Cloud Storage (클라우드 스토리지 상에서의 프라이버시 보존형 소스기반 중복데이터 제거기술)

  • Park, Cheolhee;Hong, Dowon;Seo, Changho;Chang, Ku-Young
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.1
    • /
    • pp.123-132
    • /
    • 2015
  • In cloud storage, processing the duplicated data, namely deduplication, is necessary technology to save storage space. Users who store sensitive data in remote storage want data be encrypted. However Cloud storage server do not detect duplication of conventionally encrypted data. To solve this problem, Convergent Encryption has been proposed. But it inherently have weakness due to brute-force attack. On the other hand, to save storage space as well as save bandwidths, client-side deduplication have been applied. Recently, various client-side deduplication technology has been proposed. However, this propositions still cannot solve the security problem. In this paper, we suggest a secure source-based deduplication technology, which encrypt data to ensure the confidentiality of sensitive data and apply proofs of ownership protocol to control access to the data, from curious cloud server and malicious user.

Cloud Storage Security Deduplication Scheme Based on Dynamic Bloom Filter

  • Yan, Xi-ai;Shi, Wei-qi;Tian, Hua
    • Journal of Information Processing Systems
    • /
    • v.15 no.6
    • /
    • pp.1265-1276
    • /
    • 2019
  • Data deduplication is a common method to improve cloud storage efficiency and save network communication bandwidth, but it also brings a series of problems such as privacy disclosure and dictionary attacks. This paper proposes a secure deduplication scheme for cloud storage based on Bloom filter, and dynamically extends the standard Bloom filter. A public dynamic Bloom filter array (PDBFA) is constructed, which improves the efficiency of ownership proof, realizes the fast detection of duplicate data blocks and reduces the false positive rate of the system. In addition, in the process of file encryption and upload, the convergent key is encrypted twice, which can effectively prevent violent dictionary attacks. The experimental results show that the PDBFA scheme has the characteristics of low computational overhead and low false positive rate.

Study of Efficient Algorithm for Deduplication of Complex Structure (복잡한 구조의 데이터 중복제거를 위한 효율적인 알고리즘 연구)

  • Lee, Hyeopgeon;Kim, Young-Woon;Kim, Ki-Young
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.14 no.1
    • /
    • pp.29-36
    • /
    • 2021
  • The amount of data generated has been growing exponentially, and the complexity of data has been increasing owing to the advancement of information technology (IT). Big data analysts and engineers have therefore been actively conducting research to minimize the analysis targets for faster processing and analysis of big data. Hadoop, which is widely used as a big data platform, provides various processing and analysis functions, including minimization of analysis targets through Hive, which is a subproject of Hadoop. However, Hive uses a vast amount of memory for data deduplication because it is implemented without considering the complexity of data. Therefore, an efficient algorithm has been proposed for data deduplication of complex structures. The performance evaluation results demonstrated that the proposed algorithm reduces the memory usage and data deduplication time by approximately 79% and 0.677%, respectively, compared to Hive. In the future, performance evaluation based on a large number of data nodes is required for a realistic verification of the proposed algorithm.

Analysis of Security Weakness on Secure Deduplication Schemes in Cloud Storage (클라우드 스토리지에서 안전한 중복 제거 기법들에 대한 보안 취약점 분석)

  • Park, Ji Sun;Shin, Sang Uk
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.8
    • /
    • pp.909-916
    • /
    • 2018
  • Cloud storage services have many advantages. As a result, the amount of data stored in the storage of the cloud service provider is increasing rapidly. This increase in demand forces cloud storage providers to apply deduplication technology for efficient use of storages. However, deduplication technology has inherent security and privacy concerns. Several schemes have been proposed to solve these problems, but there are still some vulnerabilities to well-known attacks on deduplication techniques. In this paper, we examine some of the existing schemes and analyze their security weaknesses.

Parallel Rabin Fingerprinting on GPGPU for Efficient Data Deduplication (효율적인 데이터 중복제거를 위한 GPGPU 병렬 라빈 핑거프린팅)

  • Ma, Jeonghyeon;Park, Sejin;Park, Chanik
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.611-616
    • /
    • 2014
  • Rabin fingerprinting used for chunking requires the largest amount computation time in data deduplication, In this paper, therefore, we proposed parallel Rabin fingerprinting on GPGPU for efficient data deduplication. In addition, for efficient parallelism in Rabin fingerprinting, four issues are considered. Firstly, when dividing input data stream into data sections, we consider the data located near the boundaries between data sections to calculate Rabin fingerprint continuously. Secondly, we consider exploiting the characteristics of Rabin fingerprinting for efficient operation. Thirdly, we consider the chunk boundaries which can be changed compared to sequential Rabin fingerprinting when adapting parallel Rabin fingerprinting. Finally, we consider optimizing GPGPU memory access. Parallel Rabin fingerprinting on GPGPU shows 16 times and 5.3 times better performance compared to sequential Rabin fingerprinting on CPU and compared to parallel Rabin fingerprinting on CPU, respectively. These throughput improvement of Rabin fingerprinting can lead to total performance improvement of data deduplication.

Improving Efficiency of Encrypted Data Deduplication with SGX (SGX를 활용한 암호화된 데이터 중복제거의 효율성 개선)

  • Koo, Dongyoung
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.8
    • /
    • pp.259-268
    • /
    • 2022
  • With prosperous usage of cloud services to improve management efficiency due to the explosive increase in data volume, various cryptographic techniques are being applied in order to preserve data privacy. In spite of the vast computing resources of cloud systems, decrease in storage efficiency caused by redundancy of data outsourced from multiple users acts as a factor that significantly reduces service efficiency. Among several approaches on privacy-preserving data deduplication over encrypted data, in this paper, the research results for improving efficiency of encrypted data deduplication using trusted execution environment (TEE) published in the recent USENIX ATC are analysed in terms of security and efficiency of the participating entities. We present a way to improve the stability of a key-managing server by integrating it with individual clients, resulting in secure deduplication without independent key servers. The experimental results show that the communication efficiency of the proposed approach can be improved by about 30% with the effect of a distributed key server while providing robust security guarantees as the same level of the previous research.

Provably-Secure Public Auditing with Deduplication

  • Kim, Dongmin;Jeong, Ik Rae
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.4
    • /
    • pp.2219-2236
    • /
    • 2017
  • With cloud storage services, users can handle an enormous amount of data in an efficient manner. However, due to the widespread popularization of cloud storage, users have raised concerns about the integrity of outsourced data, since they no longer possess the data locally. To address these concerns, many auditing schemes have been proposed that allow users to check the integrity of their outsourced data without retrieving it in full. Yuan and Yu proposed a public auditing scheme with a deduplication property where the cloud server does not store the duplicated data between users. In this paper, we analyze the weakness of the Yuan and Yu's scheme as well as present modifications which could improve the security of the scheme. We also define two types of adversaries and prove that our proposed scheme is secure against these adversaries under formal security models.

Design and Implementation of Multiple Filter Distributed Deduplication System Applying Cuckoo Filter Similarity (쿠쿠 필터 유사도를 적용한 다중 필터 분산 중복 제거 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology
    • /
    • v.10 no.10
    • /
    • pp.1-8
    • /
    • 2020
  • The need for storage, management, and retrieval techniques for alternative data has emerged as technologies based on data generated from business activities conducted by enterprises have emerged as the key to business success in recent years. Existing big data platform systems must load a large amount of data generated in real time without delay to process unstructured data, which is an alternative data, and efficiently manage storage space by utilizing a deduplication system of different storages when redundant data occurs. In this paper, we propose a multi-layer distributed data deduplication process system using the similarity of the Cuckoo hashing filter technique considering the characteristics of big data. Similarity between virtual machines is applied as Cuckoo hash, individual storage nodes can improve performance with deduplication efficiency, and multi-layer Cuckoo filter is applied to reduce processing time. Experimental results show that the proposed method shortens the processing time by 8.9% and increases the deduplication rate by 10.3%.