• Title/Summary/Keyword: deduplication

Search Result 73, Processing Time 0.024 seconds

File Modification Pattern Detection Mechanism Using File Similarity Information

  • Jung, Ho-Min;Ko, Yong-Woong
    • International journal of advanced smart convergence
    • /
    • v.1 no.1
    • /
    • pp.34-37
    • /
    • 2012
  • In a storage system, the performance of data deduplication can be increased if we consider the file modification pattern. For example, if a file is modified at the end of file region then fixed-length chunking algorithm superior to variable-length chunking. Therefore, it is important to predict in which location of a file is modified between files. In this paper, the essential idea is to exploit an efficient file pattern checking scheme that can be used for data deduplication system. The file modification pattern can be used for elaborating data deduplication system for selecting deduplication algorithm. Experiment result shows that the proposed system can predict file modification region with high probability.

Encrypted Data Deduplication Using Key Issuing Server (키 발급 서버를 이용한 암호데이터 중복제거 기술)

  • Kim, Hyun-il;Park, Cheolhee;Hong, Dowon;Seo, Changho
    • Journal of KIISE
    • /
    • v.43 no.2
    • /
    • pp.143-151
    • /
    • 2016
  • Data deduplication is an important technique for cloud storage savings. These techniques are especially important for encrypted data because data deduplication over plaintext is basically vulnerable for data confidentiality. We examined encrypted data deduplication with the aid of a key issuing server and compared Convergent Encryption with a technique created by M.Bellare et al. In addition, we implemented this technique over not only Dropbox but also an open cloud storage service, Openstack Swift. We measured the performance for this technique over Dropbox and Openstack Swift. According to our results, we verified that the encrypted data deduplication technique with the aid of a key issuing server is a feasible and versatile method.

Secure and Efficient Client-side Deduplication for Cloud Storage (안전하고 효율적인 클라이언트 사이드 중복 제거 기술)

  • Park, Kyungsu;Eom, Ji Eun;Park, Jeongsu;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.1
    • /
    • pp.83-94
    • /
    • 2015
  • Deduplication, which is a technique of eliminating redundant data by storing only a single copy of each data, provides clients and a cloud server with efficiency for managing stored data. Since the data is saved in untrusted public cloud server, however, both invasion of data privacy and data loss can be occurred. Over recent years, although many studies have been proposed secure deduplication schemes, there still remains both the security problems causing serious damages and inefficiency. In this paper, we propose secure and efficient client-side deduplication with Key-server based on Bellare et. al's scheme and challenge-response method. Furthermore, we point out potential risks of client-side deduplication and show that our scheme is secure against various attacks and provides high efficiency for uploading big size of data.

Information Dispersal Algorithm and Proof of Ownership for Data Deduplication in Dispersed Storage Systems (분산 스토리지 시스템에서 데이터 중복제거를 위한 정보분산 알고리즘 및 소유권 증명 기법)

  • Shin, Youngjoo
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.1
    • /
    • pp.155-164
    • /
    • 2015
  • Information dispersal algorithm guarantees high availability and confidentiality for data and is one of the useful solutions for faulty and untrusted dispersed storage systems such as cloud storages. As the amount of data stored in storage systems increases, data deduplication which allows to save IT resources is now being considered as the most promising technology. Hence, it is necessary to study on an information dispersal algorithm that supports data deduplication. In this paper, we propose an information dispersal algorithm and proof of ownership for client-side data deduplication in the dispersed storage systems. The proposed solutions allow to save the network bandwidth as well as the storage space while giving robust security guarantee against untrusted storage servers and malicious clients.

Client-Side Deduplication to Enhance Security and Reduce Communication Costs

  • Kim, Keonwoo;Youn, Taek-Young;Jho, Nam-Su;Chang, Ku-Young
    • ETRI Journal
    • /
    • v.39 no.1
    • /
    • pp.116-123
    • /
    • 2017
  • Message-locked encryption (MLE) is a widespread cryptographic primitive that enables the deduplication of encrypted data stored within the cloud. Practical client-side contributions of MLE, however, are vulnerable to a poison attack, and server-side MLE schemes require large bandwidth consumption. In this paper, we propose a new client-side secure deduplication method that prevents a poison attack, reduces the amount of traffic to be transmitted over a network, and requires fewer cryptographic operations to execute the protocol. The proposed primitive was analyzed in terms of security, communication costs, and computational requirements. We also compared our proposal with existing MLE schemes.

Cloud Storage Security Deduplication Scheme Based on Dynamic Bloom Filter

  • Yan, Xi-ai;Shi, Wei-qi;Tian, Hua
    • Journal of Information Processing Systems
    • /
    • v.15 no.6
    • /
    • pp.1265-1276
    • /
    • 2019
  • Data deduplication is a common method to improve cloud storage efficiency and save network communication bandwidth, but it also brings a series of problems such as privacy disclosure and dictionary attacks. This paper proposes a secure deduplication scheme for cloud storage based on Bloom filter, and dynamically extends the standard Bloom filter. A public dynamic Bloom filter array (PDBFA) is constructed, which improves the efficiency of ownership proof, realizes the fast detection of duplicate data blocks and reduces the false positive rate of the system. In addition, in the process of file encryption and upload, the convergent key is encrypted twice, which can effectively prevent violent dictionary attacks. The experimental results show that the PDBFA scheme has the characteristics of low computational overhead and low false positive rate.

Privacy Preserving Source Based Deduplication In Cloud Storage (클라우드 스토리지 상에서의 프라이버시 보존형 소스기반 중복데이터 제거기술)

  • Park, Cheolhee;Hong, Dowon;Seo, Changho;Chang, Ku-Young
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.1
    • /
    • pp.123-132
    • /
    • 2015
  • In cloud storage, processing the duplicated data, namely deduplication, is necessary technology to save storage space. Users who store sensitive data in remote storage want data be encrypted. However Cloud storage server do not detect duplication of conventionally encrypted data. To solve this problem, Convergent Encryption has been proposed. But it inherently have weakness due to brute-force attack. On the other hand, to save storage space as well as save bandwidths, client-side deduplication have been applied. Recently, various client-side deduplication technology has been proposed. However, this propositions still cannot solve the security problem. In this paper, we suggest a secure source-based deduplication technology, which encrypt data to ensure the confidentiality of sensitive data and apply proofs of ownership protocol to control access to the data, from curious cloud server and malicious user.

Deduplication and Exploitability Determination of UAF Vulnerability Samples by Fast Clustering

  • Peng, Jianshan;Zhang, Mi;Wang, Qingxian
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.10
    • /
    • pp.4933-4956
    • /
    • 2016
  • Use-After-Free (UAF) is a common lethal form of software vulnerability. By using tools such as Web Browser Fuzzing, a large amount of samples containing UAF vulnerabilities can be generated. To evaluate the threat level of vulnerability or to patch the vulnerabilities, automatic deduplication and exploitability determination should be carried out for these samples. There are some problems existing in current methods, including inadequate pertinence, lack of depth and precision of analysis, high time cost, and low accuracy. In this paper, in terms of key dangling pointer and crash context, we analyze four properties of similar samples of UAF vulnerability, explore the method of extracting and calculate clustering eigenvalues from these samples, perform clustering by fast search and find of density peaks on a large number of vulnerability samples. Samples were divided into different UAF vulnerability categories according to the clustering results, and the exploitability of these UAF vulnerabilities was determined by observing the shape of class cluster. Experimental results showed that the approach was applicable to the deduplication and exploitability determination of a large amount of UAF vulnerability samples, with high accuracy and low performance cost.

Parallel Rabin Fingerprinting on GPGPU for Efficient Data Deduplication (효율적인 데이터 중복제거를 위한 GPGPU 병렬 라빈 핑거프린팅)

  • Ma, Jeonghyeon;Park, Sejin;Park, Chanik
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.611-616
    • /
    • 2014
  • Rabin fingerprinting used for chunking requires the largest amount computation time in data deduplication, In this paper, therefore, we proposed parallel Rabin fingerprinting on GPGPU for efficient data deduplication. In addition, for efficient parallelism in Rabin fingerprinting, four issues are considered. Firstly, when dividing input data stream into data sections, we consider the data located near the boundaries between data sections to calculate Rabin fingerprint continuously. Secondly, we consider exploiting the characteristics of Rabin fingerprinting for efficient operation. Thirdly, we consider the chunk boundaries which can be changed compared to sequential Rabin fingerprinting when adapting parallel Rabin fingerprinting. Finally, we consider optimizing GPGPU memory access. Parallel Rabin fingerprinting on GPGPU shows 16 times and 5.3 times better performance compared to sequential Rabin fingerprinting on CPU and compared to parallel Rabin fingerprinting on CPU, respectively. These throughput improvement of Rabin fingerprinting can lead to total performance improvement of data deduplication.

Study of Efficient Algorithm for Deduplication of Complex Structure (복잡한 구조의 데이터 중복제거를 위한 효율적인 알고리즘 연구)

  • Lee, Hyeopgeon;Kim, Young-Woon;Kim, Ki-Young
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.14 no.1
    • /
    • pp.29-36
    • /
    • 2021
  • The amount of data generated has been growing exponentially, and the complexity of data has been increasing owing to the advancement of information technology (IT). Big data analysts and engineers have therefore been actively conducting research to minimize the analysis targets for faster processing and analysis of big data. Hadoop, which is widely used as a big data platform, provides various processing and analysis functions, including minimization of analysis targets through Hive, which is a subproject of Hadoop. However, Hive uses a vast amount of memory for data deduplication because it is implemented without considering the complexity of data. Therefore, an efficient algorithm has been proposed for data deduplication of complex structures. The performance evaluation results demonstrated that the proposed algorithm reduces the memory usage and data deduplication time by approximately 79% and 0.677%, respectively, compared to Hive. In the future, performance evaluation based on a large number of data nodes is required for a realistic verification of the proposed algorithm.