• Title/Summary/Keyword: Data deduplication

Search Result 47, Processing Time 0.02 seconds

Image Deduplication Based on Hashing and Clustering in Cloud Storage

  • Chen, Lu;Xiang, Feng;Sun, Zhixin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.4
    • /
    • pp.1448-1463
    • /
    • 2021
  • With the continuous development of cloud storage, plenty of redundant data exists in cloud storage, especially multimedia data such as images and videos. Data deduplication is a data reduction technology that significantly reduces storage requirements and increases bandwidth efficiency. To ensure data security, users typically encrypt data before uploading it. However, there is a contradiction between data encryption and deduplication. Existing deduplication methods for regular files cannot be applied to image deduplication because images need to be detected based on visual content. In this paper, we propose a secure image deduplication scheme based on hashing and clustering, which combines a novel perceptual hash algorithm based on Local Binary Pattern. In this scheme, the hash value of the image is used as the fingerprint to perform deduplication, and the image is transmitted in an encrypted form. Images are clustered to reduce the time complexity of deduplication. The proposed scheme can ensure the security of images and improve deduplication accuracy. The comparison with other image deduplication schemes demonstrates that our scheme has somewhat better performance.

Survey on Data Deduplication in Cloud Storage Environments

  • Kim, Won-Bin;Lee, Im-Yeong
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.658-673
    • /
    • 2021
  • Data deduplication technology improves data storage efficiency while storing and managing large amounts of data. It reduces storage requirements by determining whether replicated data is being added to storage and omitting these uploads. Data deduplication technologies require data confidentiality and integrity when applied to cloud storage environments, and they require a variety of security measures, such as encryption. However, because the source data cannot be transformed, common encryption techniques generally cannot be applied at the same time as data deduplication. Various studies have been conducted to solve this problem. This white paper describes the basic environment for data deduplication technology. It also analyzes and compares multiple proposed technologies to address security threats.

File Deduplication using Logical Partition of Storage System (저장 시스템의 논리 파티션을 이용한 파일 중복 제거)

  • Kong, Jin-San;Yoo, Chuck;Ko, Young-Woong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.7 no.6
    • /
    • pp.345-351
    • /
    • 2012
  • In traditional target-based data deduplication system, all of the files should be chunked and compared for reducing duplicated data blocks. One of the critical problem of this system arises as the number of files are increasing. The system suffers from computational delay for calculating hash value and processing metadata for handling each file. To overcome this problem, in this paper, we propose a novel data deduplication system using logical partition of storage system. The system applies data deduplication scheme to each logical partition not each file. Experiment result shows that the proposed system is more efficient compared with traditional deduplication scheme where the logical partition is full of files by 50% in terms of deduplication capacity and processing time.

Side-Channel Attack against Secure Data Deduplication over Encrypted Data in Cloud Storage (암호화된 클라우드 데이터의 중복제거 기법에 대한 부채널 공격)

  • Shin, Hyungjune;Koo, Dongyoung;Hur, Junbeom
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.27 no.4
    • /
    • pp.971-980
    • /
    • 2017
  • Data deduplication can be utilized to reduce storage space in cloud storage services by storing only a single copy of data rather than all duplicated copies. Users who are concerned the confidentiality of their outsourced data can use secure encryption algorithms, but it makes data deduplication ineffective. In order to reconcile data deduplication with encryption, Liu et al. proposed a new server-side cross-user deduplication scheme by exploiting password authenticated key exchange (PAKE) protocol in 2015. In this paper, we demonstrate that this scheme has side channel which causes insecurity against the confirmation-of-file (CoF), or duplicate identification attack.

Dynamic Prime Chunking Algorithm for Data Deduplication in Cloud Storage

  • Ellappan, Manogar;Abirami, S
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.4
    • /
    • pp.1342-1359
    • /
    • 2021
  • The data deduplication technique identifies the duplicates and minimizes the redundant storage data in the backup server. The chunk level deduplication plays a significant role in detecting the appropriate chunk boundaries, which solves the challenges such as minimum throughput and maximum chunk size variance in the data stream. To provide the solution, we propose a new chunking algorithm called Dynamic Prime Chunking (DPC). The main goal of DPC is to dynamically change the window size within the prime value based on the minimum and maximum chunk size. According to the result, DPC provides high throughput and avoid significant chunk variance in the deduplication system. The implementation and experimental evaluation have been performed on the multimedia and operating system datasets. DPC has been compared with existing algorithms such as Rabin, TTTD, MAXP, and AE. Chunk Count, Chunking time, throughput, processing time, Bytes Saved per Second (BSPS) and Deduplication Elimination Ratio (DER) are the performance metrics analyzed in our work. Based on the analysis of the results, it is found that throughput and BSPS have improved. Firstly, DPC quantitatively improves throughput performance by more than 21% than AE. Secondly, BSPS increases a maximum of 11% than the existing AE algorithm. Due to the above reason, our algorithm minimizes the total processing time and achieves higher deduplication efficiency compared with the existing Content Defined Chunking (CDC) algorithms.

A Secure and Practical Encrypted Data De-duplication with Proof of Ownership in Cloud Storage (클라우드 스토리지 상에서 안전하고 실용적인 암호데이터 중복제거와 소유권 증명 기술)

  • Park, Cheolhee;Hong, Dowon;Seo, Changho
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1165-1172
    • /
    • 2016
  • In cloud storage environment, deduplication enables efficient use of the storage. Also, in order to save network bandwidth, cloud storage service provider has introduced client-side deduplication. Cloud storage service users want to upload encrypted data to ensure confidentiality. However, common encryption method cannot be combined with deduplication, because each user uses a different private key. Also, client-side deduplication can be vulnerable to security threats because file tag replaces the entire file. Recently, proof of ownership schemes have suggested to remedy the vulnerabilities of client-side deduplication. Nevertheless, client-side deduplication over encrypted data still causes problems in efficiency and security. In this paper, we propose a secure and practical client-side encrypted data deduplication scheme that has resilience to brute force attack and performs proof of ownership over encrypted data.

Deduplication Technologies over Encrypted Data (암호데이터 중복처리 기술)

  • Kim, Keonwoo;Chang, Ku-Young;Kim, Ik-Kyun
    • Electronics and Telecommunications Trends
    • /
    • v.33 no.1
    • /
    • pp.68-77
    • /
    • 2018
  • Data deduplication is a common used technology in backup systems and cloud storage to reduce storage costs and network traffic. To preserve data privacy from servers or malicious attackers, there has been a growing demand in recent years for individuals and companies to encrypt data and store encrypted data on a server. In this study, we introduce two cryptographic primitives, Convergent Encryption and Message-Locked Encryption, which enable deduplication of encrypted data between clients and a storage server. We analyze the security of these schemes in terms of dictionary and poison attacks. In addition, we introduce deduplication systems that can be implemented in real cloud storage, which is a practical application environment, and describes the proof of ownership on client-side deduplication.

Encrypted Data Deduplication Using Key Issuing Server (키 발급 서버를 이용한 암호데이터 중복제거 기술)

  • Kim, Hyun-il;Park, Cheolhee;Hong, Dowon;Seo, Changho
    • Journal of KIISE
    • /
    • v.43 no.2
    • /
    • pp.143-151
    • /
    • 2016
  • Data deduplication is an important technique for cloud storage savings. These techniques are especially important for encrypted data because data deduplication over plaintext is basically vulnerable for data confidentiality. We examined encrypted data deduplication with the aid of a key issuing server and compared Convergent Encryption with a technique created by M.Bellare et al. In addition, we implemented this technique over not only Dropbox but also an open cloud storage service, Openstack Swift. We measured the performance for this technique over Dropbox and Openstack Swift. According to our results, we verified that the encrypted data deduplication technique with the aid of a key issuing server is a feasible and versatile method.

Secure and Efficient Client-side Deduplication for Cloud Storage (안전하고 효율적인 클라이언트 사이드 중복 제거 기술)

  • Park, Kyungsu;Eom, Ji Eun;Park, Jeongsu;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.1
    • /
    • pp.83-94
    • /
    • 2015
  • Deduplication, which is a technique of eliminating redundant data by storing only a single copy of each data, provides clients and a cloud server with efficiency for managing stored data. Since the data is saved in untrusted public cloud server, however, both invasion of data privacy and data loss can be occurred. Over recent years, although many studies have been proposed secure deduplication schemes, there still remains both the security problems causing serious damages and inefficiency. In this paper, we propose secure and efficient client-side deduplication with Key-server based on Bellare et. al's scheme and challenge-response method. Furthermore, we point out potential risks of client-side deduplication and show that our scheme is secure against various attacks and provides high efficiency for uploading big size of data.

File Modification Pattern Detection Mechanism Using File Similarity Information

  • Jung, Ho-Min;Ko, Yong-Woong
    • International journal of advanced smart convergence
    • /
    • v.1 no.1
    • /
    • pp.34-37
    • /
    • 2012
  • In a storage system, the performance of data deduplication can be increased if we consider the file modification pattern. For example, if a file is modified at the end of file region then fixed-length chunking algorithm superior to variable-length chunking. Therefore, it is important to predict in which location of a file is modified between files. In this paper, the essential idea is to exploit an efficient file pattern checking scheme that can be used for data deduplication system. The file modification pattern can be used for elaborating data deduplication system for selecting deduplication algorithm. Experiment result shows that the proposed system can predict file modification region with high probability.