• Title/Summary/Keyword: deduplication

Image Deduplication Based on Hashing and Clustering in Cloud Storage

  • Chen, Lu; Xiang, Feng; Sun, Zhixin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1448-1463 / 2021
  • With the continuous development of cloud storage, a great deal of redundant data accumulates in the cloud, especially multimedia data such as images and videos. Data deduplication is a data reduction technology that significantly reduces storage requirements and increases bandwidth efficiency. To ensure data security, users typically encrypt data before uploading it. However, data encryption and deduplication conflict with each other. Existing deduplication methods for regular files cannot be applied to images, because duplicate images must be detected based on visual content. In this paper, we propose a secure image deduplication scheme based on hashing and clustering, which incorporates a novel perceptual hash algorithm based on Local Binary Pattern. In this scheme, the hash value of the image is used as the fingerprint for deduplication, and the image is transmitted in encrypted form. Images are clustered to reduce the time complexity of deduplication. The proposed scheme can ensure the security of images and improve deduplication accuracy. A comparison with other image deduplication schemes demonstrates that our scheme performs somewhat better.
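
To make the idea of a Local Binary Pattern fingerprint concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a grayscale image given as a 2D list of integers, builds a 256-bin histogram of LBP codes, and turns it into a 256-bit fingerprint compared by Hamming distance. The function names and the `threshold` value are hypothetical choices made for illustration.

```python
def lbp_codes(img):
    """Return the 8-bit LBP code of every interior pixel of a 2D grayscale grid."""
    # Clockwise neighbour offsets, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for dr, dc in offsets:
                # Set the bit when the neighbour is at least as bright as the centre.
                code = (code << 1) | (img[r + dr][c + dc] >= img[r][c])
            codes.append(code)
    return codes


def lbp_fingerprint(img):
    """Build a 256-bit fingerprint: one bit per histogram bin above the mean count."""
    hist = [0] * 256
    for code in lbp_codes(img):
        hist[code] += 1
    mean = sum(hist) / 256
    bits = 0
    for count in hist:
        bits = (bits << 1) | (count > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_duplicate(img_a, img_b, threshold=8):
    # threshold is an assumed tolerance, not a value taken from the paper.
    return hamming(lbp_fingerprint(img_a), lbp_fingerprint(img_b)) <= threshold
```

Because each LBP bit only compares a neighbour with the centre pixel, adding a constant brightness to every pixel leaves the fingerprint unchanged, which is the kind of visual-content matching that a byte-level file hash cannot provide.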

File Deduplication using Logical Partition of Storage System (저장 시스템의 논리 파티션을 이용한 파일 중복 제거)

  • Kong, Jin-San; Yoo, Chuck; Ko, Young-Woong
    • IEMEK Journal of Embedded Systems and Applications / v.7 no.6 / pp.345-351 / 2012
  • In a traditional target-based data deduplication system, all files must be chunked and compared in order to remove duplicated data blocks. A critical problem with such a system arises as the number of files increases: the system suffers from computational delay in calculating hash values and processing the metadata for each file. To overcome this problem, we propose a novel data deduplication system that uses logical partitions of the storage system. The system applies the deduplication scheme to each logical partition rather than to each file. Experimental results show that the proposed system is more efficient than a traditional deduplication scheme where the logical partition is full of files by 50% in terms of deduplication capacity and processing time.

Offline Deduplication for Solid State Disk Using a Lightweight Hash Algorithm

  • Park, Eunsoo; Shin, Dongkun
    • JSTS: Journal of Semiconductor Technology and Science / v.15 no.5 / pp.539-545 / 2015
  • Deduplication can extend the lifespan and capacity of flash memory-based storage devices by eliminating duplicated write operations. Deduplication techniques can be classified into online and offline approaches. We propose an offline deduplication technique that uses a lightweight hash algorithm, whereas the previous offline technique uses a high-cost hash algorithm. As a result, the memory space for caching hash values can be reduced, and more pages can be examined for deduplication during short idle intervals. The proposed technique therefore provides shorter write latencies than the online approach and lower garbage collection costs than the previous offline deduplication technique.
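
The trade-off behind a lightweight hash can be sketched as follows. This is an illustration under stated assumptions, not the scheme from the paper: CRC32 (via `zlib.crc32`) stands in for the lightweight hash, pages are modelled as `bytes` objects, and a full byte comparison is added here to rule out checksum collisions. The function name `offline_dedup` is hypothetical.

```python
import zlib


def offline_dedup(pages):
    """Map each logical page number to the first page holding identical content."""
    by_hash = {}   # CRC32 value -> page numbers kept as canonical copies
    mapping = {}   # logical page number -> canonical page number
    for lpn, data in enumerate(pages):
        candidates = by_hash.setdefault(zlib.crc32(data), [])
        for canonical in candidates:
            # CRC32 is short and collision-prone, so confirm with a full compare.
            if pages[canonical] == data:
                mapping[lpn] = canonical
                break
        else:
            # No identical page seen so far: this page becomes a canonical copy.
            candidates.append(lpn)
            mapping[lpn] = lpn
    return mapping
```

A 4-byte checksum per page needs far less cache memory than a 20- or 32-byte cryptographic digest, at the cost of the extra comparison when checksums match.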

Survey on Data Deduplication in Cloud Storage Environments

  • Kim, Won-Bin; Lee, Im-Yeong
    • Journal of Information Processing Systems / v.17 no.3 / pp.658-673 / 2021
  • Data deduplication technology improves data storage efficiency while storing and managing large amounts of data. It reduces storage requirements by determining whether replicated data is being added to storage and omitting these uploads. Data deduplication technologies require data confidentiality and integrity when applied to cloud storage environments, and they require a variety of security measures, such as encryption. However, because the source data cannot be transformed, common encryption techniques generally cannot be applied at the same time as data deduplication. Various studies have been conducted to solve this problem. This white paper describes the basic environment for data deduplication technology. It also analyzes and compares multiple proposed technologies to address security threats.

A Secure and Practical Encrypted Data De-duplication with Proof of Ownership in Cloud Storage (클라우드 스토리지 상에서 안전하고 실용적인 암호데이터 중복제거와 소유권 증명 기술)

  • Park, Cheolhee; Hong, Dowon; Seo, Changho
    • Journal of KIISE / v.43 no.10 / pp.1165-1172 / 2016
  • In a cloud storage environment, deduplication enables efficient use of storage. In addition, to save network bandwidth, cloud storage service providers have introduced client-side deduplication. Cloud storage service users want to upload encrypted data to ensure confidentiality. However, common encryption methods cannot be combined with deduplication, because each user uses a different private key. Client-side deduplication can also be vulnerable to security threats, because a file tag replaces the entire file. Recently, proof of ownership schemes have been suggested to remedy the vulnerabilities of client-side deduplication. Nevertheless, client-side deduplication over encrypted data still causes problems in efficiency and security. In this paper, we propose a secure and practical client-side encrypted data deduplication scheme that is resilient to brute force attacks and performs proof of ownership over encrypted data.

Side-Channel Attack against Secure Data Deduplication over Encrypted Data in Cloud Storage (암호화된 클라우드 데이터의 중복제거 기법에 대한 부채널 공격)

  • Shin, Hyungjune; Koo, Dongyoung; Hur, Junbeom
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.4 / pp.971-980 / 2017
  • Data deduplication can be used to reduce storage space in cloud storage services by storing only a single copy of data rather than all duplicated copies. Users who are concerned about the confidentiality of their outsourced data can use secure encryption algorithms, but this makes data deduplication ineffective. To reconcile data deduplication with encryption, Liu et al. proposed a new server-side cross-user deduplication scheme in 2015 that exploits a password authenticated key exchange (PAKE) protocol. In this paper, we demonstrate that this scheme has a side channel that makes it insecure against the confirmation-of-file (CoF), or duplicate identification, attack.

Dynamic Prime Chunking Algorithm for Data Deduplication in Cloud Storage

  • Ellappan, Manogar; Abirami, S
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1342-1359 / 2021
  • Data deduplication identifies duplicates and minimizes redundant data stored on the backup server. Chunk-level deduplication plays a significant role in detecting appropriate chunk boundaries, which addresses challenges such as low throughput and large chunk size variance in the data stream. As a solution, we propose a new chunking algorithm called Dynamic Prime Chunking (DPC). The main goal of DPC is to dynamically change the window size within the prime value based on the minimum and maximum chunk size. According to the results, DPC provides high throughput and avoids significant chunk variance in the deduplication system. The implementation and experimental evaluation were performed on multimedia and operating system datasets. DPC was compared with existing algorithms such as Rabin, TTTD, MAXP, and AE. Chunk count, chunking time, throughput, processing time, Bytes Saved per Second (BSPS), and Deduplication Elimination Ratio (DER) are the performance metrics analyzed in our work. The analysis shows that throughput and BSPS have improved. First, DPC improves throughput by more than 21% over AE. Second, BSPS increases by up to 11% over the existing AE algorithm. For these reasons, our algorithm minimizes the total processing time and achieves higher deduplication efficiency than the existing Content Defined Chunking (CDC) algorithms.
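
For readers unfamiliar with Content Defined Chunking, the sketch below shows the general idea of cutting chunks at content-dependent boundaries while enforcing minimum and maximum chunk sizes. It is not DPC, Rabin, or AE: it assumes a simple gear-style rolling hash with a randomly generated lookup table, and the names `GEAR` and `chunk` as well as the default sizes and `mask` are hypothetical.

```python
import random

# Assumed lookup table: one pseudo-random 32-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data, min_size=64, max_size=1024, mask=0xFF):
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear-style update: shift the old hash and mix in the current byte.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        # Cut when the hash matches the pattern (after min_size) or at max_size.
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks
```

The minimum and maximum sizes bound the chunk size variance, and the `mask` width controls the expected chunk length; these are the two quantities that chunking algorithms such as those compared in the abstract trade off against throughput.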

Distributed data deduplication technique using similarity based clustering and multi-layer bloom filter (SDS 환경의 유사도 기반 클러스터링 및 다중 계층 블룸필터를 활용한 분산 중복제거 기법)

  • Yoon, Dabin; Kim, Deok-Hwan
    • The Journal of Korean Institute of Next Generation Computing / v.14 no.5 / pp.60-70 / 2018
  • Software-defined storage (SDS) is being deployed in cloud environments to allow multiple users to virtualize physical servers, but a solution for optimizing space efficiency with limited physical resources is needed. In a conventional data deduplication system, it is difficult to deduplicate redundant data uploaded to distributed storage. In this paper, we propose a distributed deduplication method using similarity-based clustering and a multi-layer Bloom filter. Rabin hash is applied to determine the degree of similarity between virtual machine servers and to cluster similar virtual machines. This improves performance compared with the deduplication efficiency of individual storage nodes. In addition, a multi-layer Bloom filter is incorporated into the deduplication process to shorten processing time by reducing the number of false positives. Experimental results show that the proposed method improves the deduplication ratio by 9% compared with a deduplication method using IP-address-based clusters, without any difference in processing time.
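
As background for the Bloom filter mentioned above, here is a minimal single-layer sketch; the multi-layer arrangement described in the abstract is not reproduced. The class name, the default sizes, and the use of salted SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Single-layer Bloom filter over byte strings (e.g. chunk fingerprints)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

In a deduplication pipeline a negative answer lets the system skip the expensive fingerprint index lookup, while a positive answer still has to be confirmed against the index because of false positives.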

Deduplication Technologies over Encrypted Data (암호데이터 중복처리 기술)

  • Kim, Keonwoo; Chang, Ku-Young; Kim, Ik-Kyun
    • Electronics and Telecommunications Trends / v.33 no.1 / pp.68-77 / 2018
  • Data deduplication is a commonly used technology in backup systems and cloud storage for reducing storage costs and network traffic. To protect data privacy from servers or malicious attackers, there has been a growing demand in recent years for individuals and companies to encrypt data and store the encrypted data on a server. In this study, we introduce two cryptographic primitives, Convergent Encryption and Message-Locked Encryption, which enable deduplication of encrypted data between clients and a storage server. We analyze the security of these schemes in terms of dictionary and poison attacks. In addition, we introduce deduplication systems that can be implemented in real cloud storage, which is a practical application environment, and describe proof of ownership in client-side deduplication.
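
The core idea of Convergent Encryption, deriving the key from the content so that identical plaintexts yield identical ciphertexts, can be sketched as follows. This is a toy illustration under stated assumptions and not a secure implementation: the keystream built from SHA-256 in counter mode stands in for a real cipher, and all function names are hypothetical.

```python
import hashlib


def _keystream(key, length):
    # Toy keystream: SHA-256 in counter mode. Illustration only, not a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext):
    key = hashlib.sha256(plaintext).digest()       # key derived from the content itself
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hashlib.sha256(ciphertext).hexdigest()   # tag the server can deduplicate on
    return key, ciphertext, tag


def convergent_decrypt(key, ciphertext):
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Two users who encrypt the same file obtain the same ciphertext and tag, so the server can keep one copy without seeing the plaintext. The same determinism is what enables the dictionary attack noted in the abstract: anyone who can guess a candidate plaintext can compute its tag and check whether it is stored.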

Analysis of Security Weakness on Secure Deduplication Schemes in Cloud Storage (클라우드 스토리지에서 안전한 중복 제거 기법들에 대한 보안 취약점 분석)

  • Park, Ji Sun; Shin, Sang Uk
    • Journal of Korea Multimedia Society / v.21 no.8 / pp.909-916 / 2018
  • Cloud storage services have many advantages. As a result, the amount of data stored by cloud service providers is increasing rapidly. This increase in demand forces cloud storage providers to apply deduplication technology to use storage efficiently. However, deduplication technology has inherent security and privacy concerns. Several schemes have been proposed to solve these problems, but they still have some vulnerabilities to well-known attacks on deduplication techniques. In this paper, we examine some of the existing schemes and analyze their security weaknesses.