
Image Deduplication Based on Hashing and Clustering in Cloud Storage

  • Chen, Lu; Xiang, Feng; Sun, Zhixin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 4 / pp.1448-1463 / 2021
  • With the continuous development of cloud storage, a large amount of redundant data exists in the cloud, especially multimedia data such as images and videos. Data deduplication is a data reduction technology that significantly reduces storage requirements and increases bandwidth efficiency. To ensure data security, users typically encrypt data before uploading it; however, this creates a conflict between encryption and deduplication. Existing deduplication methods for regular files cannot be applied to images, because images must be matched by their visual content. In this paper, we propose a secure image deduplication scheme based on hashing and clustering, built around a novel perceptual hash algorithm based on Local Binary Patterns. In this scheme, the hash value of an image serves as its fingerprint for deduplication, and the image is transmitted in encrypted form. Images are clustered to reduce the time complexity of deduplication. The proposed scheme ensures image security and improves deduplication accuracy; comparison with other image deduplication schemes shows that our scheme performs somewhat better.
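The abstract describes fingerprinting images with a perceptual hash so that visually identical copies collide. The paper's own algorithm is based on Local Binary Patterns; as an illustration only, the sketch below uses a much simpler average hash over a grayscale grid to show how hash fingerprints plus a Hamming-distance threshold drive image deduplication:

```python
# Sketch of fingerprint-based image deduplication with a simple average hash.
# This is NOT the paper's LBP-based perceptual hash; it only illustrates the
# idea of comparing images by a compact visual fingerprint.

def average_hash(gray, size=8):
    """Compute a perceptual-style hash from a grayscale image (2D list)."""
    h, w = len(gray), len(gray[0])
    # Downsample by block averaging to a size x size grid.
    cells = []
    for i in range(size):
        for j in range(size):
            block = [gray[y][x]
                     for y in range(i * h // size, (i + 1) * h // size)
                     for x in range(j * w // size, (j + 1) * w // size)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # Each bit records whether a cell is brighter than the overall mean.
    return sum(1 << k for k, v in enumerate(cells) if v >= mean)

def hamming(a, b):
    return bin(a ^ b).count("1")

def is_duplicate(h1, h2, threshold=5):
    # Near-identical images differ in only a few bits of the fingerprint.
    return hamming(h1, h2) <= threshold
```

Unlike a cryptographic hash, this fingerprint is stable under small perturbations, which is exactly what content-based image deduplication needs.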

저장 시스템의 논리 파티션을 이용한 파일 중복 제거 (File Deduplication using Logical Partition of Storage System)

  • 공진산;유혁;고영웅
    • 대한임베디드공학회논문지 / Vol. 7, No. 6 / pp.345-351 / 2012
  • In a traditional target-based data deduplication system, all files must be chunked and compared to reduce duplicated data blocks. A critical problem arises as the number of files increases: the system suffers from computational delay in calculating hash values and processing metadata for each file. To overcome this problem, we propose a novel data deduplication system that uses logical partitions of the storage system. The system applies the deduplication scheme to each logical partition rather than to each file. Experimental results show that, when a logical partition is 50% full of files, the proposed system outperforms the traditional deduplication scheme in both deduplication capacity and processing time.
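A minimal sketch of the partition-level idea, under the assumption that the partition image is simply divided into fixed-size blocks whose hashes are indexed (the paper's actual metadata layout is not reproduced here):

```python
import hashlib

BLOCK_SIZE = 4096  # deduplication granularity within the logical partition

def dedup_partition(partition: bytes, index=None):
    """Deduplicate a logical-partition image as one stream of fixed blocks.

    Returns (index, refs): `index` maps a block hash to its stored block id,
    and `refs` lists, per position, which stored block backs it. Per-file
    chunking and per-file metadata handling are avoided entirely.
    """
    index = {} if index is None else index
    refs = []
    for off in range(0, len(partition), BLOCK_SIZE):
        block = partition[off:off + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in index:
            index[digest] = len(index)  # store the block only once
        refs.append(index[digest])
    return index, refs
```

Because the whole partition is one stream, the cost scales with partition size rather than with the number of files, which is the bottleneck the abstract identifies.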

Offline Deduplication for Solid State Disk Using a Lightweight Hash Algorithm

  • Park, Eunsoo; Shin, Dongkun
    • JSTS: Journal of Semiconductor Technology and Science / Vol. 15, No. 5 / pp.539-545 / 2015
  • Deduplication techniques can extend the lifespan and capacity of flash memory-based storage devices by eliminating duplicated write operations. Deduplication techniques can be classified into two approaches: online and offline. We propose an offline deduplication technique that uses a lightweight hash algorithm, whereas the previous offline technique uses a high-cost hash algorithm. The memory space for caching hash values can therefore be reduced, and more pages can be examined for deduplication during short idle intervals. As a result, the technique provides shorter write latencies than the online approach and lower garbage collection costs than the previous offline deduplication technique.
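The two-tier idea, a cheap fingerprint plus a verification step, can be sketched as follows. CRC32 stands in here as a hypothetical lightweight hash (the paper does not name one in this abstract), and a byte-wise comparison confirms candidates, since a lightweight hash alone can collide:

```python
import zlib

def offline_dedup(pages):
    """Offline dedup pass over flash pages using a cheap CRC32 pre-filter.

    A lightweight hash may collide, so a candidate match is confirmed by a
    byte-wise comparison before a page is remapped to an existing copy.
    Returns a mapping: duplicate page id -> canonical page id.
    """
    by_crc = {}   # crc32 -> page ids carrying that fingerprint
    mapping = {}
    for pid, data in enumerate(pages):
        crc = zlib.crc32(data)
        for cand in by_crc.get(crc, []):
            if pages[cand] == data:  # confirm; CRC alone is not enough
                mapping[pid] = cand
                break
        else:
            by_crc.setdefault(crc, []).append(pid)
    return mapping
```

The memory saving claimed in the abstract comes from caching 4-byte CRC values instead of long cryptographic digests, letting more pages be examined per idle interval.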

Survey on Data Deduplication in Cloud Storage Environments

  • Kim, Won-Bin; Lee, Im-Yeong
    • Journal of Information Processing Systems / Vol. 17, No. 3 / pp.658-673 / 2021
  • Data deduplication technology improves storage efficiency when storing and managing large amounts of data. It reduces storage requirements by determining whether data being added to storage is already stored and omitting such uploads. When applied to cloud storage environments, data deduplication must preserve data confidentiality and integrity, which calls for a variety of security measures such as encryption. However, common encryption techniques transform the source data differently for each user, so they generally cannot be applied at the same time as data deduplication. Various studies have been conducted to solve this problem. This paper describes the basic environment for data deduplication technology, then analyzes and compares several proposed technologies that address these security threats.

클라우드 스토리지 상에서 안전하고 실용적인 암호데이터 중복제거와 소유권 증명 기술 (A Secure and Practical Encrypted Data De-duplication with Proof of Ownership in Cloud Storage)

  • 박철희;홍도원;서창호
    • 정보과학회 논문지 / Vol. 43, No. 10 / pp.1165-1172 / 2016
  • In cloud storage environments, deduplication technology enables efficient use of storage. In addition, cloud storage service providers adopt client-side deduplication to save network bandwidth. Users of cloud storage services want to upload their data in encrypted form to guarantee the confidentiality of sensitive data. However, conventional encryption cannot be reconciled with deduplication, because each user encrypts with a different secret key. Moreover, client-side deduplication can be insecure, because a short tag value stands in for the entire data. Proof-of-ownership schemes have recently been proposed to remedy the weaknesses of client-side deduplication, but client-side deduplication over encrypted data still suffers from efficiency and security problems. In this paper, we propose a secure and practical client-side deduplication scheme that resists brute-force attacks and performs proof of ownership over encrypted data.
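Proof of ownership can be illustrated with a generic random spot-check protocol; this is not the scheme proposed in the paper, only the basic idea that the server challenges the uploader to hash randomly chosen regions of the file, which a party holding only the short tag cannot answer:

```python
import hashlib
import random

BLOCK = 1024  # size of each spot-checked region (illustrative)

def pow_challenge(file_len, n_blocks=3, seed=None):
    """Server side: pick random offsets into the file to be proven."""
    rng = random.Random(seed)
    return [rng.randrange(0, max(1, file_len - BLOCK)) for _ in range(n_blocks)]

def pow_response(data, offsets):
    """Client side: hash the challenged regions, possible only if the
    client actually holds the file contents."""
    return [hashlib.sha256(data[o:o + BLOCK]).hexdigest() for o in offsets]

def verify(server_copy, offsets, response):
    """Server side: recompute the hashes from its stored copy and compare."""
    return pow_response(server_copy, offsets) == response
```

Because the offsets are fresh per challenge, an attacker who learned only a file's deduplication tag cannot replay old answers, which is the attack surface the abstract refers to.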

암호화된 클라우드 데이터의 중복제거 기법에 대한 부채널 공격 (Side-Channel Attack against Secure Data Deduplication over Encrypted Data in Cloud Storage)

  • 신형준;구동영;허준범
    • 정보보호학회논문지 / Vol. 27, No. 4 / pp.971-980 / 2017
  • Deduplication, which stores only a single copy of duplicated data, can provide efficient use of storage space for the massive amounts of data generated in cloud environments. Users who are sensitive about the confidentiality of outsourced data can use secure encryption algorithms, but doing so degrades the efficiency of deduplication. In 2015, a server-side cross-user deduplication scheme using a PAKE (Password Authenticated Key Exchange) protocol was proposed to improve storage efficiency while guaranteeing user data privacy. In this paper, we demonstrate through a side channel that the proposed scheme is not secure against confirmation-of-file (CoF), i.e., duplicate identification, attacks.

Dynamic Prime Chunking Algorithm for Data Deduplication in Cloud Storage

  • Ellappan, Manogar; Abirami, S
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 4 / pp.1342-1359 / 2021
  • The data deduplication technique identifies duplicates and minimizes redundant storage data in the backup server. Chunk-level deduplication plays a significant role in detecting appropriate chunk boundaries, which addresses challenges such as low throughput and high chunk-size variance in the data stream. As a solution, we propose a new chunking algorithm called Dynamic Prime Chunking (DPC). The main goal of DPC is to dynamically change the window size within a prime value based on the minimum and maximum chunk sizes. According to the results, DPC provides high throughput and avoids significant chunk variance in the deduplication system. The implementation and experimental evaluation were performed on multimedia and operating-system datasets, and DPC was compared with existing algorithms such as Rabin, TTTD, MAXP, and AE. Chunk count, chunking time, throughput, processing time, bytes saved per second (BSPS), and deduplication elimination ratio (DER) are the performance metrics analyzed in our work. Based on the analysis, throughput and BSPS improve: DPC increases throughput by more than 21% over AE, and BSPS by up to 11% over AE. Consequently, our algorithm minimizes total processing time and achieves higher deduplication efficiency than existing Content Defined Chunking (CDC) algorithms.
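The CDC family that DPC is compared against can be sketched as follows. This is a generic content-defined chunker with minimum/maximum chunk limits, not the DPC algorithm itself, and the constants are illustrative:

```python
# Generic content-defined chunking (CDC) with a byte-wise rolling-style hash.
# Boundaries depend on content, so an insertion early in the stream shifts
# boundaries only locally and later chunks still deduplicate.

MIN_CHUNK, MAX_CHUNK = 2048, 8192  # illustrative size limits
MASK = 0x1FFF                      # boundary condition: hash & MASK == 0

def cdc_chunks(data: bytes):
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # cheap content-derived hash
        length = i - start + 1
        # Declare a boundary when the hash matches the mask, subject to
        # the minimum and maximum chunk-size limits.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

DPC's contribution, per the abstract, is choosing the window size dynamically from a prime value between the minimum and maximum chunk sizes, tightening the chunk-size variance this generic scheme leaves uncontrolled.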

SDS 환경의 유사도 기반 클러스터링 및 다중 계층 블룸필터를 활용한 분산 중복제거 기법 (Distributed data deduplication technique using similarity based clustering and multi-layer bloom filter)

  • 윤다빈;김덕환
    • 한국차세대컴퓨팅학회논문지 / Vol. 14, No. 5 / pp.60-70 / 2018
  • In cloud environments, Software Defined Storage (SDS) is adopted so that many users can conveniently virtualize physical servers, but a solution is needed that optimizes space efficiency under limited physical resources. Existing data deduplication systems have the drawback that duplicated data uploaded to different storage nodes is difficult to deduplicate. In this paper, we propose a distributed deduplication technique that applies similarity-based clustering and a multi-layer Bloom filter. It judges the similarity between virtual machine servers using Rabin hashing and clusters highly similar virtual machines, improving performance compared with per-storage-node deduplication. In addition, applying a multi-layer Bloom filter to the deduplication process shortens processing time and reduces false positives. Experimental results show that, compared with a deduplication technique using IP-address-based clustering, the proposed method achieves a 9% higher deduplication rate with no difference in processing time.
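A single-layer Bloom filter for chunk-fingerprint lookups can be sketched as below; the paper's multi-layer structure and Rabin-hash clustering are not reproduced, and the sizes are illustrative:

```python
import hashlib

class BloomFilter:
    """Single-layer Bloom filter over chunk fingerprints (illustrative; the
    paper stacks several such filters into a multi-layer structure)."""

    def __init__(self, m_bits=8192, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting one hash with the index.
        for i in range(self.k):
            d = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

A negative answer is always correct, so most non-duplicate chunks skip the expensive index lookup entirely; layering filters, as the paper does, is one way to push the false-positive rate down further.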

암호데이터 중복처리 기술 (Deduplication Technologies over Encrypted Data)

  • 김건우;장구영;김익균
    • 전자통신동향분석 / Vol. 33, No. 1 / pp.68-77 / 2018
  • Data deduplication is a commonly used technology in backup systems and cloud storage to reduce storage costs and network traffic. To preserve data privacy against servers or malicious attackers, there has been growing demand in recent years from individuals and companies to encrypt data before storing it on a server. In this study, we introduce two cryptographic primitives, Convergent Encryption and Message-Locked Encryption, which enable deduplication of encrypted data between clients and a storage server. We analyze the security of these schemes with respect to dictionary and poison attacks. In addition, we introduce deduplication systems that can be deployed in real cloud storage, a practical application environment, and describe proof of ownership for client-side deduplication.

File Modification Pattern Detection Mechanism Using File Similarity Information

  • Jung, Ho-Min; Ko, Yong-Woong
    • International journal of advanced smart convergence / Vol. 1, No. 1 / pp.34-37 / 2012
  • In a storage system, the performance of data deduplication can be increased if the file modification pattern is considered. For example, if a file is modified only at the end of the file region, a fixed-length chunking algorithm is superior to variable-length chunking. It is therefore important to predict where in a file modifications occur. The essential idea of this paper is an efficient file-modification-pattern checking scheme for data deduplication systems, in which the detected pattern guides the selection of the chunking algorithm. Experimental results show that the proposed system can predict the file modification region with high probability.
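The fixed- versus variable-length trade-off the abstract mentions can be made concrete: an append preserves every fixed-length chunk of the old version, while an insertion at the front shifts all boundaries and destroys them. A small sketch with an illustrative chunk size (this is not the paper's detector):

```python
import hashlib

CHUNK = 4096  # illustrative fixed chunk size

def fixed_chunks(data: bytes):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def shared_fixed_chunks(old: bytes, new: bytes) -> int:
    """Count fixed-length chunks of `new` that already exist in `old`."""
    old_hashes = {hashlib.sha256(c).digest() for c in fixed_chunks(old)}
    return sum(1 for c in fixed_chunks(new)
               if hashlib.sha256(c).digest() in old_hashes)

def modified_at_end(old: bytes, new: bytes) -> bool:
    """Crude pattern check: if the common prefix covers almost all of the
    old file, the change happened near the end, so cheap fixed-length
    chunking will already deduplicate well."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n >= len(old) - CHUNK
```

When `modified_at_end` is false, the boundary shift makes variable-length (content-defined) chunking the better choice, which is the selection the paper's pattern detector automates.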