Search | Korea Science

Yoon, Dabin;Kim, Deok-Hwan
- The Journal of Korean Institute of Next Generation Computing
- /
- v.14 no.5
- /
- pp.60-70
- /
- 2018
A software defined storage (SDS) is being deployed in cloud environment to allow multiple users to virtualize physical servers, but a solution for optimizing space efficiency with limited physical resources is needed. In the conventional data deduplication system, it is difficult to deduplicate redundant data uploaded to distributed storages. In this paper, we propose a distributed deduplication method using similarity-based clustering and multi-layer bloom filter. Rabin hash is applied to determine the degree of similarity between virtual machine servers and cluster similar virtual machines. Therefore, it improves the performance compared to deduplication efficiency for individual storage nodes. In addition, a multi-layer bloom filter incorporated into the deduplication process to shorten processing time by reducing the number of the false positives. Experimental results show that the proposed method improves the deduplication ratio by 9% compared to deduplication method using IP address based clusters without any difference in processing time.

Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
- Journal of Convergence for Information Technology
- /
- v.10 no.10
- /
- pp.1-8
- /
- 2020
The need for storage, management, and retrieval techniques for alternative data has emerged as technologies based on data generated from business activities conducted by enterprises have emerged as the key to business success in recent years. Existing big data platform systems must load a large amount of data generated in real time without delay to process unstructured data, which is an alternative data, and efficiently manage storage space by utilizing a deduplication system of different storages when redundant data occurs. In this paper, we propose a multi-layer distributed data deduplication process system using the similarity of the Cuckoo hashing filter technique considering the characteristics of big data. Similarity between virtual machines is applied as Cuckoo hash, individual storage nodes can improve performance with deduplication efficiency, and multi-layer Cuckoo filter is applied to reduce processing time. Experimental results show that the proposed method shortens the processing time by 8.9% and increases the deduplication rate by 10.3%.
https://doi.org/10.22156/CS4SMB.2020.10.10.001 인용 PDF KSCI