http://dx.doi.org/10.14372/IEMEK.2012.7.6.345

File Deduplication using Logical Partition of Storage System  

Kong, Jin-San (Hallym Univ.)
Yoo, Chuck (Korea Univ.)
Ko, Young-Woong (Hallym Univ.)
Abstract
In a traditional target-based data deduplication system, every file must be chunked and its chunks compared in order to eliminate duplicate data blocks. A critical problem with this approach arises as the number of files increases: the system suffers computational delay from calculating hash values and from processing the metadata needed to handle each file. To overcome this problem, in this paper, we propose a novel data deduplication system that uses logical partitions of the storage system, applying the deduplication scheme to each logical partition rather than to each file. Experimental results show that, when the logical partition is 50% full of files, the proposed system is more efficient than the traditional deduplication scheme in terms of deduplication capacity and processing time.
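The abstract contrasts file-level deduplication with deduplication applied to a whole logical partition. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses fixed-length chunking (FLC) with an assumed 4 KiB chunk size and SHA-1 fingerprints, and the function names `fixed_chunks`, `dedup_per_file` and `dedup_partition` are hypothetical. The file-level version hashes each file separately and keeps one chunk list (recipe) per file, so its metadata grows with the number of files; the partition-level version scans the raw partition image as a single byte stream and keeps one recipe whose length depends only on the partition size.

```python
import hashlib
from typing import Dict, List, Tuple

# Assumed fixed chunk length for FLC; not a value taken from the paper.
CHUNK_SIZE = 4096


def fixed_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a byte string into fixed-length chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_per_file(files: Dict[str, bytes]) -> Tuple[Dict[str, bytes], Dict[str, List[str]]]:
    """Hypothetical file-level scheme: each file is chunked and hashed on its
    own, and a separate recipe is kept as per-file metadata."""
    store: Dict[str, bytes] = {}        # fingerprint -> unique chunk
    recipes: Dict[str, List[str]] = {}  # file name -> list of fingerprints
    for name, data in files.items():
        recipe = []
        for chunk in fixed_chunks(data):
            fp = hashlib.sha1(chunk).hexdigest()
            store.setdefault(fp, chunk)  # keep only the first copy of a chunk
            recipe.append(fp)
        recipes[name] = recipe
    return store, recipes


def dedup_partition(image: bytes) -> Tuple[Dict[str, bytes], List[str]]:
    """Hypothetical partition-level scheme: the raw partition image is
    treated as one stream, so there is a single recipe regardless of how
    many files the partition holds (free space is chunked as well)."""
    store: Dict[str, bytes] = {}
    recipe: List[str] = []
    for chunk in fixed_chunks(image):
        fp = hashlib.sha1(chunk).hexdigest()
        store.setdefault(fp, chunk)
        recipe.append(fp)
    return store, recipe
```

For example, with two block-aligned files, `a` (8 KiB of `A`) and `a + b` (the same data plus 4 KiB of `B`), both schemes find the same unique `A` and `B` chunks. A partition image containing those files plus 4 KiB of zero-filled free space yields one extra unique chunk for the free space, but keeps a single recipe instead of one per file.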
Keywords
Deduplication; Chunking; Partition; FLC; VLC