Storage System Performance Enhancement Using Duplicated Data Management Scheme  

Jung, Ho-Min (Department of Computer Engineering, Hallym University)
Ko, Young-Woong (Department of Computer Engineering, Hallym University)
Abstract
Traditional storage servers suffer from duplicated data blocks, which waste storage space and network bandwidth. To address this problem, various de-duplication mechanisms have been proposed. However, many of these works are limited to backup servers that exploit Content-Defined Chunking (CDC). In a backup server, duplicated blocks can be easily traced by using anchors; therefore, the CDC scheme is widely used for backup servers. In this paper, we propose a new de-duplication mechanism for improving a storage system. We focus on an efficient algorithm for supporting general-purpose de-duplication servers, including backup, P2P, and FTP servers. The key idea is to apply a stride scheme to the traditional fixed-block duplication checking mechanism. Experimental results show that the proposed mechanism can minimize the computation time for detecting duplicated regions of blocks and manage storage systems efficiently.
Keywords
File Fingerprint; Stride; Hash; Duplication; Storage Server
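
The abstract names the key idea (a stride scheme layered on fixed-block duplication checking) but does not give the algorithm itself. The following Python sketch is only one plausible reading, not the authors' implementation. It assumes that stored data is indexed by the SHA-1 fingerprint of each fixed-size block, and that incoming data is probed with a block-sized window that advances by `stride` bytes on a miss and by a whole block on a hit. The names `build_index` and `find_duplicates`, the block size, and the stride values are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed value; the abstract does not state a block size


def fingerprint(block):
    """Return the SHA-1 digest of a block as a hex string."""
    return hashlib.sha1(block).hexdigest()


def build_index(stored, block_size=BLOCK_SIZE):
    """Map the fingerprint of each fixed-size block of `stored` to its offset."""
    index = {}
    for offset in range(0, len(stored) - block_size + 1, block_size):
        index.setdefault(fingerprint(stored[offset:offset + block_size]), offset)
    return index


def find_duplicates(data, index, block_size=BLOCK_SIZE, stride=1):
    """Return (offset_in_data, offset_in_stored) pairs for duplicated blocks.

    Assumed behavior: on a miss the window moves forward by `stride` bytes;
    on a hit it jumps a full block, since the next duplicate is likely adjacent.
    """
    matches = []
    pos = 0
    while pos + block_size <= len(data):
        fp = fingerprint(data[pos:pos + block_size])
        if fp in index:
            matches.append((pos, index[fp]))
            pos += block_size
        else:
            pos += stride
    return matches


if __name__ == "__main__":
    stored = bytes(range(24))      # three 8-byte blocks
    shifted = b"abc" + stored      # same content, shifted by 3 bytes
    idx = build_index(stored, block_size=8)
    print(find_duplicates(shifted, idx, block_size=8, stride=1))
    print(find_duplicates(shifted, idx, block_size=8, stride=8))
```

In this sketch, setting `stride` equal to the block size reduces to plain fixed-block checking, which misses duplicates once the data is shifted, while a smaller stride finds them at the cost of more hash computations. How the paper actually balances that trade-off is not stated in the abstract.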