
A Clustering File Backup Server Using Multi-level De-duplication  

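Ko, Young-Woong (Department of Computer Engineering, Hallym University)
Jung, Ho-Min (Department of Computer Engineering, Hallym University)
Kim, Jin (Department of Computer Engineering, Hallym University)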
Abstract
Traditional off-the-shelf file servers have several drawbacks when storing data blocks. First, they give little practical consideration to de-duplication, which wastes storage capacity. Second, they require a high-performance computer system to process large data blocks. To address these problems, this paper proposes a clustering backup system that uses a file fingerprinting mechanism for block-level de-duplication. Our approach differs from traditional file server systems in two ways. First, we avoid data redundancy through multi-level file fingerprinting, which lets us use storage capacity efficiently. Second, we apply cluster technology to the I/O subsystem, which reduces data I/O time and network bandwidth usage. Experimental results show that both the required storage capacity and the I/O performance improve noticeably.
Keywords
File Fingerprinting; Cluster; Backup; Meta Data Server; Hash; SHA1
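As a rough illustration of the multi-level fingerprinting idea described in the abstract, the sketch below checks a whole-file SHA-1 fingerprint first and falls back to per-block fingerprints only when the file is new. It is a minimal sketch under stated assumptions, not the authors' implementation: the `DedupStore` class, the fixed 4096-byte block size, the two-level layout, and the in-memory dictionaries are all hypothetical, since the abstract does not specify them, and the clustering of the I/O subsystem is not modeled.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


class DedupStore:
    """Toy in-memory store: file-level, then block-level de-duplication."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.files = {}   # file fingerprint -> ordered list of block fingerprints
        self.blocks = {}  # block fingerprint -> block bytes

    def backup(self, data):
        """Store data; return (file fingerprint, bytes newly stored)."""
        # Level 1: a whole-file fingerprint catches exact duplicate files.
        file_fp = hashlib.sha1(data).hexdigest()
        if file_fp in self.files:
            return file_fp, 0

        # Level 2: fixed-size block fingerprints catch partial overlap.
        new_bytes = 0
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            block_fp = hashlib.sha1(block).hexdigest()
            if block_fp not in self.blocks:
                self.blocks[block_fp] = block
                new_bytes += len(block)
            recipe.append(block_fp)

        self.files[file_fp] = recipe
        return file_fp, new_bytes

    def restore(self, file_fp):
        """Rebuild a file from its stored block fingerprints."""
        return b"".join(self.blocks[fp] for fp in self.files[file_fp])
```

In this sketch, backing up the same file twice stores nothing the second time, and a file that shares some blocks with an earlier one stores only the blocks that differ. A real system would also need persistent metadata and a policy for hash collisions, which are left out here.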