http://dx.doi.org/10.5573/ieek.2013.50.2.143

Data Deduplication Method using Locality-based Chunking policy for SSD-based Server Storages  

Lee, Seung-Kyu (Department of Electronic Engineering, Inha University)
Kim, Ju-Kyeong (Department of Electronic Engineering, Inha University)
Kim, Deok-Hwan (Department of Electronic Engineering, Inha University)
Publication Information
Journal of the Institute of Electronics and Information Engineers / v.50, no.2, 2013, pp. 143-151
Abstract
NAND flash-based SSDs (Solid State Drives) offer fast input/output performance and low power consumption, so they are widely used as storage in tablets, desktop PCs, smartphones, and servers. However, SSDs wear out as the number of writes increases, which limits their lifespan. To extend SSD lifespan, a variety of data deduplication techniques have been introduced. The general fixed-size splitting method allocates fixed-size chunks without considering data locality, so it may perform unnecessary chunking and hash key generation. The variable-size splitting method incurs excessive computation because it examines data byte by byte to detect duplicates. This paper proposes an adaptive chunking method based on the application locality and file name locality of data written to SSD-based server storage. The proposed method adaptively splits data into 4KB or 64KB chunks according to the application locality and file name locality of duplicated data. This reduces the overhead of chunking and hash key generation and prevents duplicate data from being written. Experimental results show that the proposed method improves write performance and reduces power consumption and operation time compared with the existing variable-size splitting method and the 4KB fixed-size splitting method.
Keywords
SSD; Data Deduplication; Locality-based Chunking; Fixed-size Splitting; Variable-size Splitting