http://dx.doi.org/10.7840/kics.2016.41.11.1515

Implementation and Performance Measuring of Erasure Coding of Distributed File System  

Kim, Cheiyol (Electronics and Telecommunications Research Institute)
Kim, Youngchul (Electronics and Telecommunications Research Institute)
Kim, Dongoh (Electronics and Telecommunications Research Institute)
Kim, Hongyeon (Electronics and Telecommunications Research Institute)
Kim, Youngkyun (Electronics and Telecommunications Research Institute)
Seo, Daewha (Dept. of Electronics Eng., Kyungpook National University)
Abstract
With the growth of big data, machine learning, and cloud computing, storage that can hold large amounts of unstructured data has recently grown in importance. Commodity-hardware-based distributed file systems such as MAHA-FS, GlusterFS, and the Ceph file system have therefore received a lot of attention for their scale-out capability and low cost. For data fault tolerance, most of these file systems initially used replication. But as storage sizes grow to tens or hundreds of petabytes, the low space efficiency of replication has come to be seen as a problem. This paper applies an erasure-coding fault-tolerance policy to MAHA-FS for high space efficiency and introduces the VDelta technique to solve the resulting data-consistency problem. We compare the performance of two file systems, MAHA-FS and GlusterFS, which have different I/O processing architectures: the former is server-centric and the latter is client-centric. We found that the erasure-coding performance of MAHA-FS is better than that of GlusterFS.
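The abstract's contrast between replication and erasure coding reduces to simple arithmetic plus a parity computation. The sketch below is a minimal illustration, not the MAHA-FS or GlusterFS implementation; the 3-way replication factor, the (8+2) layout, and the single XOR parity chunk are assumptions chosen for demonstration.

    # A minimal sketch (not the paper's code) of why erasure coding
    # beats replication on space efficiency; the k = 3 data chunks and
    # single XOR parity chunk below are illustrative assumptions.

    def replication_overhead(copies):
        """Raw bytes stored per byte of user data under n-way replication."""
        return float(copies)

    def erasure_overhead(k, m):
        """Raw bytes stored per byte of user data under a (k+m) erasure code."""
        return (k + m) / k

    def xor_parity(chunks):
        """XOR k equal-sized chunks into one parity chunk (the m = 1 case)."""
        parity = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                parity[i] ^= byte
        return bytes(parity)

    if __name__ == "__main__":
        print(replication_overhead(3))   # 3.00x raw storage, tolerates 2 lost copies
        print(erasure_overhead(8, 2))    # 1.25x raw storage, tolerates 2 lost chunks

        chunks = [b"AAAA", b"BBBB", b"CCCC"]   # k = 3 data chunks
        parity = xor_parity(chunks)            # m = 1 parity chunk
        lost = chunks.pop(1)                   # one data chunk is lost
        # XOR is its own inverse, so XOR-ing the survivors together with
        # the parity chunk reconstructs the missing data chunk.
        assert xor_parity(chunks + [parity]) == lost

Production systems, including the libraries cited in the references below (Jerasure, Intel ISA-L), use Reed-Solomon codes rather than plain XOR; these generalize the same recovery idea to any m missing chunks.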
Keywords
Erasure Coding; Data Consistency; Distributed Storage; Fault Tolerance; MAHA-FS
References
1 J. S. Kim and T. W. Kim, "OwFS: A distributed file system for large-scale internet services," J. Korean Data & Inf. Sci. Soc., vol. 27, no. 5, pp. 77-85, May 2009.
2 G. J. Lee, Y. C. Shin, J. H. Koo, and S. H. Choi, "Practical implementation and performance evaluation of random linear network coding," J. KICS, vol. 40, no. 9, pp. 1786-1792, Sept. 2015.
3 D. Lambright, "Erasure codes and storage tiers on Gluster," SA summit, Sept. 23, 2014.
4 A. Ajisaka, HDFS 2015: Past, Present, and Future (2015), Retrieved Sept. 30, 2016, from http://events.linuxfoundation.org/sites/events/files/slides/HDFS2015_Past_present_future.pdf
5 T. Y. Kim, Lessons learned from deploying SSD in NAVER services (2014), Retrieved Sept. 30, 2016, from http://dcslab.hanyang.ac.kr/nvramos/nvramos14/presentation/s1.pdf
6 J. N. Gray, "Notes on data base operating systems," in Operating Systems: An Advanced Course, Lecture Notes in Computer Science, vol. 60, Springer-Verlag, Berlin, pp. 393-481, 1978.
7 C. Gray and D. Cheriton, "Leases: An efficient fault-tolerant mechanism for distributed file cache consistency," ACM SIGOPS Operating Systems Rev., vol. 23, no. 5, pp. 202-210, Dec. 1989.
8 H. T. Kung and J. T. Robinson, "On optimistic methods for concurrency control," ACM TODS, vol. 6, no. 2, pp. 213-226, Jun. 1981.
9 D. P. Reed and L. Svobodova, "SWALLOW: A distributed data storage system for a local network," in Proc. IFIP Working Group 6.4 Int. Workshop on Local Netw., pp. 355-373, Aug. 1980.
10 D. P. Reed, "Implementing atomic actions on decentralized data," ACM Trans. Computer Systems (TOCS), vol. 1, no. 1, pp. 3-23, Feb. 1983.
11 D. H. Kim and S. Y. Hwang, "An efficient wear-leveling algorithm for NAND flash SSD with multi-channel and multi-way architecture," J. KICS, vol. 39B, no. 7, pp. 425-432, Jul. 2014.
12 G. R. Goodson, J. J. Wylie, G. R. Ganger, and M. K. Reiter, "Efficient consistency for erasure-coded data via versioning servers," School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-CS-03-127, Mar. 2003.
13 S. Frolund, A. Merchant, Y. Saito, S. Spence, and A. Veitch, "A decentralized algorithm for erasure-coded virtual disks," in Proc. IEEE Int. Conf. Dependable Syst. and Netw., pp. 125-134, Jun. 2004.
14 J. S. Plank, S. Simmerman, and C. D. Schuman, "Jerasure: A library in C/C++ facilitating erasure coding for storage applications - Version 1.2," Tech. Rep. CS-08-627, University of Tennessee, 2008.
15 IOzone Filesystem Benchmark, http://www.iozone.org/docs/IOzone_msword_98.pdf (accessed Sept. 2016).
16 "Filesystem in Userspace," https://sourceforge.net/projects/fuse/(accessed May 2016).
17 S. M. Han, H. S. Park, and T. W. Kwon, "Shelf-life time based cache replacement policy suitable for web environment," J. KICS, vol. 40, no. 6, pp. 1091-1101, Jun. 2015.
18 GlusterFS, http://www.gluster.com/ (accessed Sept. 2016).
19 Intel Storage Acceleration Library, https://software.intel.com/en-us/storage/ISA-L (accessed Sept. 2016).
20 S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," in Proc. ACM SOSP, pp. 29-43, 2003.
21 S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn, "Ceph: A scalable, high-performance distributed file system," in Proc. 7th Symp. Operating Syst. Design and Implementation, USENIX Association, 2006.
22 K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Proc. IEEE MSST, pp. 1-10, 2010.
23 H. Y. Kim, G. S. Jin, M. H. Cha, S. M. Lee, S. M. Lee, Y. C. Kim, and Y. K. Kim, "GLORY-FS: A distributed file system for large-scale internet service," KICS Inf. and Commun. Mag., vol. 30, no. 4, pp. 16-22, Mar. 2013.
24 Y. C. Kim, D. O. Kim, H. Y. Kim, Y. K. Kim, and W. Choi, "MAHA-FS: A distributed file system for high performance metadata processing and random IO," KIPS Trans. Software and Data Eng., vol. 2, no. 2, pp. 91-96, Feb. 2013.