[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/JIPS.01.0035

HRSF: Single Disk Failure Recovery for Liberation Code Based Storage Systems

Li, Jun (School of Computer Science and Engineering, University of Electronic Science and Technology of China)
Hou, Mengshu (School of Computer Science and Engineering, University of Electronic Science and Technology of China)

Publication Information

Journal of Information Processing Systems / v.15, no.1, 2019, pp. 55-66

Abstract

Storage systems often apply erasure codes to protect against disk failures and to ensure system reliability and availability. The Liberation code, a type of erasure coding scheme, has been widely used in many storage systems because its encoding and update operations are efficient. However, it cannot recover quickly from a single disk failure, and this slow recovery degrades both recovery performance and the response time of client requests. To solve this problem, this paper presents HRSF, a Hybrid Recovery method for solving Single disk Failure, together with an optimal algorithm that accelerates the failure recovery process. Theoretical analysis proves that our scheme reads approximately 25% less data than the conventional method. In the evaluation, we perform extensive experiments with different numbers of disks and chunk sizes. The results show that HRSF outperforms the conventional method in terms of both the amount of data read and the failure recovery time.
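The abstract does not spell out how HRSF decides what to read. As a hedged illustration of the general hybrid-recovery idea (our assumption about what "hybrid" refers to here), the sketch below rebuilds each lost symbol of a failed data disk from either its row (P) parity chain or a Q parity chain, and searches every combination for the one that reads the fewest distinct surviving symbols, so a symbol shared by two chosen chains is fetched only once. The layout is a hypothetical toy: its Q chains are plain diagonals, not the Liberation code's Q construction, only chains containing exactly one lost symbol are considered, and the helper names `parity_chains` and `recovery_reads` are illustrative. The exhaustive search is practical only for small `w`.

```python
from itertools import product


def parity_chains(k, w):
    """Build the parity chains of a hypothetical toy XOR code (not Liberation).

    Data symbol (r, c) is row r on data disk c. Row chain r XORs every data
    symbol in row r into ("P", r); diagonal chain j XORs every data symbol
    with (r + c) % w == j into ("Q", j).
    """
    rows = {f"row{r}": frozenset({(r, c) for c in range(k)} | {("P", r)})
            for r in range(w)}
    diags = {f"diag{j}": frozenset({(r, c) for r in range(w) for c in range(k)
                                    if (r + c) % w == j} | {("Q", j)})
             for j in range(w)}
    return {**rows, **diags}


def recovery_reads(failed_disk, k, w):
    """Return (conventional_reads, best_hybrid_reads, best_choice)."""
    chains = parity_chains(k, w)
    lost = frozenset((r, failed_disk) for r in range(w))

    def reads(names):
        # Count distinct surviving symbols; overlapping chains share reads.
        return len(frozenset().union(*(chains[n] for n in names)) - lost)

    # Conventional recovery: rebuild every lost symbol from its row parity.
    conventional = reads([f"row{r}" for r in range(w)])

    # Per lost symbol, the chains that contain it and no other lost symbol.
    options = [[n for n, ch in chains.items() if ch & lost == {sym}]
               for sym in sorted(lost)]
    # Exhaustive search over all row/diagonal combinations.
    best = min(product(*options), key=reads)
    return conventional, reads(best), best


if __name__ == "__main__":
    conv, best, choice = recovery_reads(failed_disk=0, k=3, w=3)
    print(conv, best, choice)  # 9 7 ('row0', 'row1', 'diag2')
```

In this toy 3-disk, 3-row layout, row-only recovery reads 9 symbols, while mixing two row chains with one diagonal chain reads 7, because the diagonal chain reuses symbols already fetched for the rows. These numbers describe the toy layout only and are not the paper's 25% result.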

Keywords

Erasure Codes; Disk Failure; Recovery Scheme; Reliability; Storage System
