http://dx.doi.org/10.1109/JCN.2015.000105

HTSC and FH HTSC: XOR-based Codes to Reduce Access Latency in Distributed Storage Systems  

Shuai, Qiqi (Department of Electrical and Electronic Engineering, The University of Hong Kong)
Li, Victor O.K. (Department of Electrical and Electronic Engineering, The University of Hong Kong)
Abstract
A massive distributed storage system is the foundation for big data operations. Access latency is a key metric in distributed storage systems because it greatly affects user experience, yet existing codes mainly focus on improving other metrics such as storage overhead and repair cost. In this paper, we design two new XOR-based erasure codes, the hierarchical tree structure code (HTSC) and the high failure tolerant HTSC (FH HTSC), which generate parity nodes from other parity nodes in order to reduce access latency in distributed storage systems. By comparing them with other popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC reduce access latency while maintaining favorable performance in the other metrics. In particular, under the same repair cost, FH HTSC achieves lower access latency, equal or higher failure tolerance, and lower computation cost than the representative codes, with similar storage overhead. FH HTSC is therefore well suited to applications that require both low access latency and high failure tolerance.
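The idea of generating parity nodes from parity nodes can be illustrated with a toy two-level XOR layout. This is only a minimal sketch and not the HTSC or FH HTSC construction, which the abstract does not specify: the block contents, the group size of two, and the names `xor_blocks`, `p1`, `p2`, and `p_top` are all assumptions made for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Four hypothetical data blocks (contents are arbitrary).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xa0\x0b"]

# First level: one local parity per group of two data blocks.
p1 = xor_blocks(data[0:2])
p2 = xor_blocks(data[2:4])

# Second level: a parity computed from the parities themselves.
p_top = xor_blocks([p1, p2])

# A lost data block is rebuilt from its local group only.
recovered_d0 = xor_blocks([data[1], p1])

# A lost parity is rebuilt from the other parity and the top parity.
recovered_p1 = xor_blocks([p2, p_top])
```

In this toy layout, repairing a data block reads only its small local group, and a lost parity can be rebuilt from other parities without touching any data block.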
Keywords
Access latency; computation cost; erasure codes; failure tolerance; repair cost; storage overhead