http://dx.doi.org/10.3837/tiis.2015.09.014

A Network Load Sensitive Block Placement Strategy of HDFS  

Meng, Lingjun (School of Computer Science and Technology, Henan Polytechnic University)
Zhao, Wentao (School of Computer Science and Technology, Henan Polytechnic University)
Zhao, Haohao (School of Computer Science and Technology, Henan Polytechnic University)
Ding, Yang (School of Computer Science and Technology, Henan Polytechnic University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.9, no.9, 2015, pp. 3539-3558
Abstract
This paper investigates and analyzes the default block placement strategy of HDFS. HDFS is a representative distributed file system designed to stream vast amounts of data to user applications at high bandwidth. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous and places blocks with a simple round-robin strategy that ignores each node's resource characteristics, which reduces the self-adaptability of the system. The primary contribution of this paper is a network-load-sensitive block placement strategy. We have implemented our algorithm and evaluated it through extensive simulations and comparison with similar existing studies. The results indicate that our strategy not only achieves a much better data distribution but also improves write performance more significantly than the other approaches.
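The contrast the abstract draws can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. The function names, the percentage load metric, and the per-block cost are all hypothetical assumptions chosen only to show how a rotation-based policy differs from one that consults each node's current network load.

```python
from itertools import cycle


def round_robin_placement(nodes, num_blocks):
    """Assign blocks to nodes in a fixed rotation, ignoring node state."""
    rotation = cycle(nodes)
    return [next(rotation) for _ in range(num_blocks)]


def load_aware_placement(network_load, num_blocks, cost_per_block=15):
    """Assign each block to the node with the lowest current network load.

    network_load: dict mapping node name -> load in percent (hypothetical metric).
    cost_per_block: assumed load, in percent, that one written block adds to a node.
    """
    load = dict(network_load)  # copy so the caller's dict is not modified
    placement = []
    for _ in range(num_blocks):
        target = min(load, key=load.get)  # least-loaded node at this moment
        placement.append(target)
        load[target] += cost_per_block    # account for the block just placed
    return placement


# Hypothetical cluster: dn1 is already heavily loaded.
cluster = {"dn1": 90, "dn2": 20, "dn3": 40}
print(round_robin_placement(list(cluster), 4))
print(load_aware_placement(cluster, 4))
```

In this toy cluster the rotation sends two of the four blocks to the busiest node, while the load-aware variant avoids it entirely. A real policy would also have to honor replication and rack-awareness constraints, which this sketch leaves out.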
Keywords
HDFS; block placement; network load; imbalance; load balance
Citations & Related Records
Times Cited By KSCI: 1