http://dx.doi.org/10.9708/jksci.2014.19.11.009

A Block Relocation Algorithm for Reducing Network Consumption in Hadoop Cluster  

Kim, Jun-Sang (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Kim, Chang-Hyeon (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Lee, Won-Joo (Dept. of Computer Science, Inha Technical College)
Jeon, Chang-Ho (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Abstract
In this paper, we propose a block relocation algorithm for reducing network traffic in a Hadoop cluster. The scheduler of a Hadoop cluster receives a job from a user and divides it into multiple tasks, which are assigned to nodes. In doing so, the scheduler tries to allocate each task to a node that satisfies data locality, that is, a node that stores the block the task will process. If a task is assigned to a node that does not hold the data (block) to be processed, the task can be processed only after the block has been transmitted from another node. Because blocks in the cluster are accessed with different frequencies, the workload differs among nodes. The proposed algorithm therefore relocates blocks according to the task allocation pattern of the Hadoop scheduler. As a result, the workload of the nodes is leveled, and tasks are less often processed on nodes that do not hold the required block, which in turn reduces the network traffic of the cluster. We evaluate the proposed block relocation algorithm by simulation. The simulation results show a reduction in network consumption of up to 23.3% compared with default delay scheduling when processing jobs.
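The leveling idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given on this page. It is a toy greedy heuristic under loudly stated assumptions: each block has a single replica, a node's expected workload is the sum of the access counts of the blocks it stores, and the names `node_loads`, `relocate_blocks`, and `max_moves` are hypothetical.

```python
def node_loads(placement, access_counts):
    """Expected load per node: total access count of the blocks it stores.

    Assumption (not from the paper): load is modeled only by access counts.
    """
    return {node: sum(access_counts[b] for b in blocks)
            for node, blocks in placement.items()}


def relocate_blocks(placement, access_counts, max_moves=100):
    """Toy greedy leveling (hypothetical, single replica per block).

    Repeatedly moves one block from the most loaded node to the least
    loaded node, as long as the move narrows the gap between the two.
    Returns the new placement and the list of (block, source, target) moves.
    """
    placement = {n: set(bs) for n, bs in placement.items()}  # work on a copy
    moves = []
    for _ in range(max_moves):
        loads = node_loads(placement, access_counts)
        hot = max(loads, key=loads.get)
        cold = min(loads, key=loads.get)
        gap = loads[hot] - loads[cold]
        # Moving a block of weight w turns the gap into |gap - 2w|,
        # so only blocks with 0 < w < gap reduce it.
        candidates = [b for b in sorted(placement[hot])
                      if 0 < access_counts[b] < gap]
        if not candidates:
            break
        # Prefer the block that leaves the two nodes closest to equal.
        block = min(candidates, key=lambda b: abs(gap - 2 * access_counts[b]))
        placement[hot].remove(block)
        placement[cold].add(block)
        moves.append((block, hot, cold))
    return placement, moves


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper's simulation.
    placement = {"n1": {"b1", "b2", "b3"}, "n2": {"b4"}, "n3": {"b5"}}
    access_counts = {"b1": 10, "b2": 6, "b3": 4, "b4": 2, "b5": 3}
    new_placement, moves = relocate_blocks(placement, access_counts)
    print("before:", node_loads(placement, access_counts))
    print("after: ", node_loads(new_placement, access_counts))
    print("moves: ", moves)
```

With more even expected loads, a locality-aware scheduler is more likely to find a free slot on a node that already stores the required block, which is the effect the abstract attributes to relocation. Each move in this sketch would itself cost one block transfer, so a real design has to weigh that cost against the remote reads it avoids.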
Keywords
Hadoop Cluster; Hadoop Scheduler; Block Relocation
Citations & Related Records
Times Cited By KSCI: 2
1 Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
2 Zaharia, Matei, et al. "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling." Proceedings of the 5th European conference on Computer systems. ACM, 2010.
3 Borthakur, Dhruba. "The hadoop distributed file system: Architecture and design." Hadoop Project Website 11 (2007): 21.
4 Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS Operating Systems Review. Vol. 37. No. 5. ACM, 2003.
5 Zaharia, Matei, et al. "Improving MapReduce Performance in Heterogeneous Environments." OSDI. Vol. 8. No. 4. 2008.
6 Tae Hoon Keum, Won Joo Lee, Chang Ho Jeon, "Design and Implementation of a Monitor for Hadoop Cluster," Journal of the Institute of Electronics and Information Engineers, Vol 41, No 1, pp. 8-15, 2012.
7 Cameron, David G., et al. "Evaluating scheduling and replica optimisation strategies in OptorSim." Proceedings of the 4th International Workshop on Grid Computing. IEEE Computer Society, 2003.
8 Ranganathan, Kavitha, and Ian Foster. "Simulation studies of computation and data scheduling algorithms for data grids." Journal of Grid Computing 1.1 (2003): 53-62.
9 Tang, Ming, et al. "Dynamic replication algorithms for the multi-tier data grid." Future Generation Computer Systems 21.5 (2005): 775-790.
10 Lee, Ming-Chang, Fang-Yie Leu, and Ying-ping Chen. "PFRF: An adaptive data replication algorithm based on star-topology data grids." Future Generation Computer Systems 28.7 (2012): 1045-1057.
11 Jeong-Hyeok Park, Sang-Yeol Lee, Da-Hyun Kang, Joong-Ho Won, "Hadoop and MapReduce," Journal of the Korean Data & Information Science Society, Vol 24, No 5, pp. 1013-1027, 2013.
12 Apache Hadoop http://hadoop.apache.org
13 Olson, Mike. "Hadoop: Scalable, flexible data storage and analysis." IQT Quarterly 1.3 (2010): 14-18.
14 Zaharia, Matei. "Job scheduling with the fair and capacity schedulers." Hadoop Summit 9 (2009).