[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.9708/jksci.2014.19.11.009

A Block Relocation Algorithm for Reducing Network Consumption in Hadoop Cluster

Kim, Jun-Sang (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Kim, Chang-Hyeon (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Lee, Won-Joo (Dept. of Computer Science, Inha Technical College)
Jeon, Chang-Ho (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)

Publication Information

Journal of the Korea Society of Computer and Information / v.19, no.11, 2014, pp. 9-15

Abstract

In this paper, we propose a block relocation algorithm for reducing network traffic in a Hadoop cluster. The scheduler of a Hadoop cluster receives jobs from users, and each job is divided into multiple tasks that are assigned to nodes. When assigning a task, the scheduler prefers a node that satisfies data locality. If a task is assigned to a node that does not hold the data (block) it must process, the task can be processed only after the block has been transferred from another node. Because blocks in the cluster are accessed with different frequencies, the workload differs among nodes. The proposed algorithm therefore relocates blocks according to the task allocation pattern of the Hadoop scheduler. As a result, the workload of the nodes is leveled, and fewer tasks are processed on nodes that do not hold the blocks they need, which in turn reduces the network traffic of the cluster. We evaluate the proposed block relocation algorithm by simulation. The simulation results show up to a 23.3% reduction in network consumption compared with the default delay scheduling for job processing.
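The abstract does not spell out the relocation rule, so the following is only a toy sketch of the general idea, not the authors' algorithm. It assumes one replica per block, a fixed per-node task capacity, and a simple locality-first greedy scheduler in place of Hadoop's delay scheduling. The relocation step is a hypothetical greedy heuristic that estimates each node's load as the access frequency of its blocks (split evenly among replicas) and moves a replica from the busiest node to the idlest one. The names `schedule`, `expected_load`, and `relocate` are illustrative only.

```python
"""Toy model of locality-aware task placement and block relocation.

Hypothetical sketch: one replica per block, a fixed task capacity per node,
and a greedy load-leveling heuristic. Not the paper's algorithm.
"""


def schedule(tasks, replicas, nodes, capacity):
    """Assign each task (the id of the block it reads) to a node.

    Returns the number of remote reads, i.e. tasks placed on a node that
    does not hold their block and must fetch it over the network.
    Assumes len(tasks) <= capacity * len(nodes).
    """
    load = {n: 0 for n in nodes}
    remote = 0
    for block in tasks:
        # Prefer the least-loaded node that holds the block and has a free slot.
        local = [n for n in replicas[block] if load[n] < capacity]
        if local:
            target = min(local, key=lambda n: (load[n], n))
        else:
            # No local slot left: run on the least-loaded node and copy the block.
            free = [n for n in nodes if load[n] < capacity]
            target = min(free, key=lambda n: (load[n], n))
            remote += 1
        load[target] += 1
    return remote


def expected_load(replicas, freq, nodes):
    """Access frequency of each block, split evenly among its replicas."""
    load = {n: 0.0 for n in nodes}
    for block, holders in replicas.items():
        for n in holders:
            load[n] += freq[block] / len(holders)
    return load


def relocate(replicas, freq, nodes, max_moves):
    """Greedily move replicas from the busiest node to the idlest one.

    Returns a new placement; the input placement is not modified.
    """
    placement = {b: set(h) for b, h in replicas.items()}
    for _ in range(max_moves):
        load = expected_load(placement, freq, nodes)
        hot = max(nodes, key=lambda n: (load[n], n))
        cold = min(nodes, key=lambda n: (load[n], n))
        best_gap, best_block = load[hot] - load[cold], None
        for b in sorted(placement):
            holders = placement[b]
            if hot in holders and cold not in holders:
                share = freq[b] / len(holders)
                # Moving this replica shifts `share` of load from hot to cold.
                gap = abs((load[hot] - share) - (load[cold] + share))
                if gap < best_gap:
                    best_gap, best_block = gap, b
        if best_block is None:
            break  # no single move narrows the hot/cold gap
        placement[best_block].remove(hot)
        placement[best_block].add(cold)
    return placement


# Example: the two most frequently accessed blocks both live on n1.
nodes = ["n1", "n2", "n3"]
replicas = {"a": {"n1"}, "b": {"n1"}, "c": {"n2"}, "d": {"n3"}}
freq = {"a": 3, "b": 3, "c": 1, "d": 1}
tasks = [b for b in sorted(freq) for _ in range(freq[b])]

before = schedule(tasks, replicas, nodes, capacity=3)
after = schedule(tasks, relocate(replicas, freq, nodes, max_moves=2), nodes, capacity=3)
print(before, after)  # 3 0
```

In this toy run, two moves (block `a` to n2, then `c` to n3) remove all three remote reads, while a single move leaves one. A real scheme would also have to account for the network cost of moving the blocks themselves, the HDFS replication factor, and the behavior of Hadoop's actual scheduler, none of which this sketch models.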

Keywords

Hadoop Cluster; Hadoop Scheduler; Block Relocation

Citations & Related Records

Times Cited By KSCI: 2