A Block Relocation Algorithm for Reducing Network Consumption in Hadoop Cluster

  • Kim, Jun-Sang (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus) ;
  • Kim, Chang-Hyeon (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus) ;
  • Lee, Won-Joo (Dept. of Computer Science, Inha Technical College) ;
  • Jeon, Chang-Ho (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
  • Received : 2014.08.21
  • Accepted : 2014.10.13
  • Published : 2014.11.29

Abstract

In this paper, we propose a block relocation algorithm for reducing network traffic in a Hadoop cluster. The Hadoop scheduler receives jobs from users, divides each job into multiple tasks, and assigns the tasks to nodes. In doing so, it gives priority to nodes that satisfy data locality. If a task is assigned to a node that does not hold the data (block) to be processed, the block must first be transferred from another node. Because blocks in the cluster are accessed with different frequencies, the workload differs among nodes, which makes inter-node data transfers more frequent. The proposed algorithm therefore relocates blocks evenly according to the task allocation pattern of the Hadoop scheduler. As a result, the workload of the nodes is leveled and tasks are less often processed on nodes that do not hold the required block, so the network traffic of the cluster is reduced. We evaluate the proposed algorithm by simulation. The results show up to a 23.3% reduction in network usage compared with processing the jobs under the default delay scheduling.
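The idea of leveling node workload by moving frequently accessed blocks can be illustrated with a small sketch. The code below is not the algorithm from the paper, whose details are not given in the abstract. It is a simple greedy heuristic under stated assumptions: each block has a single replica, the access frequency of each block is already known, and a node's workload is the sum of the access frequencies of the blocks it stores. The names `node_loads`, `relocate_blocks`, and the sample data are hypothetical.

```python
def node_loads(placement, freq, nodes):
    """Return the workload of each node: the summed access frequency of its blocks."""
    loads = {n: 0 for n in nodes}
    for block, node in placement.items():
        loads[node] += freq[block]
    return loads


def relocate_blocks(placement, freq, nodes):
    """Greedy leveling (assumed heuristic, not the paper's method).

    Blocks are visited from most to least frequently accessed, and each one
    is placed on the node with the lowest workload so far. On ties, the
    block's current node is preferred so that no needless transfer occurs.
    """
    loads = {n: 0 for n in nodes}
    new_placement = {}
    for block in sorted(freq, key=lambda b: (-freq[b], b)):
        current = placement[block]
        target = min(nodes, key=lambda n: (loads[n], n != current, n))
        new_placement[block] = target
        loads[target] += freq[block]
    return new_placement


# Hypothetical cluster: three nodes, with the three hottest blocks all on n1.
nodes = ["n1", "n2", "n3"]
placement = {"b1": "n1", "b2": "n1", "b3": "n1", "b4": "n2", "b5": "n3"}
freq = {"b1": 9, "b2": 7, "b3": 5, "b4": 2, "b5": 1}

before = node_loads(placement, freq, nodes)
after_placement = relocate_blocks(placement, freq, nodes)
after = node_loads(after_placement, freq, nodes)
moved = [b for b in placement if placement[b] != after_placement[b]]

print("workload before:", before)
print("workload after: ", after)
print("blocks moved:   ", moved)
```

In this sample, the hot blocks that were concentrated on one node are spread across the cluster, so a locality-aware scheduler is more likely to find a free node that already holds the block a task needs. A realistic version would also have to account for HDFS replication and for the cost of the relocation transfers themselves.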
