http://dx.doi.org/10.3837/tiis.2018.11.005

An Empirical Performance Analysis on Hadoop via Optimizing the Network Heartbeat Period

Lee, Jaehwan (School of Electronics and Information Engineering, Korea Aerospace University)
Choi, June (School of Electronics and Information Engineering, Korea Aerospace University)
Roh, Hongchan (SK Telecom)
Shin, Ji Sun (Department of Computer and Information Security, Sejong University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS), v.12, no.11, 2018, pp. 5252-5268

Abstract

To support large-scale clusters, Hadoop heartbeat messages are designed to carry important control messages, including task scheduling and completion messages, via piggybacking, which reduces the number of messages received by the NameNode. Although Hadoop is designed and optimized for high-throughput batch processing, real-time processing of large amounts of data in Hadoop is increasingly important. This paper evaluates Hadoop's performance and costs when the heartbeat period is adjusted to support latency-sensitive applications. Through an empirical study based on the Hadoop 2.0 (YARN) architecture, we improve Hadoop's I/O performance as well as application performance by up to 13 percent compared to the default configuration. We offer a guideline, based on simple equations, that predicts the performance, costs, and limitations of the total system as the heartbeat period is varied. Through extensive experiments, we also show that Hive performance can be improved by tuning Hadoop's heartbeat period.
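The trade-off described in the abstract can be illustrated with a rough sketch. The two functions below are not the equations from the paper; they are a back-of-the-envelope model under stated assumptions: every worker node sends exactly one heartbeat per period, and a piggybacked event (such as a task completion) occurs at a uniformly random point within the period. The function names, the 100-node cluster, and the sample periods are all hypothetical.

```python
def heartbeats_per_second(num_nodes: int, period_ms: float) -> float:
    """Heartbeat messages per second reaching the master.

    Assumption (not from the paper): each node sends exactly one
    heartbeat every period_ms milliseconds.
    """
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    return num_nodes * 1000.0 / period_ms


def mean_piggyback_wait_ms(period_ms: float) -> float:
    """Average time an event waits for the next heartbeat to carry it.

    Assumption (not from the paper): the event time is uniformly
    distributed within the heartbeat period, so the mean wait is
    half the period.
    """
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    return period_ms / 2.0


if __name__ == "__main__":
    # Hypothetical 100-node cluster with three illustrative periods.
    for period_ms in (1000, 500, 100):
        load = heartbeats_per_second(100, period_ms)
        wait = mean_piggyback_wait_ms(period_ms)
        print(f"period={period_ms} ms  load={load:.0f} msg/s  mean wait={wait:.0f} ms")
```

Under these assumptions, shortening the period lowers the average wait for piggybacked messages in proportion, while the message load on the master grows in inverse proportion. This is the kind of cost and benefit balance the abstract refers to, although the paper's own guideline may model it differently.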

Keywords

Hadoop; Heartbeat; Hadoop Ecosystem; Hive; TPC-H; Terasort Benchmark
