An Empirical Performance Analysis on Hadoop via Optimizing the Network Heartbeat Period

  • Lee, Jaehwan (School of Electronics and Information Engineering, Korea Aerospace University)
  • Choi, June (School of Electronics and Information Engineering, Korea Aerospace University)
  • Roh, Hongchan (SK Telecom)
  • Shin, Ji Sun (Department of Computer and Information Security, Sejong University)
  • Received : 2018.03.21
  • Accepted : 2018.06.21
  • Published : 2018.11.30

Abstract

To support large-scale clusters, Hadoop heartbeat messages are designed to carry important messages, including task scheduling and completion messages, via piggybacking, which reduces the number of messages received by the NameNode. Although Hadoop is designed and optimized for high-throughput computing via batch processing, real-time processing of large amounts of data in Hadoop is increasingly important. This paper evaluates Hadoop's performance and costs when the heartbeat period is adjusted to support latency-sensitive applications. Through an empirical study based on the Hadoop 2.0 (YARN) architecture, we improve Hadoop's I/O performance as well as application performance by up to 13 percent compared to the default configuration. We offer a guideline, based on simple equations, that predicts the performance, costs, and limitations of the overall system as the heartbeat period is varied. Through extensive experiments, we also show that Hive performance can be improved by tuning Hadoop's heartbeat periods.
