An Empirical Performance Analysis on Hadoop via Optimizing the Network Heartbeat Period

Lee, Jaehwan; Choi, June; Roh, Hongchan; Shin, Ji Sun

doi:10.3837/tiis.2018.11.005

KSII Transactions on Internet and Information Systems (TIIS)

Volume 12, Issue 11, Pages 5252-5268, 2018

pISSN 1976-7277, eISSN 1976-7277

Korean Society for Internet Information (한국인터넷정보학회)


Lee, Jaehwan (School of Electronics and Information Engineering, Korea Aerospace University);
Choi, June (School of Electronics and Information Engineering, Korea Aerospace University);
Roh, Hongchan (SK Telecom);
Shin, Ji Sun (Department of Computer and Information Security, Sejong University)

Received: 2018.03.21
Accepted: 2018.06.21
Published: 2018.11.30

https://doi.org/10.3837/tiis.2018.11.005


Abstract

To support large-scale clusters, Hadoop piggybacks important control messages, including task scheduling and completion messages, on its heartbeat messages, which reduces the number of messages the NameNode must receive. Although Hadoop is designed and optimized for high-throughput batch processing, real-time processing of large amounts of data in Hadoop is increasingly important. This paper evaluates Hadoop's performance and costs when the heartbeat period is tuned to support latency-sensitive applications. In an empirical study based on the Hadoop 2.0 (YARN) architecture, we improve Hadoop's I/O performance and application performance by up to 13 percent over the default configuration. We offer a guideline, based on simple equations, for predicting the performance, costs, and limitations of the overall system as the heartbeat period is varied. Through extensive experiments, we also show that Hive performance can be improved by tuning Hadoop's heartbeat periods.
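The trade-off described in the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not the paper's guideline, whose equations are not reproduced on this page. It assumes that each worker node sends exactly one heartbeat per period and that an event waiting to be piggybacked (for example, a task completion) occurs at a uniformly random point within a period. The function name `heartbeat_tradeoff` and its parameters are hypothetical.

```python
def heartbeat_tradeoff(num_nodes: int, period_s: float) -> dict:
    """Estimate master-side load and piggyback delay for a heartbeat period.

    Assumptions (illustrative only, not taken from the paper):
      * every worker sends one heartbeat per period;
      * an event to be piggybacked occurs uniformly at random within a period.
    """
    if num_nodes <= 0 or period_s <= 0:
        raise ValueError("num_nodes and period_s must be positive")
    return {
        # Heartbeats the master must handle each second.
        "messages_per_second": num_nodes / period_s,
        # Average time an event waits for the next heartbeat.
        "mean_wait_s": period_s / 2.0,
        # Longest time an event can wait for the next heartbeat.
        "worst_wait_s": period_s,
    }


if __name__ == "__main__":
    # Compare a 1-second period with a 0.25-second period on 100 nodes.
    for period in (1.0, 0.25):
        print(period, heartbeat_tradeoff(100, period))
```

Under these assumptions, shortening the period reduces the waiting time linearly but raises the message rate at the master by the same factor, which is the cost side of the tuning. In stock Hadoop 2.x, heartbeat periods are set through properties such as `dfs.heartbeat.interval` (HDFS DataNodes, in seconds) and `yarn.resourcemanager.nodemanagers.heartbeat-interval-ms` (YARN NodeManagers, in milliseconds); which of these the paper tunes is not stated on this page.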


