http://dx.doi.org/10.13067/JKIECS.2020.15.1.107

Scaling of Hadoop Cluster for Cost-Effective Processing of MapReduce Applications  

Ryu, Woo-Seok (Dept. of Health Care Management, Catholic University of Pusan)
Publication Information
The Journal of the Korea Institute of Electronic Communication Sciences, v.15, no.1, 2020, pp. 107-114
Abstract
This paper studies a method for estimating the scale of a Hadoop cluster needed to process big data in a cost-effective manner. In medical institutions, demand for cloud-based big data analysis is increasing now that medical records can be stored outside the hospital. The paper first analyzes the Amazon EMR framework, one of the popular cloud-based big data frameworks. It then presents an efficiency model for scaling the Hadoop cluster so that a MapReduce application executes more cost-effectively. It also analyzes the factors that influence the execution of a MapReduce application through several experiments under various conditions. The cost efficiency of big data analysis can be increased by choosing the cluster scale that gives the most efficient processing time relative to the operational cost.
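The abstract does not state the efficiency model itself, so the following is only a minimal sketch of the general idea of weighing processing time against operational cost. The metric used here (the inverse of runtime multiplied by cost), the `Run` record, and the function names are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Run:
    """One measured execution of a MapReduce job (hypothetical record type)."""
    nodes: int            # number of nodes in the cluster
    runtime_hours: float  # measured wall-clock time of the job


def run_cost(run: Run, price_per_node_hour: float) -> float:
    """Operational cost, assuming a flat hourly price per node."""
    return run.nodes * run.runtime_hours * price_per_node_hour


def efficiency(run: Run, price_per_node_hour: float) -> float:
    """Assumed metric: higher is better when both runtime and cost are low.

    This is NOT the paper's model; it is one simple way to combine
    processing time and cost into a single score.
    """
    return 1.0 / (run.runtime_hours * run_cost(run, price_per_node_hour))


def most_efficient(runs: Iterable[Run], price_per_node_hour: float) -> Run:
    """Return the run (cluster scale) with the highest assumed efficiency."""
    runs = list(runs)
    if not runs:
        raise ValueError("at least one measured run is required")
    return max(runs, key=lambda r: efficiency(r, price_per_node_hour))


if __name__ == "__main__":
    # Illustrative numbers only; these are not measurements from the paper.
    measured = [Run(2, 4.0), Run(4, 2.5), Run(8, 2.0)]
    best = most_efficient(measured, price_per_node_hour=0.25)
    print(f"most efficient scale under the assumed metric: {best.nodes} nodes")
```

Under this assumed metric, adding nodes only pays off while the runtime drops faster than the cost rises, which is the trade-off the abstract describes in words.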
Keywords
Hadoop; Cluster; Efficiency; MapReduce; Cloud
Citations & Related Records
Times Cited by KSCI: 6