http://dx.doi.org/10.5392/JKCA.2013.13.09.020

An Efficient Data Replacement Algorithm for Performance Optimization of MapReduce in Non-dedicated Distributed Computing Environments  

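Ryu, Eunkyung (School of Information and Communication Engineering, Chungbuk National University)
Son, Ingook (School of Information and Communication Engineering, Chungbuk National University)
Park, Junho (School of Information and Communication Engineering, Chungbuk National University)
Bok, Kyoungsoo (School of Information and Communication Engineering, Chungbuk National University)
Yoo, Jaesoo (School of Information and Communication Engineering, Chungbuk National University)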
Publication Information
Abstract
In recent years, the growth of social media and the development of mobile devices have significantly increased the volume of data. MapReduce is an emerging programming model for processing large amounts of data. However, because MapReduce places data evenly across nodes on the assumption of a dedicated distributed computing environment, it is not well suited to non-dedicated distributed computing environments. Data replacement algorithms have been proposed to optimize the performance of MapReduce in non-dedicated distributed computing environments, but they spend considerable time on data replacement and add network load through unnecessary data transmission. In this paper, we propose an efficient data replacement algorithm for the performance optimization of MapReduce in non-dedicated distributed computing environments. The proposed scheme computes the ratio of data blocks to place on each node based on a node availability model, and reduces network load by taking the current data placement into account when transmitting data blocks. Our experimental results show that the proposed scheme outperforms the existing scheme.
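The abstract does not give the availability model or the transfer rule, so the following is only a minimal sketch of the general idea under stated assumptions: each node is assumed to have a single availability score, the target number of blocks per node is assumed to be proportional to that score, and only the surplus over the target is moved. The function names `target_block_counts` and `plan_moves` are hypothetical and are not taken from the paper or from Hadoop.

```python
def target_block_counts(availability, total_blocks):
    """Split total_blocks across nodes in proportion to availability.

    Assumption: proportional allocation with largest-remainder rounding.
    The paper's actual availability model is not stated in the abstract.
    """
    total = sum(availability.values())
    shares = {n: total_blocks * a / total for n, a in availability.items()}
    counts = {n: int(s) for n, s in shares.items()}
    leftover = total_blocks - sum(counts.values())
    # Hand the remaining blocks to the nodes with the largest fractional share.
    by_fraction = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for n in by_fraction[:leftover]:
        counts[n] += 1
    return counts


def plan_moves(current, target):
    """Return (source, destination, n_blocks) transfers from current to target.

    Only blocks above a node's target are moved, so blocks that are already
    in an acceptable place are never transmitted.
    """
    surplus = [[n, current.get(n, 0) - target[n]]
               for n in target if current.get(n, 0) > target[n]]
    deficit = [[n, target[n] - current.get(n, 0)]
               for n in target if current.get(n, 0) < target[n]]
    moves = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        n = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], n))
        surplus[i][1] -= n
        deficit[j][1] -= n
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves
```

For example, with 12 blocks placed evenly on three nodes whose assumed availability scores are 0.5, 0.25, and 0.25, the target becomes 6, 3, and 3 blocks, and the plan moves only two blocks instead of redistributing all twelve.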
Keywords
Non-dedicated Distributed Computing; MapReduce; Hadoop;
Citations & Related Records
Times Cited By KSCI: 1