http://dx.doi.org/10.5392/JKCA.2013.13.11.021

A Hot-Data Replication Scheme Based on Data Access Patterns for Enhancing Processing Speed of MapReduce  

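Son, Ingook (School of Information and Communication Engineering, Chungbuk National University)
Ryu, Eunkyung (School of Information and Communication Engineering, Chungbuk National University)
Park, Junho (School of Information and Communication Engineering, Chungbuk National University)
Bok, Kyoungsoo (School of Information and Communication Engineering, Chungbuk National University)
Yoo, Jaesoo (School of Information and Communication Engineering, Chungbuk National University)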
Abstract
In recent years, the growth of social media and the spread of mobile devices have sharply increased the volume of data. Hadoop is widely used as a representative distributed storage and processing framework. In MapReduce running on the Hadoop Distributed File System (HDFS), map tasks are assigned as close as possible to their input data in order to exploit data locality. However, depending on the analysis tasks being run, some data are requested far more frequently than others. In this paper, we propose a hot-data replication scheme that improves the processing speed of MapReduce according to data access patterns. The proposed scheme applies a replica optimization algorithm to frequently accessed hot data, which reduces task processing time and improves data locality. Performance evaluation shows that the proposed scheme outperforms the existing scheme in terms of the load caused by access frequency.
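The abstract does not spell out the replica optimization algorithm, so the following is only a minimal sketch of the general idea of access-frequency-based replication, not the authors' method. The function name `plan_replicas`, the parameters `hot_threshold` and `max_replicas`, and the rule of adding one replica per multiple of the threshold are all assumptions made for illustration; the only fixed fact used is that the default HDFS replication factor is 3.

```python
from collections import Counter

# Default HDFS replication factor (the dfs.replication default).
DEFAULT_REPLICATION = 3


def plan_replicas(access_log, hot_threshold, max_replicas):
    """Return {block_id: target replica count} from a list of accessed block ids.

    Hypothetical rule (an assumption, not taken from the paper): a block whose
    access count reaches hot_threshold is treated as hot and receives one extra
    replica per full multiple of hot_threshold, capped at max_replicas.
    """
    counts = Counter(access_log)
    plan = {}
    for block_id, hits in counts.items():
        if hits < hot_threshold:
            # Cold block: keep the default replication factor.
            plan[block_id] = DEFAULT_REPLICATION
        else:
            # Hot block: scale replicas with access frequency, up to the cap.
            extra = hits // hot_threshold
            plan[block_id] = min(DEFAULT_REPLICATION + extra, max_replicas)
    return plan


# Example access log: b3 is requested most often, b2 rarely.
log = ["b1"] * 10 + ["b2"] * 2 + ["b3"] * 25
plan = plan_replicas(log, hot_threshold=5, max_replicas=6)
```

More replicas of a hot block give the scheduler more nodes on which a map task can read that block locally, which is the link between replication and data locality that the abstract describes.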
Keywords
Distributed Computing; MapReduce; Hadoop; Data Locality; Hot-Data