[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5392/JKCA.2013.13.11.021

A Hot-Data Replication Scheme Based on Data Access Patterns for Enhancing Processing Speed of MapReduce

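Son, Ingook (School of Information and Communication Engineering, Chungbuk National University)
Ryu, Eunkyung (School of Information and Communication Engineering, Chungbuk National University)
Park, Junho (School of Information and Communication Engineering, Chungbuk National University)
Bok, Kyoungsoo (School of Information and Communication Engineering, Chungbuk National University)
Yoo, Jaesoo (School of Information and Communication Engineering, Chungbuk National University)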

Publication Information

The Journal of the Korea Contents Association / v.13, no.11, 2013, pp. 21-27

Abstract

In recent years, the volume of data has grown significantly with the spread of social media and the development of mobile devices. Hadoop has been widely adopted as a representative framework for distributed storage and processing. In MapReduce running on the Hadoop Distributed File System (HDFS), map tasks are assigned as close as possible to their input data in order to exploit data locality. However, depending on the data analysis tasks being run, some data are requested much more frequently than others. In this paper, we propose a hot-data replication scheme that improves MapReduce processing speed based on data access patterns. The proposed scheme applies a replica optimization algorithm to hot data, i.e., data with a high access frequency, which reduces task processing time and improves data locality. Performance evaluation shows that the proposed scheme outperforms the existing scheme in terms of the load caused by access frequency.
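The abstract does not specify the replica optimization algorithm itself. As a rough illustration of the general idea only, the Python sketch below counts accesses per block, treats a block as hot once its count reaches an assumed threshold, and adds replicas on the nodes that request it most often so that later map tasks on those nodes can read it locally. All names (`HotDataReplicator`, `HOT_THRESHOLD`, `MAX_REPLICATION`) and the scaling rule are hypothetical assumptions, not the authors' method. Real HDFS sets replication per file (for example with `hdfs dfs -setrep`) and picks replica locations through its block placement policy, so the per-node placement here is a simplification.

```python
from collections import Counter, defaultdict

# Hypothetical parameters (assumptions, not values from the paper).
DEFAULT_REPLICATION = 3  # HDFS default replication factor
MAX_REPLICATION = 6      # assumed cap on replicas for a hot block
HOT_THRESHOLD = 10       # assumed access count at which a block becomes "hot"


class HotDataReplicator:
    """Threshold-based hot-data replication sketch (assumed policy)."""

    def __init__(self, placement):
        # placement: block id -> set of node ids currently holding a replica
        self.placement = {b: set(nodes) for b, nodes in placement.items()}
        self.access_count = Counter()
        # block id -> Counter of requesting node ids
        self.requesters = defaultdict(Counter)

    def record_access(self, block, node):
        """Log one read of `block` issued by a task running on `node`."""
        self.access_count[block] += 1
        self.requesters[block][node] += 1

    def hot_blocks(self):
        """Blocks whose access count has reached the assumed threshold."""
        return [b for b, c in self.access_count.items() if c >= HOT_THRESHOLD]

    def target_replicas(self, block):
        """One extra replica per HOT_THRESHOLD accesses, capped (assumed rule)."""
        extra = self.access_count[block] // HOT_THRESHOLD
        return min(DEFAULT_REPLICATION + extra, MAX_REPLICATION)

    def rebalance(self):
        """Place extra replicas of hot blocks on their most frequent requesters."""
        added = []
        for block in self.hot_blocks():
            holders = self.placement[block]
            target = self.target_replicas(block)
            for node, _ in self.requesters[block].most_common():
                if len(holders) >= target:
                    break
                if node not in holders:
                    holders.add(node)
                    added.append((block, node))
        return added

    def is_local(self, block, node):
        """True if a map task on `node` could read `block` without a network copy."""
        return node in self.placement[block]


# Usage: block b1 is read heavily from node n4, which holds no replica.
replicator = HotDataReplicator({"b1": {"n1", "n2", "n3"}, "b2": {"n1", "n2", "n3"}})
for _ in range(12):
    replicator.record_access("b1", "n4")
for _ in range(3):
    replicator.record_access("b1", "n5")
replicator.record_access("b2", "n4")
replicator.record_access("b2", "n4")
print(replicator.rebalance())  # [('b1', 'n4')]
```

In this example `b1` reaches 15 accesses, so its assumed target rises from 3 to 4 replicas and the new copy goes to `n4`, its most frequent requester; a map task scheduled on `n4` can then read `b1` locally, while the rarely accessed `b2` keeps its default placement.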

Keywords

Distributed Computing; MapReduce; Hadoop; Data Locality; Hot-Data

Citations & Related Records

Times Cited By KSCI: 1

Cited By KSCI

1	A Distributed Cache Management Scheme for Efficient Accesses of Small Files in HDFS / Oh, Hyunkyo; Kim, Kiyeon; Hwang, Jae-Min; Park, Junho; Lim, Jongtae; Bok, Kyoungsoo; Yoo, Jaesoo / The Journal of the Korea Contents Association
2	Reverse k-Nearest Neighbor Query Processing Method for Continuous Query Processing in Bigdata Environments / Lim, Jongtae; Park, Sunyong; Seo, Kiwon; Lee, Minho; Bok, Kyoungsoo; Yoo, Jaesoo / The Journal of the Korea Contents Association