[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5392/JKCA.10.10.078

Improving the Map/Reduce Model through Data Distribution and Task Progress Scheduling

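Hwang, In-Sung (Inha University, Department of Information Engineering)
Chung, Kyung-Yong (Sangji University, School of Computer Information Engineering)
Rim, Kee-Wook (Sunmoon University, School of Computer Information Engineering)
Lee, Jung-Hyun (Inha University, School of Computer Information Engineering)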

Publication Information

The Journal of the Korea Contents Association / v.10, no.10, 2010, pp. 78-85

Abstract

Map/Reduce is a programming model for implementing cloud computing that has recently attracted attention. The model runs applications that process large amounts of data across many computers. To use the computing nodes well, it is important to design a mechanism that splits the data into appropriately sized pieces and distributes them efficiently to a cluster of computing nodes. In addition, how the Map and Reduce phases are scheduled also influences Map/Reduce performance. This paper proposes a distribution scheme that splits large data and runs Map tasks while taking into account the performance of each computing node and the network status. We also tune how Map and Reduce tasks operate so that Reduce tasks can be processed quickly. We tested the proposal with two Map/Reduce sample applications and evaluated how it affects Map/Reduce performance.
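As a rough illustration of the idea of distributing data according to node performance, the sketch below splits an input of a given size across nodes in proportion to a per-node capacity score. This is not the authors' algorithm, which the abstract does not specify. The function name `split_by_capacity`, the node names, and the notion of a single integer capacity score (standing in for some combination of node performance and network status) are all assumptions made for this example.

```python
def split_by_capacity(total_bytes, capacities):
    """Split total_bytes into per-node shares proportional to capacity.

    capacities maps a node name to a positive integer score. The score is a
    hypothetical stand-in for measured node performance and network status.
    """
    if not capacities or any(c <= 0 for c in capacities.values()):
        raise ValueError("capacities must be non-empty and positive")

    total_capacity = sum(capacities.values())

    # Proportional share for each node, rounded down to whole bytes.
    shares = {
        node: total_bytes * cap // total_capacity
        for node, cap in capacities.items()
    }

    # Rounding down can leave a few bytes unassigned; give them to the
    # node with the highest capacity score so that nothing is lost.
    remainder = total_bytes - sum(shares.values())
    fastest = max(capacities, key=capacities.get)
    shares[fastest] += remainder
    return shares


if __name__ == "__main__":
    # Hypothetical cluster: node-a is scored twice as capable as the others.
    print(split_by_capacity(1000, {"node-a": 4, "node-b": 2, "node-c": 2}))
```

In this sketch a node scored twice as high receives twice as much data, so that, under the assumption that the score reflects actual throughput, Map tasks on different nodes would finish at roughly the same time.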

Keywords

Map/Reduce; Cloud Computing; Predict Performance; Hadoop

Citations & Related Records


1	An Efficient Data Replacement Algorithm for Performance Optimization of MapReduce in Non-dedicated Distributed Computing Environments / [Ryu, Eunkyung;Son, Ingook;Park, Junho;Bok, Kyoungsoo;Yoo, Jaesoo;] / The Journal of the Korea Contents Association
2	A Hot-Data Replication Scheme Based on Data Access Patterns for Enhancing Processing Speed of MapReduce / [Son, Ingook;Ryu, Eunkyung;Park, Junho;Bok, Kyoungsoo;Yoo, Jaesoo;] / The Journal of the Korea Contents Association
3	Trends in Big Data Parallel Processing Technology / [Park, Jun-Ho;Bok, Gyeong-Su;Yu, Jae-Su;] / Communications of the Korean Institute of Information Scientists and Engineers
