http://dx.doi.org/10.5392/JKCA.10.10.078

Improving the Map/Reduce Model through Data Distribution and Task Progress Scheduling  

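Hwang, In-Sung (Department of Information Engineering, Inha University)
Chung, Kyung-Yong (School of Computer Information Engineering, Sangji University)
Rim, Kee-Wook (School of Computer Information Engineering, Sunmoon University)
Lee, Jung-Hyun (School of Computer Information Engineering, Inha University)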
Abstract
Map/Reduce is a programming model for implementing cloud computing applications that has recently attracted attention. The model runs application programs that process large amounts of data across many computers. To make good use of the computing nodes, it is important to design a mechanism that splits the data into appropriately sized pieces and distributes them efficiently to a cluster of computing nodes. How the Map and Reduce phases are scheduled also influences Map/Reduce performance. This paper proposes a distribution scheme that splits large data sets and runs Map tasks in consideration of each computing node's performance and the network status. We also tune how Map and Reduce tasks are run so that Reduce tasks can be processed quickly. We tested the proposal with two sample Map/Reduce applications and evaluated its impact on Map/Reduce performance.
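The abstract does not give the details of the distribution scheme, so the following is only a minimal sketch of the general idea of sizing each node's share of the input by a per-node score. The function name `split_by_capacity`, the node names, and the integer scores (imagined here as a combined measure of node performance and network status) are all assumptions made for illustration and do not come from the paper.

```python
def split_by_capacity(total_bytes, node_scores):
    """Divide total_bytes among nodes in proportion to integer scores.

    node_scores is a hypothetical mapping of node name -> positive integer
    score; how such a score would be measured is not specified here.
    """
    total_score = sum(node_scores.values())
    if total_score <= 0:
        raise ValueError("node scores must sum to a positive value")

    # Proportional share for each node, rounded down to whole bytes.
    shares = {
        node: total_bytes * score // total_score
        for node, score in node_scores.items()
    }

    # Integer division can leave a few bytes unassigned (fewer than the
    # number of nodes); give them to the highest-scoring nodes first.
    remainder = total_bytes - sum(shares.values())
    ranked = sorted(node_scores, key=node_scores.get, reverse=True)
    for node in ranked[:remainder]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # A node scored three times higher receives three times the data.
    print(split_by_capacity(100, {"fast": 3, "slow": 1}))
```

With equal scores the function falls back to an even split, which is roughly what a scheme that ignores node differences would produce.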
Keywords
Map/Reduce; Cloud Computing; Performance Prediction; Hadoop
References
1 http://lucene.apache.org/hadoop
2 T. White, Hadoop: The Definitive Guide, O'Reilly, 2009.
3 http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html
4 J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," In the Proceedings of the 6th Symposium on Operating Systems Design and Implementation, pp.107-113, 2004.
5 C. Tian, H. Zhou, Y. He, and L. Zha, "A Dynamic Scheduler for Heterogeneous Workloads," The 8th International Conference on Grid and Cooperative Computing, pp.218-224, 2009.
6 J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, and I. Whalley, "Performance-Driven Task Co-Scheduling for MapReduce Environments," The 12th IEEE/IFIP Network Operations and Management Symposium, pp.373-380, 2010.
7 K. Morton, A. Friesen, M. Balazinska, and D. Grossman, "Estimating the Progress of MapReduce Pipelines," 26th IEEE International Conference on Data Engineering, pp.681-684, 2010.
8 J. Xie, S. Yin, X. Ruan, Z. Ding, Y. Tian, J. Majors, A. Manzanares, and X. Qin, "Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters," The 24th IEEE International Symposium on Parallel & Distributed Processing: Workshops and PhD Forum, pp.1-9, 2010.
9 J. Shafer, S. Rixner, and A. L. Cox, "The Hadoop Distributed Filesystem: Balancing Portability and Performance," The 11th IEEE International Symposium on Performance Analysis of Systems and Software, pp.122-133, 2010.
10 Z. Vrba, P. Halvorsen, C. Griwodz, and P. Beskow, "Kahn Process Networks are a Flexible Alternative to MapReduce," The 11th IEEE International Conference on High Performance Computing and Communications, pp.154-162, 2009.
11 S. H. Kang and D. A. Bader, "Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce cluster and a Highly Multithreaded System," The 24th IEEE International Symposium on Parallel & Distributed Processing: Workshops and PhD Forum, pp.1-8, 2010.