http://dx.doi.org/10.9708/jksci.2013.18.11.023

Pre-arrangement Based Task Scheduling Scheme for Reducing MapReduce Job Processing Time  

Park, Jung Hyo (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Kim, Jun Sang (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Kim, Chang Hyeon (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Lee, Won Joo (Dept. of Computer Science, Inha Technical College)
Jeon, Chang Ho (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
Abstract
In this paper, we propose a pre-arrangement based task scheduling scheme to reduce MapReduce job processing time. If a task and the data it processes are not located on the same node, the data must be transmitted to the node to which the task is allocated. In that case, job processing time increases because of the data transmission time. To avoid this, we schedule tasks in two steps. In the first step, tasks are sorted in descending order of data locality. In the second step, tasks are exchanged to improve their data locality, based on the location information of the data. In the performance evaluation, we compare Hadoop with the proposed method against default Hadoop on a small Hadoop cluster in terms of job processing time and the number of tasks allocated to nodes that do not hold the data they process. The results show that the proposed method reduces job processing time by around 18%. We also confirm that the number of tasks allocated to nodes that do not hold their data decreases by around 25%.
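The abstract gives only the outline of the two steps, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a simplified model in which each task reads one input block, each node runs one task, and a task counts as data-local when its node holds a replica of that block. The names `Task`, `is_local`, `pre_arrange`, and `count_non_local` are hypothetical and are not part of Hadoop.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    replicas: frozenset  # nodes assumed to hold a replica of the task's input block


def is_local(task, node):
    """True when the node holds a replica of the task's input (assumed locality test)."""
    return node in task.replicas


def count_non_local(assignment):
    """Number of tasks placed on a node that does not hold their data."""
    return sum(not is_local(task, node) for task, node in assignment)


def pre_arrange(assignment):
    """Rearrange a list of (Task, node) pairs in two steps and return a new list."""
    # Step 1: stable sort so that data-local tasks come first.
    pairs = sorted(assignment, key=lambda p: not is_local(p[0], p[1]))
    tasks = [task for task, _ in pairs]
    nodes = [node for _, node in pairs]

    # Step 2: for each non-local task, exchange nodes with another task
    # when the exchange raises the number of data-local tasks.
    for i in range(len(tasks)):
        if is_local(tasks[i], nodes[i]):
            continue
        for j in range(len(tasks)):
            if i == j:
                continue
            before = is_local(tasks[i], nodes[i]) + is_local(tasks[j], nodes[j])
            after = is_local(tasks[i], nodes[j]) + is_local(tasks[j], nodes[i])
            if after > before:
                nodes[i], nodes[j] = nodes[j], nodes[i]
                break  # take the first improving exchange for task i
    return list(zip(tasks, nodes))
```

The exchange step here is greedy: it accepts any swap that increases the count of data-local tasks, so it never makes the placement worse but is not guaranteed to find the best one. Calling `count_non_local` before and after `pre_arrange` gives a rough analogue of the second metric reported in the abstract.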
Keywords
Hadoop; MapReduce; Data Locality;
Citations & Related Records
  • Reference
1 Microsoft Azure, http://www.microsoft.com/windowsazure/Whitepapers/introducingwindowsazureplatform.
2 KT Ucloud, http://home.ucloud.olleh.com/guide/guide.kt.
3 Google App Engine, https://developers.google.com/appengine/docs/whatisgoogleappengine.html.
4 K. Lee, H. Choi, B. Moon, Y. Lee, and Y. Chung, "Parallel Data Processing with MapReduce: A Survey," In Proceedings of ACM SIGMOD, Vol. 4, Issue 3, pp. 11-20, Dec. 2012.
5 J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation, pp. 107-113, Jan. 2008.
6 J. Lee, H. Yu, and E. Lee, "Data Replication Technique for Improving Data Locality of MapReduce," In Proceedings of the KIISE Korea Computer Congress 2012, Vol. 39, No. 1(A), pp. 218-220, Jun. 2012.
7 C. L. Abad, Y. Lu, and R. H. Campbell, "DARE: Adaptive Data Replication for Efficient Cluster Scheduling," IEEE CLUSTER, 2011 IEEE International Conference, pp. 159-168, Sep. 2011.
8 J. Xie, S. Yin, X. Ruan, Z. Ding, Y. Tian, J. Majors, A. Manzanares, and X. Qin, "Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters," The 24th IEEE International Symposium on Parallel & Distributed Processing: Workshops and PhD Forum, pp. 1-9, April 2010.
9 C. Tian, H. Zhou, Y. He, and L. Zha, "A Dynamic Scheduler for Heterogeneous Workloads," The 8th International Conference on Grid and Cooperative Computing, pp. 218-224, Aug. 2009.
10 X. Zhang, Y. Feng, S. Feng, J. Fan, and M. Zhong, "An Effective Data Locality Aware Task Scheduling Method for MapReduce Framework in Heterogeneous Environments," In Proceedings of the International Conference on Cloud and Service Computing, pp. 235-242, Dec. 2011.
11 Z. Guo, G. Fox, and M. Zhou, "Investigation of Data Locality in MapReduce," In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 419-426, May 2012.
12 M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce Performance in Heterogeneous Environments," In Proceedings of 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI, Vol. 8, No. 4, pp. 29-42, Dec. 2008.
13 O. O'Malley, "TeraByte Sort on Apache Hadoop," Yahoo, available online at: http://sortbenchmark.org/Yahoo-Hadoop.pdf, May 2008.