
Workflow Scheduling Using Heuristic Scheduling in Hadoop

  • Thingom, Chintureena (Centre for Advanced Research & Training, CHRIST (Deemed to be University)) ;
  • Kumar R, Ganesh (Faculty of Engineering, CHRIST (Deemed to be University)) ;
  • Yeon, Guydeuk (Innovation Centre, CHRIST (Deemed to be University))
  • Received : 2018.08.12
  • Accepted : 2018.09.20
  • Published : 2018.12.31

Abstract

In this study, we aim to optimize multiple workloads in the cloud, allocate resources effectively, and reduce the response time of submitted jobs. Running Hadoop in the data center is an efficient way to deliver analytical services to enterprises, and many cloud service providers host Hadoop clusters to offer clients a reliable, high-performance analytical computing interface. Previous work has focused on executing workflows on the Hadoop platform while minimizing the cost of virtual machines and other computing resources. Stochastic hill climbing was earlier applied to a single parameter; here we optimize multiple parameters in cloud data centers with the proposed heuristic hill climbing. Because many users prioritize their jobs simultaneously in the cluster, a resource-optimized workflow scheduling technique must reliably complete assigned tasks before their deadlines while also optimizing the usage of cloud resources.
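To illustrate the general idea of hill climbing over several scheduling parameters at once, the following is a minimal, hypothetical Python sketch. The cost weights, the toy runtime model, the neighbourhood moves, and the names `total_cost` and `hill_climb` are assumptions made for illustration only; they are not the paper's actual cost model or algorithm.

```python
import random

# Hypothetical sketch of heuristic hill climbing over two scheduling
# parameters (VM count and bandwidth share). All weights and the runtime
# model are illustrative placeholders, not values from the paper.

def total_cost(vms, bandwidth, task_size, deadline):
    """Composite cost: VM cost + bandwidth cost + penalty if the deadline is missed."""
    runtime = task_size / (vms * bandwidth)        # toy execution-time model
    sla_penalty = 100.0 if runtime > deadline else 0.0
    return 2.0 * vms + 0.5 * bandwidth + sla_penalty

def hill_climb(task_size, deadline, iterations=1000, seed=42):
    rng = random.Random(seed)
    vms, bandwidth = 1, 1.0                        # start from the cheapest configuration
    best = total_cost(vms, bandwidth, task_size, deadline)
    for _ in range(iterations):
        # Perturb each parameter slightly (the multi-parameter neighbourhood).
        cand_vms = max(1, vms + rng.choice([-1, 0, 1]))
        cand_bw = max(0.1, bandwidth + rng.choice([-0.1, 0.0, 0.1]))
        cand = total_cost(cand_vms, cand_bw, task_size, deadline)
        if cand < best:                            # accept only improving moves
            vms, bandwidth, best = cand_vms, cand_bw, cand
    return vms, bandwidth, best

if __name__ == "__main__":
    print(hill_climb(task_size=50, deadline=10.0))
```

In this toy setup the search accepts only moves that lower the composite cost, so it grows the VM count and bandwidth just enough to meet the deadline penalty-free while keeping resource cost low; the paper's heuristic targets the same trade-off across its own set of parameters.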

Keywords

Fig. 1. Comparison of results for ARU (%).

Fig. 2. Comparison of results for VM cost (number of VMs).

Fig. 3. Comparison of results for energy cost (watts).

Fig. 4. Comparison of results for bandwidth cost (kB).

Fig. 5. Comparison of results for SLA violations.

Fig. 6. Slope of total cost.

Fig. 7. Comparison between target total cost (TCT) and actual total cost (ATC).

Fig. 8. MATLAB results for the proposed solution and other solutions (task size is the number of tasks).

Table 1. Jobs with deadlines and dependencies.
