http://dx.doi.org/10.9728/dcs.2017.18.7.1411

Study of Scheduling Optimization through the Batch Job Logs Analysis  

Yoon, JunWeon (Department of Supercomputing Center, KISTI)
Song, Ui-Sung (Department of Computer Education, Busan National University of Education)
Publication Information
Journal of Digital Contents Society / v.18, no.7, 2017, pp. 1411-1418
Abstract
A batch job scheduler identifies the computational resources configured in a cluster and arranges submitted jobs efficiently. To make efficient use of the cluster's limited resources, it is important to analyze and characterize user jobs, and to identify suitable scheduling algorithms and apply them to the system environment. Most scheduler software records the user's work environment from job submission to termination, together with the inventory and system status of all managed resources. It also stores information related to job execution, such as job scripts, environment variables, libraries, job wait times, and start and end times. In this paper, we analyze the scheduler's execution logs to examine per-user job success rates, execution times, and resource sizes. The results can serve as a basis for optimizing the system by increasing resource utilization.
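As an illustration of the kind of log analysis the abstract describes, the following minimal Python sketch computes per-user success rate, mean wait time, mean execution time, and overall resource utilization from job records. It is not the authors' method. The record layout (`JobRecord` and its fields), the helper names `summarize` and `utilization`, the convention that exit status 0 means success, and the sample data are all assumptions made for this example; a real scheduler's accounting log would first have to be parsed into this shape.

```python
from collections import defaultdict
from typing import NamedTuple


class JobRecord(NamedTuple):
    """Hypothetical, simplified view of one finished batch job."""
    user: str
    slots: int        # number of CPU slots the job occupied
    submit: int       # submission time, in seconds
    start: int        # start time, in seconds
    end: int          # end time, in seconds
    exit_status: int  # assumption: 0 means the job succeeded


def summarize(records):
    """Return per-user success rate, mean wait, mean run time, and slot-seconds."""
    totals = defaultdict(lambda: {"jobs": 0, "ok": 0, "wait": 0, "run": 0, "slot_seconds": 0})
    for r in records:
        t = totals[r.user]
        t["jobs"] += 1
        t["ok"] += 1 if r.exit_status == 0 else 0
        t["wait"] += r.start - r.submit
        t["run"] += r.end - r.start
        t["slot_seconds"] += r.slots * (r.end - r.start)
    return {
        user: {
            "success_rate": t["ok"] / t["jobs"],
            "mean_wait": t["wait"] / t["jobs"],
            "mean_run": t["run"] / t["jobs"],
            "slot_seconds": t["slot_seconds"],
        }
        for user, t in totals.items()
    }


def utilization(records, total_slots, window_start, window_end):
    """Fraction of available slot-seconds used by jobs inside the time window."""
    used = 0
    for r in records:
        # Clip each job to the window so partially overlapping jobs count correctly.
        overlap = min(r.end, window_end) - max(r.start, window_start)
        if overlap > 0:
            used += r.slots * overlap
    return used / (total_slots * (window_end - window_start))


# Made-up sample records, for illustration only.
SAMPLE_JOBS = [
    JobRecord("alice", slots=4, submit=0, start=10, end=110, exit_status=0),
    JobRecord("alice", slots=2, submit=20, start=50, end=100, exit_status=1),
    JobRecord("bob", slots=8, submit=0, start=0, end=200, exit_status=0),
]

for user, stats in summarize(SAMPLE_JOBS).items():
    print(user, stats)
print("utilization:", utilization(SAMPLE_JOBS, total_slots=16, window_start=0, window_end=200))
```

Aggregating by user is only one possible grouping; the same pattern could be applied per queue or per job size class, depending on which characteristics the scheduling policy is meant to target.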
Keywords
HPC; Batch Job; Scheduling; Log analysis; Optimization;