Improving the Job Success Rate through Analysis of User Logs in HPC |
Yoon, JunWeon (Dept. of Supercomputing Center, KISTI)
Hong, TaeYoung (Dept. of Supercomputing Center, KISTI)
Kong, Ki-Sik (Dept. of Multimedia, Namseoul University)
Park, ChanYeol (Dept. of Supercomputing Center, KISTI) |
1 | National Institute of Supercomputing and Networking, KISTI, http://www.nisn.re.kr |
2 | F. Wang, S. Oral, G. Shipman, O. Drokin, T. Wang, and I. Huang, "Understanding Lustre filesystem internals", Oak Ridge National Lab, Technical Report ORNL/TM-2009/117, 2009. |
3 | G. Pfister, "An Introduction to the InfiniBand Architecture (http://www.infinibandta.org/)", IEEE Press, 2001. |
4 | G. Cawood, T. Seed, R. Abrol, T. Sloan, "TGO & JOSH: Grid Scheduling with Grid Engine & Globus", Proceedings of the UK e-Science All Hands Meetings, Nottingham, 2004. |
5 | Templeton, D., "A Beginner's Guide to Sun Grid Engine 6.2", Whitepaper of Sun Microsystems, July 2009. |
6 | Stillwell, M.; Vivien, F.; Casanova, H., "Dynamic Fractional Resource Scheduling versus Batch Scheduling," Parallel and Distributed Systems, IEEE Transactions, vol.23, no.3, pp.521-529, March 2012. |
7 | C. Chaubal, "Scheduler Policies for Job Prioritization in the Sun N1 Grid Engine 6 System", Technical report, Sun BluePrints Online, Sun Microsystems, Inc., Santa Clara, CA, USA. http://www.sun.com/blueprints/1005/819-4325.pdf, 2005. |
8 | J.H. Abawajy, "An efficient adaptive scheduling policy for high-performance computing", Future Generation Computer Systems, Volume 25, Issue 3, pp.364-370, Mar 2009. |
9 | J. W. Yoon, T. Y. Hong, C. Y. Park, H. C. Yu, "Analysis of Batch Job log to improve the success rate in HPC Environment", International Conference on Convergence Technology, vol.2, no.1, pp.209-210, July 2013. |
10 | El-Sayed, N., & Schroeder, B., "Reading between the lines of failure logs: Understanding how HPC systems fail", in: Dependable Systems and Networks (DSN), 2013 43rd Annual IEEE/IFIP International Conference on, pp.1-12, June 2013. |