http://dx.doi.org/10.3745/KTCCS.2020.9.10.221

A Study on Scalability of Profiling Method Based on Hardware Performance Counter for Optimal Execution of Supercomputer  

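Choi, Jieun (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information)
Park, Guenchul (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information)
Rho, Seungwoo (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information)
Park, Chan-Yeol (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information)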
Publication Information
KIPS Transactions on Computer and Communication Systems / v.9, no.10, 2020, pp. 221-230
Abstract
A supercomputer that shares limited resources among multiple users needs a way to optimize application execution. To this end, it is useful for system administrators to obtain prior information and hints about the applications to be executed. In most high-performance computing operations, system administrators try to increase system productivity by collecting information about execution duration and resource requirements from users when tasks are executed. They also use profiling techniques that generate the necessary information from statistics such as system usage in order to increase system utilization. In a previous study, we proposed a scheduling optimization technique built on a hardware performance counter-based profiling technique that characterizes applications without requiring any understanding of their source code. In this paper, we construct a profiling testbed cluster to support optimal execution on the supercomputer and evaluate the scalability of the profiling method for analyzing application characteristics in that cluster environment. We also show experimentally that the profiling method scales and remains usable for actual scheduling optimization even when the application class is reduced or the number of nodes used for profiling is minimized. When the number of nodes used for profiling was reduced to one quarter, the execution time of the application increased by only 1.08% compared with profiling on all nodes, and scheduling optimization improved performance by up to 37% compared with sequential execution. In addition, profiling with a reduced problem size cut the cost of collecting profiling data to one quarter while still yielding a performance improvement of up to 35%.
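The percentages quoted in the abstract are relative changes. The abstract does not give the exact formulas, so the sketch below assumes the usual definitions (change divided by the baseline). The function names and all timing values are hypothetical, chosen only so that the arithmetic lands near the reported 1.08% and 37% figures; they are not measurements from the paper.

```python
def percent_increase(baseline: float, measured: float) -> float:
    """Relative increase of `measured` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (measured - baseline) / baseline * 100.0


def percent_improvement(baseline: float, optimized: float) -> float:
    """Relative reduction of `optimized` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical timings in seconds (assumptions, not data from the paper).
    runtime_all_nodes = 1000.0      # application run, profiled on all nodes
    runtime_quarter_nodes = 1010.8  # application run, profiled on 1/4 of the nodes
    sequential_makespan = 500.0     # jobs executed one after another
    optimized_makespan = 315.0      # jobs placed using the profiling results

    overhead = percent_increase(runtime_all_nodes, runtime_quarter_nodes)
    gain = percent_improvement(sequential_makespan, optimized_makespan)
    print(f"execution-time increase: {overhead:.2f}%")
    print(f"scheduling improvement:  {gain:.2f}%")
```

Keeping the two directions in separate functions makes it harder to confuse an overhead (larger is worse) with an improvement (larger is better) when comparing profiling configurations.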
Keywords
Profiling; Scalability; Supercomputer; Hardware Performance Counter; Job Scheduling;