A Study on Scalability of Profiling Method Based on Hardware Performance Counter for Optimal Execution of Supercomputer

  • 최지은 (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information) ;
  • 박근철 (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information) ;
  • 노승우 (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information) ;
  • 박찬열 (Supercomputer Technology Development Center, Korea Institute of Science and Technology Information)
  • Received : 2020.07.13
  • Accepted : 2020.09.01
  • Published : 2020.10.31

Abstract

A supercomputer, which must share limited resources among many users, needs a way to optimize the execution of applications. To this end, it is useful for system administrators to obtain prior information and hints about the applications to be run. In operating most high-performance computing systems, administrators try to increase system productivity by collecting information about expected execution time and resource requirements from users when jobs are submitted, and they also rely on profiling techniques that generate the necessary information from statistics such as system usage in order to raise system utilization. In a previous study, we proposed a scheduling optimization technique based on a hardware performance counter profiling method that characterizes applications without any additional understanding of their source code. In this paper, we build a profiling testbed cluster to support optimal execution on a supercomputer and experiment with the scalability of the profiling method, which analyzes application characteristics in the constructed cluster environment. We also show that the profiling method scales well enough to be used for actual scheduling optimization even when the application problem class is reduced or the number of nodes used for profiling is minimized. Even when the number of profiling nodes was cut to a quarter, the execution time of the application increased by only 1.08% compared with profiling on all nodes, while scheduling optimization still improved performance by up to 37% over sequential execution. In addition, profiling with a reduced problem size cut the cost of collecting profiling data to about a quarter while still yielding a performance improvement of up to 35%.

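The profiling approach summarized above collects hardware performance counters while an application runs and uses them to characterize the application without inspecting its source code. As a rough illustration only (the abstract does not name the collection tool or the exact events), the sketch below shows how a per-run counter vector could be gathered with the standard Linux perf tool and stored for later analysis; the event list, output file name, and target binary are assumptions made for this example, not the authors' implementation.

    #!/usr/bin/env python3
    # Minimal sketch: collect hardware performance counters for a single
    # application run with Linux `perf stat` and store them as CSV for later
    # characterization. The event list and target binary are illustrative
    # assumptions, not the tooling described in the paper.
    import csv
    import subprocess

    EVENTS = "instructions,cycles,cache-references,cache-misses"

    def profile(cmd, out_csv="counters.csv"):
        # `perf stat -x ,` prints comma-separated counter values on stderr;
        # `-e` selects the hardware events to count for the wrapped command.
        result = subprocess.run(
            ["perf", "stat", "-x", ",", "-e", EVENTS, "--"] + cmd,
            capture_output=True, text=True, check=True)
        rows = [line.split(",") for line in result.stderr.splitlines() if line.strip()]
        with open(out_csv, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        return rows

    if __name__ == "__main__":
        # Hypothetical target: a small-problem-class build of the application,
        # mirroring the reduced-size profiling evaluated in the paper.
        profile(["./app_small_class"])

Keeping each profiling run down to one compact counter vector is what makes it plausible to profile with a reduced problem class or on only a fraction of the nodes, which is exactly the kind of scalability the experiments above evaluate.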
