Effective Distributed Supercomputing Resource Management for Large-Scale Scientific Applications

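  • Seungwoo Rho (Supercomputing Technology Development Office, KISTI) ;
  • Jik-Soo Kim (Supercomputing Technology Development Office, KISTI) ;
  • Sangwan Kim (Supercomputing Technology Development Office, KISTI) ;
  • Seoyoung Kim (Supercomputing Technology Development Office, KISTI) ;
  • Soonwook Hwang (Supercomputing Technology Development Office, KISTI)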
  • Received : 2015.01.13
  • Accepted : 2015.02.24
  • Published : 2015.05.15

Abstract

The nationwide supercomputing infrastructure in Korea consists of heterogeneous, geographically distributed supercomputing clusters. We developed High-Throughput Computing as a Service (HTCaaS) on top of these distributed national supercomputing clusters to make it easier for scientists to explore large-scale and complex scientific problems. In this paper, we present our mechanism for dynamically managing computing resources in HTCaaS and show its effectiveness through a case study of a real scientific application, drug repositioning. The mechanism uses the concepts of waiting time and success rate to identify valid computing resources. We show that applying it can improve resource utilization, accuracy, reliability, and usability, and that it reduces the total job completion time and improves the overall system throughput.
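
The abstract does not spell out how waiting time and success rate are combined, so the following is only a minimal sketch of one plausible reading: a resource counts as valid when its average queue waiting time stays below a threshold and its recent job success rate stays above another. The class `ResourceStats`, the function `select_valid_resources`, the threshold values, and the cluster names are all hypothetical and are not taken from HTCaaS.

```python
from dataclasses import dataclass


@dataclass
class ResourceStats:
    """Observed statistics for one cluster (all fields are hypothetical)."""
    name: str
    avg_wait_seconds: float  # mean queue waiting time of recent jobs
    succeeded: int           # recent jobs that finished successfully
    failed: int              # recent jobs that failed

    @property
    def success_rate(self) -> float:
        total = self.succeeded + self.failed
        # Assumption: a resource with no history is treated as usable.
        return 1.0 if total == 0 else self.succeeded / total


def select_valid_resources(stats, max_wait_seconds=600.0, min_success_rate=0.9):
    """Return names of resources passing both thresholds, shortest wait first."""
    valid = [
        s for s in stats
        if s.avg_wait_seconds <= max_wait_seconds
        and s.success_rate >= min_success_rate
    ]
    return [s.name for s in sorted(valid, key=lambda s: s.avg_wait_seconds)]


# Illustrative data only; these are not measurements from the paper.
clusters = [
    ResourceStats("cluster-a", avg_wait_seconds=120.0, succeeded=95, failed=5),
    ResourceStats("cluster-b", avg_wait_seconds=3600.0, succeeded=100, failed=0),
    ResourceStats("cluster-c", avg_wait_seconds=30.0, succeeded=40, failed=60),
    ResourceStats("cluster-d", avg_wait_seconds=60.0, succeeded=0, failed=0),
]
print(select_valid_resources(clusters))  # ['cluster-d', 'cluster-a']
```

In this sketch, cluster-b is excluded for its long queue wait and cluster-c for its low success rate, so new jobs would be directed only to the remaining two clusters. Re-evaluating the statistics periodically would make the selection dynamic, which is the behavior the abstract describes at a high level.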
