Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Education) as a Basic Science Research Program in 2022 (2022R1I1A1A01063551).
References
- NVIDIA, NVIDIA Docker Wiki [Internet], https://github.com/NVIDIA/nvidia-docker/wiki.
- Docker, Docker [Internet], https://www.docker.com/.
- Docker, Docker CLI [Internet], https://docs.docker.com/engine/reference/commandline/pause/.
- NVIDIA, Compute Unified Device Architecture (CUDA) [Internet], https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.
- NVIDIA, CUDA C++ Programming Guide [Internet], https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
- NVIDIA, Multi-Process Service (MPS) [Internet], https://docs.nvidia.com/deploy/mps/index.html.
- NVIDIA, NVIDIA Docker [Internet], https://github.com/NVIDIA/nvidia-docker.
- Docker, docker ps [Internet], https://docs.docker.com/engine/reference/commandline/ps/.
- Docker, docker top [Internet], https://docs.docker.com/engine/reference/commandline/top/.
- NVIDIA, NVIDIA System Management Interface [Internet], https://developer.nvidia.com/nvidia-system-management-interface.
- Q. Chen, J. Oh, S. Kim, and Y. Kim, "Design of an adaptive GPU sharing and scheduling scheme in container-based cluster," Cluster Computing, Vol.23, No.3, pp.2179-2191, 2020. https://doi.org/10.1007/s10586-019-02969-3
- M. Lee, H. Ahn, C. H. Hong, and D. S. Nikolopoulos, "gShare: A centralized GPU memory management framework to enable GPU memory sharing for containers," Future Generation Computer Systems, Vol.130, pp.181-192, 2022. https://doi.org/10.1016/j.future.2021.12.016
- J. Gu, S. Song, Y. Li, and H. Luo, "GaiaGPU: Sharing GPUs in container clouds," In 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp.469-476, 2018.
- J. Shao, J. Ma, Y. Li, B. An, and D. Cao, "GPU scheduling for short tasks in private cloud," In 2019 IEEE International Conference on Service-Oriented System Engineering (SOSE), pp.215-2155, 2019.
- P. Thinakaran, J. R. Gunasekaran, B. Sharma, M. T. Kandemir, and C. R. Das, "Kube-Knots: Resource harvesting through dynamic container orchestration in GPU-based datacenters," In 2019 IEEE International Conference on Cluster Computing (CLUSTER), pp.1-13, 2019.
- Linux Foundation, Kubernetes [Internet], https://kubernetes.io/ko/.
- T. A. Yeh, H. H. Chen, and J. Chou, "KubeShare: A framework to manage GPUs as first-class and shared resources in container cloud," In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, pp.173-184, 2020.
- Y. Peng, Y. Bao, Y. Chen, C. Wu, and C. Guo, "Optimus: an efficient dynamic resource scheduler for deep learning clusters," In Proceedings of the Thirteenth EuroSys Conference, pp.1-14, 2018.
- Z. Liu, C. Chen, J. Li, Y. Cheng, Y. Kou, and D. Zhang, "KubFBS: A fine-grained and balance-aware scheduling system for deep learning tasks based on kubernetes," Concurrency and Computation: Practice and Experience, Vol.34, No.11, pp.e6836, 2022. https://doi.org/10.1002/cpe.6836
- H. H. Chen, E. T. Lin, Y. M. Chou, and J. Chou, "Gemini: Enabling multi-tenant GPU sharing based on kernel burst estimation," IEEE Transactions on Cloud Computing (Early Access), 2021.
- W. Xiao, S. Ren, Y. Li, Y. Zhang, P. Hou, Z. Li, Y. Feng, W. Lin, and Y. Jia, "AntMan: Dynamic scaling on GPU clusters for deep learning," In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pp.533-548, 2020.
- Google Brain, TensorFlow [Internet], https://www.tensorflow.org/.
- Facebook AI Research, PyTorch [Internet], https://pytorch.org/.