Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Education) as a Basic Science Research Program in 2022 (2022R1I1A1A01063551).
References
- NVIDIA, NVIDIA Docker Wiki [Internet], https://github.com/NVIDIA/nvidia-docker/wiki.
- Docker, Docker [Internet], https://www.docker.com/.
- Docker, Docker CLI [Internet], https://docs.docker.com/engine/reference/commandline/pause/.
- NVIDIA, Compute Unified Device Architecture (CUDA) [Internet], https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.
- NVIDIA, CUDA C++ Programming Guide [Internet], https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
- NVIDIA, Multi-Process Service (MPS) [Internet], https://docs.nvidia.com/deploy/mps/index.html.
- NVIDIA, NVIDIA Docker [Internet], https://github.com/NVIDIA/nvidia-docker.
- Docker, docker ps [Internet], https://docs.docker.com/engine/reference/commandline/ps/.
- Docker, docker top [Internet], https://docs.docker.com/engine/reference/commandline/top/.
- NVIDIA, NVIDIA System Management Interface [Internet], https://developer.nvidia.com/nvidia-system-management-interface.
- Q. Chen, J. Oh, S. Kim, and Y. Kim, "Design of an adaptive GPU sharing and scheduling scheme in container-based cluster," Cluster Computing, Vol.23, No.3, pp.2179-2191, 2020. https://doi.org/10.1007/s10586-019-02969-3
- M. Lee, H. Ahn, C. H. Hong, and D. S. Nikolopoulos, "gShare: A centralized GPU memory management framework to enable GPU memory sharing for containers," Future Generation Computer Systems, Vol.130, pp.181-192, 2022. https://doi.org/10.1016/j.future.2021.12.016
- J. Gu, S. Song, Y. Li, and H. Luo, "GaiaGPU: Sharing GPUs in container clouds," In 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp.469-476, 2018.
- J. Shao, J. Ma, Y. Li, B. An, and D. Cao, "GPU scheduling for short tasks in private cloud," In 2019 IEEE International Conference on Service-Oriented System Engineering (SOSE), pp.215-2155, 2019.
- P. Thinakaran, J. R. Gunasekaran, B. Sharma, M. T. Kandemir, and C. R. Das, "Kube-Knots: Resource harvesting through dynamic container orchestration in GPU-based datacenters," In 2019 IEEE International Conference on Cluster Computing (CLUSTER), pp.1-13, 2019.
- Linux Foundation, Kubernetes [Internet], https://kubernetes.io/ko/.
- T. A. Yeh, H. H. Chen, and J. Chou, "KubeShare: A framework to manage GPUs as first-class and shared resources in container cloud," In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, pp.173-184, 2020.
- Y. Peng, Y. Bao, Y. Chen, C. Wu, and C. Guo, "Optimus: an efficient dynamic resource scheduler for deep learning clusters," In Proceedings of the Thirteenth EuroSys Conference, pp.1-14, 2018.
- Z. Liu, C. Chen, J. Li, Y. Cheng, Y. Kou, and D. Zhang, "KubFBS: A fine-grained and balance-aware scheduling system for deep learning tasks based on kubernetes," Concurrency and Computation: Practice and Experience, Vol.34, No.11, pp.e6836, 2022. https://doi.org/10.1002/cpe.6836
- H. H. Chen, E. T. Lin, Y. M. Chou, and J. Chou, "Gemini: Enabling multi-tenant GPU sharing based on kernel burst estimation," IEEE Transactions on Cloud Computing (Early Access), 2021.
- W. Xiao, S. Ren, Y. Li, Y. Zhang, P. Hou, Z. Li, Y. Feng, W. Lin, and Y. Jia, "AntMan: Dynamic scaling on GPU clusters for deep learning," In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pp.533-548, 2020.
- Google Brain, TensorFlow [Internet], https://www.tensorflow.org/.
- Facebook AI Research, PyTorch [Internet], https://pytorch.org/.