http://dx.doi.org/10.3745/KTCCS.2022.11.11.381

Real-Time GPU Task Monitoring and Node List Management Techniques for Container Deployment in a Cluster-Based Container Environment  

Jihun Kang (BK21 FOUR Education and Research Group for Computer Science, Korea University)
Joon-Min Gil (School of Computer Software, Daegu Catholic University)
Publication Information
KIPS Transactions on Computer and Communication Systems / v.11, no.11, 2022, pp. 381-394
Abstract
Recently, driven by the personalization and customization of data, Internet-based services increasingly require real-time processing, such as real-time AI inference and data analysis, which must be handled immediately according to the user's situation or requirements. Each real-time task has a deadline spanning from the start of the task to the return of its results, and meeting that deadline directly determines the quality of the service. However, traditional container systems are limited in running real-time tasks because they provide no means of assigning and managing deadlines for tasks executed in containers. In addition, workloads such as AI inference and data analysis generally rely on graphics processing units (GPUs), and because performance isolation is not provided between containers, GPU tasks in different containers typically interfere with one another's performance. Moreover, a node's resource usage alone cannot reveal the deadline guarantee rate of each container or whether a new real-time container can be deployed on that node. In this paper, to support real-time processing of GPU tasks running in containers, we propose a monitoring technique that tracks and manages the deadlines and execution status of real-time GPU tasks in containers, and a node list management technique that places containers on appropriate nodes so that their deadlines are guaranteed. Our experiments further show that the proposed techniques have very little impact on the system.
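The abstract does not describe how the monitor or the node list is implemented internally. The Python sketch below is only an illustration under our own assumptions, not the authors' method: all class and function names are hypothetical, the "deadline guarantee rate" is assumed to be the fraction of completed tasks that finished within their deadline, and nodes are assumed to be ranked by their worst-performing container. It records the start time and deadline of each GPU task per container, derives a guarantee rate, and keeps a list of nodes whose containers all stay at or above a threshold as candidates for a new real-time container.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    """One real-time GPU task run inside a container (hypothetical structure)."""
    start: float                         # submission time, in seconds
    deadline: float                      # relative deadline, in seconds
    finished_at: Optional[float] = None  # completion time; None while running

    def complete(self, now: float) -> None:
        self.finished_at = now

    def met_deadline(self) -> bool:
        # Assumption: a task meets its deadline if it finished within start + deadline.
        return self.finished_at is not None and self.finished_at - self.start <= self.deadline


@dataclass
class ContainerMonitor:
    """Per-node monitor: tracks the deadline status of tasks in each container."""
    tasks: Dict[str, List[TaskRecord]] = field(default_factory=dict)

    def start_task(self, container: str, now: float, deadline: float) -> TaskRecord:
        rec = TaskRecord(start=now, deadline=deadline)
        self.tasks.setdefault(container, []).append(rec)
        return rec

    def guarantee_rate(self, container: str) -> float:
        # Fraction of completed tasks that met their deadline; 1.0 if none completed yet.
        done = [t for t in self.tasks.get(container, []) if t.finished_at is not None]
        if not done:
            return 1.0
        return sum(t.met_deadline() for t in done) / len(done)


def eligible_nodes(monitors: Dict[str, ContainerMonitor], threshold: float) -> List[str]:
    """Hypothetical node list: nodes whose every container keeps a guarantee
    rate >= threshold, ordered best first by their worst container's rate."""
    scored = []
    for node, mon in monitors.items():
        rates = [mon.guarantee_rate(c) for c in mon.tasks] or [1.0]
        worst = min(rates)
        if worst >= threshold:
            scored.append((worst, node))
    # Highest worst-case rate first; node name breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in scored]


if __name__ == "__main__":
    node_a = ContainerMonitor()
    node_a.start_task("infer-1", now=0.0, deadline=0.5).complete(0.4)  # met
    node_a.start_task("infer-1", now=1.0, deadline=0.5).complete(1.7)  # missed
    node_b = ContainerMonitor()
    node_b.start_task("infer-2", now=0.0, deadline=2.0).complete(1.5)  # met
    print(eligible_nodes({"node-a": node_a, "node-b": node_b}, threshold=0.9))
```

Run as a script, this prints `['node-b']`, because `node-a` has met only half of its deadlines. In this sketch a node with no monitored containers counts as fully eligible and still-running tasks are ignored; a real monitor would likely also count an in-flight task that has already passed its deadline as a miss.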
Keywords
HPC Cloud; Container; GPU Computing; Real-time Task; Monitoring