• Title/Summary/Keyword: Kubeflow


Comparative Analysis of Computation Times Based on the Number of Containers for CPU-Intensive Tasks in the Kubeflow Environment (Kubeflow 환경에서 CPU 집약적인 작업을 위한 컨테이너 수에 따른 연산 시간 비교 및 분석)

  • HyunSeung Jung;Taeshin Kang;Heonchang Yu;Jihun Kang
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.93-96 / 2023
  • As the demand for machine learning has grown, so has the demand for deploying machine learning workflows. Kubeflow makes machine learning deployment convenient, and Kubeflow Pipelines allows a single task to be distributed across multiple containers for computation. However, increasing the number of containers does not necessarily improve performance. Therefore, in this study, to analyze the causes that limit performance gains, we distributed a CPU-intensive task across multiple containers in Kubeflow and executed it. Comparing and analyzing the computation completion times by number of containers, we observed that computation speed improves as the number of containers increases, but past a certain point the speedup tapers off again. We analyzed the main cause to be that, due to resource limits, not all containers could be scheduled at the same time.
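
The setup described in the abstract maps naturally onto the fan-out construct in the Kubeflow Pipelines SDK. The following is a minimal sketch (kfp v2 syntax, not the paper's code) of splitting a CPU-intensive task into chunks that each run in their own container; the chunk count, the placeholder workload, and the 1-CPU limit per container are illustrative assumptions.

```python
# Minimal sketch (kfp v2 SDK), not the paper's code: fan a CPU-bound task out
# over several containers with ParallelFor.
from kfp import dsl


@dsl.component(base_image="python:3.11")
def cpu_intensive_chunk(chunk_id: int, iterations: int) -> float:
    """Run one CPU-bound chunk of the overall task and return its elapsed time."""
    import time
    start = time.time()
    total = 0
    for i in range(iterations):
        total += i * i  # placeholder CPU-bound work
    return time.time() - start


@dsl.pipeline(name="cpu-scaling-experiment")
def cpu_scaling_pipeline(iterations_per_chunk: int = 50_000_000):
    # Each loop item becomes its own container. If the cluster cannot schedule
    # all containers at once, some chunks must wait for resources, which matches
    # the cause of the diminishing speedup analyzed in the paper.
    with dsl.ParallelFor(items=[0, 1, 2, 3, 4, 5, 6, 7]) as chunk_id:
        task = cpu_intensive_chunk(chunk_id=chunk_id, iterations=iterations_per_chunk)
        task.set_cpu_limit("1")  # pin each container to one CPU core
```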

Dynamic Resource Adjustment Operator Based on Autoscaling for Improving Distributed Training Job Performance on Kubernetes (쿠버네티스에서 분산 학습 작업 성능 향상을 위한 오토스케일링 기반 동적 자원 조정 오퍼레이터)

  • Jeong, Jinwon;Yu, Heonchang
    • KIPS Transactions on Computer and Communication Systems / v.11 no.7 / pp.205-216 / 2022
  • One of the many tools used for distributed deep learning training is Kubeflow, which runs on Kubernetes, a container orchestration tool. TensorFlow jobs can be managed using the existing operator provided by Kubeflow. However, for distributed deep learning training jobs based on the parameter server architecture, the scheduling policy used by the existing operator does not consider the task affinity of the distributed training job and does not provide the ability to dynamically allocate or release resources. This can lead to long job completion times and low resource utilization. Therefore, in this paper we propose a new operator that efficiently schedules distributed deep learning training jobs to minimize job completion time and increase resource utilization. We implemented the new operator by modifying the existing operator and conducted experiments to evaluate its performance. The experimental results showed that our scheduling policy reduced the average job completion time by up to 84% and increased the average CPU utilization by up to 92%.
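
The dynamic allocation and release of resources that the proposed operator adds can be pictured as patching a TFJob's replica counts through the Kubernetes API. The sketch below is not the paper's operator; the namespace, job name, and scaling decision are illustrative assumptions.

```python
# Minimal sketch, not the proposed operator: adjust the Worker replica count of a
# Kubeflow TFJob by patching its custom resource through the Kubernetes API.
from kubernetes import client, config


def scale_tfjob_workers(name: str, namespace: str, replicas: int) -> None:
    """Patch a TFJob so its Worker replica count reflects a new scheduling decision."""
    config.load_kube_config()  # use config.load_incluster_config() when running in a pod
    api = client.CustomObjectsApi()
    patch = {"spec": {"tfReplicaSpecs": {"Worker": {"replicas": replicas}}}}
    api.patch_namespaced_custom_object(
        group="kubeflow.org",
        version="v1",
        namespace=namespace,
        plural="tfjobs",
        name=name,
        body=patch,
    )


if __name__ == "__main__":
    # Hypothetical example: grow a parameter-server training job to 4 workers
    # when idle CPU capacity becomes available.
    scale_tfjob_workers(name="mnist-ps-training", namespace="kubeflow", replicas=4)
```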

A Study on Email Security through Proactive Detection and Prevention of Malware Email Attacks (악성 이메일 공격의 사전 탐지 및 차단을 통한 이메일 보안에 관한 연구)

  • Yoo, Ji-Hyun
    • Journal of IKEEE / v.25 no.4 / pp.672-678 / 2021
  • New malware continues to grow in volume and sophistication every year. Although various studies have examined executable files to diagnose malicious code, it is difficult to detect attacks that embed malicious code threats in emails by exploiting non-executable document files, malicious URLs, and malicious macros and JavaScript within documents. In this paper, we introduce a method of analyzing malicious code for email security through proactive detection and blocking of malicious email attacks, and propose an AI-based method for determining whether a non-executable document file is malicious. Among various algorithms, an efficient machine learning model is chosen, and an ML workflow system that uses Kubeflow to diagnose malicious code is proposed.
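
As a rough illustration of the AI-based maliciousness decision for non-executable documents, the sketch below trains a classifier on synthetic document features; the feature set, the random-forest choice, and the data are assumptions for illustration, not the paper's actual modeling.

```python
# Minimal sketch, assuming a tabular feature representation of documents
# (e.g. macro count, embedded JS size, URL count, object stream count).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.random((500, 4))                       # synthetic feature vectors
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)      # synthetic labels: 1 = malicious, 0 = benign

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```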

Distributed In-Memory Caching Method for ML Workload in Kubernetes (쿠버네티스에서 ML 워크로드를 위한 분산 인-메모리 캐싱 방법)

  • Dong-Hyeon Youn;Seokil Song
    • Journal of Platform Technology / v.11 no.4 / pp.71-79 / 2023
  • In this paper, we analyze the characteristics of machine learning workloads and, based on them, propose a distributed in-memory caching technique to improve their performance. The core of a machine learning workload is model training, which is a computationally intensive task. Performing machine learning workloads in a Kubernetes-based cloud environment in which the computing framework and storage are separated allows resources to be allocated effectively, but delays can occur because I/O must be performed over the network. In this paper, we propose a distributed in-memory caching technique to improve the performance of machine learning workloads performed in such an environment. In particular, we propose a new method of precaching the data required for machine learning workloads into the distributed in-memory cache by taking Kubeflow Pipelines, a Kubernetes-based machine learning pipeline management tool, into account.
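
A rough sketch of the precaching idea, under the assumption that the distributed in-memory cache is reachable as a Redis service and that the dataset is mounted at a local path: a warm-up component fills the cache before the training component runs, so training reads from memory instead of going through remote storage I/O. This is not the paper's system; the cache host, dataset path, and key scheme are illustrative.

```python
# Minimal sketch (kfp v2 SDK): warm a distributed in-memory cache (Redis assumed)
# in a dedicated pipeline step before the training step starts.
from kfp import dsl


@dsl.component(base_image="python:3.11", packages_to_install=["redis"])
def precache_dataset(cache_host: str, dataset_dir: str) -> int:
    """Read training files from storage and push them into the cache."""
    import os
    import redis
    r = redis.Redis(host=cache_host, port=6379)
    cached = 0
    for fname in os.listdir(dataset_dir):
        with open(os.path.join(dataset_dir, fname), "rb") as f:
            r.set(f"ds:{fname}", f.read())
        cached += 1
    return cached


@dsl.component(base_image="python:3.11", packages_to_install=["redis"])
def train_model(cache_host: str):
    """Training step that reads its input from the cache instead of remote storage."""
    import redis
    r = redis.Redis(host=cache_host, port=6379)
    for key in r.scan_iter("ds:*"):
        _ = r.get(key)  # placeholder: feed cached bytes into the actual training loop


@dsl.pipeline(name="precache-then-train")
def precache_pipeline(cache_host: str = "redis-cache", dataset_dir: str = "/mnt/dataset"):
    warm = precache_dataset(cache_host=cache_host, dataset_dir=dataset_dir)
    train = train_model(cache_host=cache_host)
    train.after(warm)  # training starts only after the cache has been warmed
```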
