• Title/Summary/Keyword: Distributed Machine Learning

Design of a ParamHub for Machine Learning in a Distributed Cloud Environment

  • Su-Yeon Kim;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.161-168 / 2024
  • As the size of big data models grows, distributed training is becoming an essential element of large-scale machine learning tasks. In this paper, we propose ParamHub, an agent for distributed data training. During training, the agent uses the provided data to adjust the conditions of the model, such as its structure, learning algorithm, hyperparameters, and bias, aiming to minimize the error between the model's predictions and the actual values. It operates autonomously, collecting and updating data in a distributed environment and thereby reducing the load-balancing burden that arises in a centralized system. Through communication between agents, resource management and the learning process can be coordinated, enabling efficient management of distributed data and resources. This approach enhances the scalability and stability of distributed machine learning systems while providing the flexibility to apply it in various learning environments.
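
A minimal sketch of the parameter-hub idea described above, assuming a synchronous round in which workers push local gradients to a central hub that averages them and returns updated parameters. The class names, the SGD update rule, and the least-squares worker loss are illustrative assumptions, not the paper's implementation.

```python
# Minimal parameter-hub sketch (illustrative only; class names, the SGD update
# rule, and the synchronous aggregation policy are assumptions, not ParamHub's
# actual design).
import numpy as np

class ParamHub:
    """Central hub that collects worker gradients and updates shared parameters."""

    def __init__(self, dim: int, lr: float = 0.01):
        self.params = np.zeros(dim)   # shared model parameters
        self.lr = lr                  # learning rate held by the hub

    def aggregate_and_update(self, worker_grads: list[np.ndarray]) -> np.ndarray:
        """Average gradients from all workers and apply one SGD step."""
        mean_grad = np.mean(worker_grads, axis=0)
        self.params -= self.lr * mean_grad
        return self.params            # broadcast updated parameters back to workers

def worker_gradient(params: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Local gradient of a least-squares loss on one worker's data shard."""
    return 2 * x.T @ (x @ params - y) / len(y)

# Example round: two workers with local shards push gradients to the hub.
hub = ParamHub(dim=3)
rng = np.random.default_rng(0)
shards = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(2)]
grads = [worker_gradient(hub.params, x, y) for x, y in shards]
hub.aggregate_and_update(grads)
```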

Distributed In-Memory Caching Method for ML Workload in Kubernetes (쿠버네티스에서 ML 워크로드를 위한 분산 인-메모리 캐싱 방법)

  • Dong-Hyeon Youn;Seokil Song
    • Journal of Platform Technology / v.11 no.4 / pp.71-79 / 2023
  • In this paper, we analyze the characteristics of machine learning workloads and, based on that analysis, propose a distributed in-memory caching technique to improve their performance. The core of a machine learning workload is model training, which is a computationally intensive task. Running machine learning workloads in a Kubernetes-based cloud environment in which the computing framework and storage are separated allows resources to be allocated effectively, but delays can occur because IO must be performed over the network. We therefore propose a distributed in-memory caching technique for workloads executed in such an environment. In particular, we propose a new method of pre-caching the data required by a machine learning workload into the distributed in-memory cache by taking into account Kubeflow Pipelines, a Kubernetes-based machine learning pipeline management tool.
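
A rough sketch of the pre-caching idea, assuming pipeline metadata is available as a simple list of stage-to-input mappings and using an in-process dict as a stand-in for the distributed in-memory cache; the actual Kubeflow Pipelines integration is not shown.

```python
# Illustrative pre-caching sketch (assumptions: the dict stands in for a
# distributed in-memory cache, and pipeline metadata is a plain list of
# stage -> input-file mappings; names are hypothetical).
from pathlib import Path

cache: dict[str, bytes] = {}   # stand-in for a distributed in-memory cache

def precache_stage_inputs(pipeline: list[dict], stage_name: str) -> None:
    """Load the input files of the named pipeline stage into the cache before
    the stage is scheduled, so training reads hit memory rather than storage."""
    stage = next(s for s in pipeline if s["name"] == stage_name)
    for path in stage["inputs"]:
        if path not in cache:
            cache[path] = Path(path).read_bytes()   # storage/network IO happens here, once

def read_input(path: str) -> bytes:
    """Training-time read path: served from memory when pre-cached."""
    if path in cache:
        return cache[path]
    return Path(path).read_bytes()
```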

Load Balancing Scheme for Machine Learning Distributed Environment (기계학습 분산 환경을 위한 부하 분산 기법)

  • Kim, Younggwan;Lee, Jusuk;Kim, Ajung;Hong, Jiman
    • Smart Media Journal / v.10 no.1 / pp.25-31 / 2021
  • As machine learning becomes more common, the development of applications that use it is increasing rapidly, along with research on machine learning platforms to support such development. Despite this, research on load balancing suited to machine learning platforms remains insufficient. In this paper, we therefore propose a load balancing scheme that can be applied to distributed machine learning environments. The proposed scheme organizes the distributed servers in a level hash table structure and assigns machine learning tasks to servers in consideration of each server's performance. We implemented the distributed servers, ran experiments, and compared the results with an existing hashing scheme. The proposed scheme showed an average speed improvement of 26% over the existing scheme and reduced the number of tasks waiting to be assigned to a server by more than 38%.
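
A simplified sketch of performance-aware task assignment, assuming a plain list of servers in place of the paper's level hash table; it only illustrates the idea of weighting assignment by each server's performance and pending work.

```python
# Simplified load-balancing sketch (assumption: the level hash table is
# replaced by a plain list of servers; only performance-weighted assignment
# is illustrated).
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    performance: float           # relative compute capability of the server
    pending: int = 0             # number of tasks waiting on this server

def assign_task(servers: list[Server]) -> Server:
    """Pick the server with the lowest expected wait (pending work / performance)."""
    target = min(servers, key=lambda s: (s.pending + 1) / s.performance)
    target.pending += 1
    return target

servers = [Server("s1", 1.0), Server("s2", 2.0), Server("s3", 0.5)]
for _ in range(10):
    assign_task(servers)
print({s.name: s.pending for s in servers})   # faster servers receive more tasks
```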

Systematic Research on Privacy-Preserving Distributed Machine Learning (프라이버시를 보호하는 분산 기계 학습 연구 동향)

  • Min Seob Lee;Young Ah Shin;Ji Young Chun
    • The Transactions of the Korea Information Processing Society / v.13 no.2 / pp.76-90 / 2024
  • Although artificial intelligence (AI) can be applied in various domains such as smart cities and healthcare, its use is limited by concerns about the exposure of personal and sensitive information. In response, the concept of distributed machine learning has emerged, in which learning occurs locally before a global model is trained, mitigating the concentration of data on a central server. However, the overall learning phase, carried out collaboratively among multiple participants, still poses threats to data privacy. In this paper, we systematically analyze recent trends in privacy protection in distributed machine learning, considering factors such as the presence of a central server, the distribution of the training datasets, and performance variations among participants. In particular, we focus on key distributed machine learning techniques, including horizontal federated learning, vertical federated learning, and swarm learning. We examine the privacy protection mechanisms within these techniques and explore potential directions for future research.
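
Since the survey covers horizontal federated learning, here is a minimal FedAvg-style sketch of one communication round in that setting, assuming a linear model and a size-weighted server-side average; it illustrates the setting only and is not taken from the paper.

```python
# Minimal horizontal federated averaging round (a standard FedAvg-style sketch;
# the survey proposes no algorithm, so this only illustrates the setting).
import numpy as np

def local_update(global_params: np.ndarray, x: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """Each participant trains on its own data; raw data never leaves the client."""
    w = global_params.copy()
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(updates: list[np.ndarray], sizes: list[int]) -> np.ndarray:
    """Server aggregates client models weighted by local dataset size."""
    weights = np.array(sizes) / sum(sizes)
    return np.sum([w * u for w, u in zip(weights, updates)], axis=0)

rng = np.random.default_rng(1)
clients = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(3)]
global_w = np.zeros(2)
for _ in range(10):                                   # communication rounds
    updates = [local_update(global_w, x, y) for x, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
```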

Performance Factor of Distributed Processing of Machine Learning using Spark (스파크를 이용한 머신러닝의 분산 처리 성능 요인)

  • Ryu, Woo-Seok
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.16 no.1 / pp.19-24 / 2021
  • In this paper, we study the performance factors of machine learning in a distributed environment using Apache Spark and present an efficient distributed processing method based on experiments. We first identify the factors that affect performance when machine learning is performed on a distributed cluster, classifying them into cluster performance, data size, and Spark engine configuration. We then study the performance of regression analysis using Spark MLlib running on a Hadoop cluster while varying the node and Spark executor configuration. The experiments confirmed that the effective number of executors is affected by the number of data blocks, but that, depending on cluster size, its maximum and minimum are limited by the number of cores and the number of worker nodes, respectively.
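
A sketch of the kind of Spark MLlib regression run the experiments describe, assuming placeholder input paths, column names, and executor settings; the paper's actual dataset and cluster configuration are not shown.

```python
# Sketch of a Spark MLlib regression run with an explicit executor configuration
# (assumptions: the input path, column names, and parameter values are
# placeholders, not the paper's setup).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = (SparkSession.builder
         .appName("mllib-regression-experiment")
         .config("spark.executor.instances", "4")    # varied across experiments
         .config("spark.executor.cores", "2")        # bounded by worker-node cores
         .getOrCreate())

df = spark.read.csv("hdfs:///data/regression.csv", header=True, inferSchema=True)
features = [c for c in df.columns if c != "label"]
assembled = VectorAssembler(inputCols=features, outputCol="features").transform(df)

model = LinearRegression(featuresCol="features", labelCol="label").fit(assembled)
print(model.summary.rootMeanSquaredError)
spark.stop()
```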

Analysis of massive data in astronomy (천문학에서의 대용량 자료 분석)

  • Shin, Min-Su
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1107-1116 / 2016
  • Recent astronomical survey observations have produced substantial amounts of data and have completely changed conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods have been used at every step of data analysis, from data calibration to inference of physical models. Machine learning methods are growing in popularity for classical problems of astronomical data analysis, owing to low-cost data acquisition with cheap large-scale detectors and to fast computer networks that enable large volumes of data to be shared. It is common to consider the effects of inhomogeneous spatial and temporal coverage in the analysis of big astronomical data. The growing size of the data requires parallel, distributed computing environments as well as machine learning algorithms, yet distributed data analysis systems have not been widely adopted for the general analysis of massive astronomical data. Gathering adequate training data is observationally expensive, and learning data are generally collected from multiple sources in astronomy; therefore, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.

Design of Block-based Modularity Architecture for Machine Learning (머신러닝을 위한 블록형 모듈화 아키텍처 설계)

  • Oh, Yoosoo
    • Journal of Korea Multimedia Society / v.23 no.3 / pp.476-482 / 2020
  • In this paper, we propose a block-based modular architecture design method for distributed machine learning. The proposed architecture is a block-type module structure that accommodates various machine learning algorithms. It allows block-type modules to be freely extended and multiple machine learning algorithms to be organically interlocked according to the situation. The architecture enables open data communication using a metadata query protocol. It also makes it easy to implement application services that combine various edge computing devices, by providing a communication method suited to the surrounding applications. To confirm the interlocking between the proposed block-type modules, we implemented a hardware-based modular application system.
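
An illustrative sketch of block-type modules interlocked through metadata, assuming a simple dict-based metadata description and a compatibility check at composition time; the paper's metadata query protocol and hardware system are not reproduced.

```python
# Illustrative block-module sketch (the interface, metadata fields, and query
# format are assumptions used to show how blocks could be interlocked).
from abc import ABC, abstractmethod

class Block(ABC):
    """A self-describing machine learning block that can be chained with others."""

    @abstractmethod
    def metadata(self) -> dict:
        """Answer a metadata query: what the block consumes and produces."""

    @abstractmethod
    def run(self, data):
        """Process the data and pass the result to the next block."""

class Scaler(Block):
    def metadata(self):
        return {"name": "scaler", "input": "raw_vector", "output": "scaled_vector"}
    def run(self, data):
        m = max(abs(v) for v in data) or 1.0
        return [v / m for v in data]

class Classifier(Block):
    def metadata(self):
        return {"name": "classifier", "input": "scaled_vector", "output": "label"}
    def run(self, data):
        return "positive" if sum(data) > 0 else "negative"

def compose(blocks: list[Block]) -> list[Block]:
    """Interlock blocks only when their metadata declares compatible interfaces."""
    for a, b in zip(blocks, blocks[1:]):
        assert a.metadata()["output"] == b.metadata()["input"], "incompatible blocks"
    return blocks

x = [0.5, -1.2, 3.0]
for block in compose([Scaler(), Classifier()]):
    x = block.run(x)
print(x)
```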

FedGCD: Federated Learning Algorithm with GNN based Community Detection for Heterogeneous Data

  • Wooseok Shin;Jitae Shin
    • Journal of Internet Computing and Services / v.24 no.6 / pp.1-11 / 2023
  • Federated learning (FL) is a groundbreaking machine learning paradigm that allows multiple participants to collaboratively train models in a cloud environment while maintaining the privacy of their raw data. This approach is invaluable in applications involving sensitive or geographically distributed data. However, one of the challenges in FL is dealing with heterogeneous, non-independent and identically distributed (non-IID) data across participants, which can result in suboptimal model performance compared to traditional machine learning methods. To tackle this, we introduce FedGCD, a novel FL algorithm that employs Graph Neural Network (GNN)-based community detection to enhance model convergence in federated settings. In our experiments, FedGCD consistently outperformed existing FL algorithms in various scenarios: in a non-IID environment it achieved an accuracy of 0.9113, a precision of 0.8798, and an F1-score of 0.8972, and in a semi-IID setting it demonstrated the highest accuracy, 0.9315, with an F1-score of 0.9312. We also introduce a new metric, non-IIDness, to quantitatively measure the degree of data heterogeneity. Our results indicate that FedGCD not only addresses the challenges of data heterogeneity and non-IIDness but also sets new benchmarks for FL algorithms. The community detection approach adopted in FedGCD has broader implications, suggesting that it could be adapted to other distributed machine learning scenarios, improving model performance and convergence across a range of applications.
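
A sketch of the community-based client grouping idea, assuming cosine similarity between client updates and greedy modularity maximization in place of FedGCD's GNN-based community detection, which is not reproduced here.

```python
# Community-based client grouping sketch (assumption: cosine similarity plus
# greedy modularity stands in for FedGCD's GNN-based community detection).
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def client_communities(updates: list[np.ndarray], threshold: float = 0.5):
    """Build a client similarity graph and return communities of similar clients."""
    g = nx.Graph()
    g.add_nodes_from(range(len(updates)))
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            sim = float(updates[i] @ updates[j] /
                        (np.linalg.norm(updates[i]) * np.linalg.norm(updates[j]) + 1e-12))
            if sim > threshold:
                g.add_edge(i, j, weight=sim)
    return [sorted(c) for c in greedy_modularity_communities(g)]

def aggregate_per_community(updates, communities):
    """Average updates within each community before combining them globally."""
    return [np.mean([updates[i] for i in c], axis=0) for c in communities]

rng = np.random.default_rng(2)
updates = [rng.normal(loc=l, size=4) for l in (0, 0, 5, 5)]   # two latent groups
groups = client_communities(updates)
per_group = aggregate_per_community(updates, groups)
```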

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data (빅데이터를 위한 H-RTGL 기반 단일 분류기 분산 처리 프레임워크 설계)

  • Kim, Do Gyun;Choi, Jin Young
    • Journal of Korean Society for Quality Management / v.48 no.4 / pp.553-566 / 2020
  • Purpose: The purpose of this study was to design a framework for generating a one-class classification algorithm based on hyper-rectangles (H-RTGL) in a distributed environment connected by a network. Methods: First, we devised an H-RTGL-based one-class classifier that can be executed by distributed computing nodes, considering both model and data parallelism. We then designed supporting components for executing the distributed processing. Finally, we validated both the effectiveness and the efficiency of the classifier obtained from the proposed framework through a numerical experiment using datasets from the UCI Machine Learning Repository. Results: We designed a distributed processing framework capable of H-RTGL-based one-class classification in a distributed environment consisting of physically separated computing nodes. It includes components for model and data parallelism, which enable distributed generation of the classifier. The numerical experiment showed no significant change in classification performance, as assessed by a statistical test, while the elapsed time was reduced by distributed processing on datasets of considerable size. Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation can preserve classification performance while improving the efficiency of the classification algorithm. We also discuss the limitations of our work and suggest directions for future research.
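
A simplified sketch of hyper-rectangle one-class classification with data parallelism, assuming each node fits an axis-aligned bounding box on its shard and a point is accepted if any box contains it; the actual interval-generation procedure of H-RTGL is not reproduced.

```python
# Simplified hyper-rectangle one-class classification sketch (assumption: each
# node fits one axis-aligned bounding box on its data shard; H-RTGL's actual
# interval-generation procedure is not shown).
import numpy as np

def fit_rectangle(shard: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Each computing node returns per-dimension (min, max) bounds of its shard."""
    return shard.min(axis=0), shard.max(axis=0)

def predict(x: np.ndarray, rectangles) -> bool:
    """A point belongs to the target class if any node's rectangle contains it."""
    return any(np.all(lo <= x) and np.all(x <= hi) for lo, hi in rectangles)

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 2))
shards = np.array_split(data, 4)                 # data parallelism across 4 nodes
rectangles = [fit_rectangle(s) for s in shards]  # one rectangle per node
print(predict(np.array([0.1, 0.2]), rectangles))    # inside -> True
print(predict(np.array([10.0, 10.0]), rectangles))  # far outside -> False
```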

Agent with Low-latency Overcoming Technique for Distributed Cluster-based Machine Learning

  • Seo-Yeon, Gu;Seok-Jae, Moon;Byung-Joon, Park
    • International Journal of Internet, Broadcasting and Communication / v.15 no.1 / pp.157-163 / 2023
  • Recently, as businesses and data types have become more complex and diverse, efficient data analysis using machine learning has become necessary. However, because communication in a cloud environment is strongly affected by network latency, data analysis does not proceed smoothly when information is delayed. In this paper, SPT (Safe Proper Time) is applied to the cluster-based machine learning data analysis agent proposed in previous studies to solve this delay problem. SPT is a method of directly accessing remote memory in the cluster that processes data between layers, effectively improving data transfer speed and ensuring the timeliness and reliability of data transfer.