• Title/Summary/Keyword: Distributed Machine Learning

Distributed Processing System Design and Implementation for Feature Extraction from Large-Scale Malicious Code (대용량 악성코드의 특징 추출 가속화를 위한 분산 처리 시스템 설계 및 구현)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • KIPS Transactions on Computer and Communication Systems / v.8 no.2 / pp.35-40 / 2019
  • Traditional malware detection is vulnerable to malware that has been modified by polymorphism or obfuscation techniques. By learning patterns embedded in malware code, machine learning algorithms can detect similar behaviors and replace current detection methods. Data must be collected continuously in order to learn malicious code patterns that change over time. However, storing and processing a large number of malware files incurs high space and time complexity. In this paper, an HDFS-based distributed processing system is designed to reduce space complexity and accelerate feature extraction. Using the distributed processing system, we extract two filtering-based API features, the 2-gram feature and the APICFG feature, and compare the generalization performance of ensemble learning models. In experiments, feature extraction was about 3.75 times faster than on a single computer, and storage was about 5 times more space-efficient. The 2-gram feature gave the best classification performance among the features, but its learning time was long due to high dimensionality.
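
    As a rough illustration of the 2-gram feature mentioned in this abstract, the sketch below counts adjacent pairs in a single API call sequence and turns the counts into a fixed-order vector. The abstract does not describe the authors' filtering step, HDFS pipeline, or APICFG construction, so none of that is reproduced; the function names and the sample call sequence are assumptions made for illustration.

    ```python
    from collections import Counter
    from typing import List, Tuple


    def api_bigrams(calls: List[str]) -> Counter:
        """Count adjacent pairs (2-grams) in one API call sequence."""
        return Counter(zip(calls, calls[1:]))


    def vectorize(counts: Counter, vocab: List[Tuple[str, str]]) -> List[int]:
        """Map 2-gram counts onto a fixed vocabulary order (0 when absent)."""
        return [counts.get(pair, 0) for pair in vocab]


    # Hypothetical call sequence, used only to exercise the functions.
    sample = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]
    counts = api_bigrams(sample)
    vocab = sorted(counts)            # in practice the vocabulary spans all files
    features = vectorize(counts, vocab)
    ```

    Because the vocabulary grows with the number of distinct call pairs, vectors of this kind become very wide on a large corpus, which is consistent with the long learning time the abstract reports for the 2-gram feature.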

Combination Methods for Distribution Codes (분산 부호의 결합 기법)

  • Chung, Jin-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.365-366 / 2022
  • A distributed code is a type of linear code that can be used to provide privacy in coding and federated learning. In a distributed code, private or confidential information is independent across codes, because the information in each code is not contained in the other codes. In this paper, we examine the properties of these distributed codes and present techniques for synthesizing new sets of distributed codes from previously known ones. In addition, we propose several scenarios in which the combined codes can be used.

Estimating GARCH models using kernel machine learning (커널기계 기법을 이용한 일반화 이분산자기회귀모형 추정)

  • Hwang, Chang-Ha;Shin, Sa-Im
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.419-425 / 2010
  • Kernel machine learning is gaining popularity for analyzing large or high-dimensional nonlinear data. We use this technique to estimate a GARCH model for predicting the conditional volatility of stock market returns. GARCH models are usually estimated using maximum likelihood (ML) procedures, assuming that the data are normally distributed. In this paper, we show that GARCH models can be estimated using kernel machine learning, and that the kernel machine has higher predictive ability than ML methods and the support vector machine when estimating the volatility of fat-tailed financial time series data.
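
    For orientation, the block below writes out the common GARCH(1,1) special case together with the Gaussian log-likelihood that ML estimation maximizes. The abstract does not state the model order or the notation, so the order (1,1) and the symbols $r_t$, $\mu$, $\varepsilon_t$, $z_t$, $\omega$, $\alpha$, $\beta$ are assumptions used only for illustration.

    ```latex
    \begin{aligned}
    r_t &= \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad z_t \sim \mathcal{N}(0, 1), \\
    \sigma_t^2 &= \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,
        \qquad \omega > 0,\ \alpha \ge 0,\ \beta \ge 0, \\
    \ell(\mu, \omega, \alpha, \beta) &= -\frac{1}{2} \sum_{t=1}^{T}
        \left[ \log(2\pi) + \log \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2} \right].
    \end{aligned}
    ```

    The normality assumption enters only through $z_t$ and the likelihood, which is the part that fits fat-tailed returns poorly and that a kernel-based estimator avoids specifying.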

Machine Learning-based Detection of DoS and DRDoS Attacks in IoT Networks

  • Yeo, Seung-Yeon;Jo, So-Young;Kim, Jiyeon
    • Journal of the Korea Society of Computer and Information / v.27 no.7 / pp.101-108 / 2022
  • We propose an intrusion detection model that detects denial-of-service (DoS) and distributed reflection denial-of-service (DRDoS) attacks from the empirical data of each Internet of Things (IoT) device, by training on system and network metrics that can be commonly collected from various IoT devices. First, we collect 37 system and network metrics from each IoT device, taking IoT attack scenarios into account; we then train six types of machine learning models on them to identify the most effective models as well as the metrics that matter most for detecting and distinguishing IoT attacks. Our experimental results show that the Random Forest model performs best, with an accuracy of over 96%, followed by the K-Nearest Neighbor model and the Decision Tree model. Of the 37 metrics, we identified five types of CPU, memory, and network metrics that best reflect the characteristics of the attacks in all the experimental scenarios. Furthermore, we found that packets with higher transmission speeds, rather than larger packets, represent the characteristics of DoS and DRDoS attacks more clearly in IoT networks.

Naive Bayes Learning Algorithm based on Map-Reduce Programming Model (Map-Reduce 프로그래밍 모델 기반의 나이브 베이스 학습 알고리즘)

  • Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.10a / pp.208-209 / 2011
  • In this paper, we introduce a Naive Bayes learning algorithm for learning and reasoning in a Map-Reduce model based environment. For this purpose, we use Apache Mahout to execute Distributed Naive Bayes on University of California, Irvine (UCI) benchmark data sets. From the experimental results, we see that Apache Mahout's Distributed Naive Bayes algorithm is comparable to WEKA's Naive Bayes algorithm in terms of performance. These results indicate that in the future Big Data environment, Map-Reduce model based systems such as Apache Mahout can be promising for machine learning usage.
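
    To show why Naive Bayes fits the Map-Reduce model, here is a minimal single-process sketch in plain Python: the map step emits count keys per record, the reduce step sums them, and prediction uses the summed counts with add-one smoothing. It does not use or imitate Apache Mahout's API; every function name, the toy shards, and the assumption of binary categorical features (`n_values=2`) are hypothetical.

    ```python
    import math
    from collections import defaultdict


    def map_record(record):
        """Map step: emit (key, 1) pairs for one (features, label) record."""
        features, label = record
        yield ("class", label), 1
        for i, value in enumerate(features):
            yield ("feat", label, i, value), 1


    def reduce_counts(pairs):
        """Reduce step: sum the emitted counts per key."""
        totals = defaultdict(int)
        for key, n in pairs:
            totals[key] += n
        return totals


    def train(shards):
        # Each shard could be mapped on a separate worker; here it is sequential.
        emitted = (p for shard in shards for rec in shard for p in map_record(rec))
        return reduce_counts(emitted)


    def predict(totals, features, labels, n_values=2):
        """Pick the label with the highest smoothed log-posterior."""
        n = sum(totals[("class", c)] for c in labels)

        def log_post(c):
            nc = totals[("class", c)]
            lp = math.log(nc / n)
            for i, v in enumerate(features):
                lp += math.log((totals[("feat", c, i, v)] + 1) / (nc + n_values))
            return lp

        return max(labels, key=log_post)


    # Toy data split into two shards, standing in for distributed input splits.
    shards = [
        [((1, 0), "spam"), ((1, 1), "spam")],
        [((0, 0), "ham"), ((0, 1), "ham")],
    ]
    totals = train(shards)
    label = predict(totals, (1, 0), ["spam", "ham"])
    ```

    Training reduces to summing counts, which is associative and commutative, so partial counts from different shards can be combined in any order.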

Forecasting of Iron Ore Prices using Machine Learning (머신러닝을 이용한 철광석 가격 예측에 대한 연구)

  • Lee, Woo Chang;Kim, Yang Sok;Kim, Jung Min;Lee, Choong Kwon
    • Journal of Korea Society of Industrial Information Systems / v.25 no.2 / pp.57-72 / 2020
  • The price of iron ore has continued to fluctuate with high demand and supply from many countries and companies. In this business environment, forecasting the price of iron ore has become important. This study developed machine learning models that forecast the price of iron ore one month after the trading events. The forecasting models were a distributed lag model and deep learning models such as MLP (multi-layer perceptron), RNN (recurrent neural network), and LSTM (long short-term memory). In a comparison of the individual models across metrics, LSTM showed the lowest prediction error. In a comparison of models built with the ensemble technique, the ensemble of the distributed lag model and LSTM showed the lowest prediction error.

Machine Learning-based Phase Picking Algorithm of P and S Waves for Distributed Acoustic Sensing Data (분포형 광섬유 센서 자료 적용을 위한 기계학습 기반 P, S파 위상 발췌 알고리즘 개발)

  • Choi, Yonggyu;Song, Youngseok;Seol, Soon Jee;Byun, Joongmoo
    • Geophysics and Geophysical Exploration / v.25 no.4 / pp.177-188 / 2022
  • Recently, the application of distributed acoustic sensors (DAS), which can replace geophones and seismometers, has increased significantly along with interest in micro-seismic monitoring, one of the CO2 storage monitoring techniques. A DAS monitoring system records a large amount of temporally and spatially continuous data, which calls for fast and accurate data processing techniques. Because event detection and seismic phase picking are the most basic data processing techniques, they should be performed on all data. In this study, a machine learning-based P and S wave phase picking algorithm was developed to compensate for the limitations of conventional phase picking algorithms, and it was modified using a transfer learning technique for application to DAS data, which consist of a single component with a low signal-to-noise ratio. Our model was constructed by modifying the convolution-based EQTransformer, which performs well in phase picking, to the ResUNet structure. Both the global earthquake dataset STEAD and an augmented dataset were used for training, to enhance prediction performance on unseen characteristics of the target dataset. The performance of the developed algorithm was verified using K-net and KiK-net data, whose characteristics differ from those of the training data. Additionally, after the trained model was adapted to DAS data using the transfer learning technique, its performance was verified by applying it to DAS field data measured in the Pohang Janggi basin.

Simulation of a CIM Workflow System Using Parallel Virtual Machine (PVM)

  • Chang-Ouk Kim
    • Journal of the Korea Society for Simulation / v.5 no.2 / pp.13-24 / 1996
  • A workflow is an ordered sequence of interdependent component data activities, each of which can be executed on an integrated information system by accessing a remote information system. In our previous research [4], we proposed a distributed CIM Workflow system which consists of a workflow execution model called DAF-Net and an agent-based information system called AIMIS. Given a component data activity, an interaction protocol among agents is needed to allocate the activity to a relevant information system. The objective of this research is to propose and test two protocols: the ARR (Asynchronous Request and Response) protocol and the NCL (Negotiation with Case based Learning) protocol. To test the effectiveness of the protocols, we applied the PVM (Parallel Virtual Machine) software to simulate the distributed CIM Workflow system. PVM provides a distributed computing environment in which users can run different software processes on different computers while allowing communication among the processes.

Behavior Learning and Evolution of Swarm Robot System using Support Vector Machine (SVM을 이용한 군집로봇의 행동학습 및 진화)

  • Seo, Sang-Wook;Yang, Hyun-Chang;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.5 / pp.712-717 / 2008
  • In swarm robot systems, each robot must act by itself according to its state and environment and, if necessary, must cooperate with other robots in order to carry out a given task. Therefore, it is essential that each robot has both learning and evolution abilities to adapt to dynamic environments. In this paper, a reinforcement learning method using SVM, which is based on structural risk minimization, together with distributed genetic algorithms is proposed for behavior learning and evolution of collective autonomous mobile robots. With the distributed genetic algorithm, robots exchange chromosomes acquired under different environments through communication, so that each robot can improve its behavior ability. In particular, to improve the performance of evolution, selective crossover that uses the characteristics of SVM-based reinforcement learning is adopted in this paper.

Efficient distributed consensus optimization based on patterns and groups for federated learning (연합학습을 위한 패턴 및 그룹 기반 효율적인 분산 합의 최적화)

  • Kang, Seung Ju;Chun, Ji Young;Noh, Geontae;Jeong, Ik Rae
    • Journal of Internet Computing and Services / v.23 no.4 / pp.73-85 / 2022
  • In the era of the 4th industrial revolution, where automation and connectivity are maximized with artificial intelligence, the importance of data collection and utilization for model updates is increasing. To create a model using artificial intelligence technology, it is usually necessary to gather data in one place so that the model can be updated, but this can infringe on users' privacy. In this paper, we introduce federated learning, a distributed machine learning method that can update models cooperatively without directly sharing data stored in a distributed manner, and review work on optimizing distributed consensus among participants without a server. In addition, we propose a pattern- and group-based distributed consensus optimization algorithm that generates patterns and groups based on the Kirkman Triple System and performs parallel updates and communication. This algorithm provides stronger privacy than the existing distributed consensus optimization algorithm and reduces the communication time until the model converges.
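
    The general idea of serverless consensus through group-wise parallel updates can be sketched as follows. This is not the paper's algorithm: the abstract does not give the pattern or group generation procedure, so the groups below are hand-picked as the rows and columns of a 3x3 grid of nine participants, a single scalar stands in for each participant's model parameters, and all names are hypothetical.

    ```python
    from typing import List, Sequence


    def group_average(values: List[float], groups: Sequence[Sequence[int]]) -> List[float]:
        """One round: every group replaces its members' values with the group mean.

        Groups within a round are disjoint, so they can be processed in parallel.
        """
        updated = list(values)
        for group in groups:
            mean = sum(values[i] for i in group) / len(group)
            for i in group:
                updated[i] = mean
        return updated


    # Nine participants arranged as a 3x3 grid; each round uses disjoint triples.
    rows = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
    cols = [(0, 3, 6), (1, 4, 7), (2, 5, 8)]

    params = [float(i) for i in range(9)]      # toy local parameters
    after_rows = group_average(params, rows)
    after_cols = group_average(after_rows, cols)
    ```

    With these groups, two rounds of three-party communication bring every participant to the global mean without any node seeing more than two other participants' values per round.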