• Title/Summary/Keyword: Distributed Machine Learning

Search Result 128, Processing Time 0.028 seconds

Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1077-1094
    • /
    • 2016
  • Apache Spark is a fast and general-purpose cluster computing package. It provides a new abstraction named resilient distributed dataset, which is capable of support for fault tolerance while keeping data in memory. This type of abstraction results in a significant speedup compared to legacy large-scale data framework, MapReduce. In particular, Spark framework is suitable for iterative machine learning applications such as logistic regression and K-means clustering, and interactive data querying. Spark also supports high level libraries for various applications such as machine learning, streaming data processing, database querying and graph data mining thanks to its versatility. In this work, we introduce the concept and programming model of Spark as well as show some implementations of simple statistical computing applications. We also review the machine learning package MLlib, and the R language interface SparkR.

Estimation of Forest Carbon Stock in South Korea Using Machine Learning with High-Resolution Remote Sensing Data (고해상도 원격탐사 자료와 기계학습을 이용한 한국 산림의 탄소 저장량 산정)

  • Jaewon Shin;Sujong Jeong;Dongyeong Chang
    • Atmosphere
    • /
    • v.33 no.1
    • /
    • pp.61-72
    • /
    • 2023
  • Accurate estimation of forest carbon stocks is important in establishing greenhouse gas reduction plans. In this study, we estimate the spatial distribution of forest carbon stocks using machine learning techniques based on high-resolution remote sensing data and detailed field survey data. The high-resolution remote sensing data used in this study are Landsat indices (EVI, NDVI, NDII) for monitoring vegetation vitality and Shuttle Radar Topography Mission (SRTM) data for describing topography. We also used the forest growing stock data from the National Forest Inventory (NFI) for estimating forest biomass. Based on these data, we built a model based on machine learning methods and optimized for Korean forest types to calculate the forest carbon stocks per grid unit. With the newly developed estimation model, we created forest carbon stocks maps and estimated the forest carbon stocks in South Korea. As a result, forest carbon stock in South Korea was estimated to be 432,214,520 tC in 2020. Furthermore, we estimated the loss of forest carbon stocks due to the Donghae-Uljin forest fire in 2022 using the forest carbon stock map in this study. The surrounding forest destroyed around the fire area was estimated to be about 24,835 ha and the loss of forest carbon stocks was estimated to be 1,396,457 tC. Our model serves as a tool to estimate spatially distributed local forest carbon stocks and facilitates accounting of real-time changes in the carbon balance as well as managing the LULUCF part of greenhouse gas inventories.

DISTRIBUTED HMI SYSTEM FOR MANAGING ALL SPAN OF PLANT CONTROL AND MAINTENANCE

  • Yoshikawa, Hidekazu
    • Nuclear Engineering and Technology
    • /
    • v.41 no.3
    • /
    • pp.237-246
    • /
    • 2009
  • Digitalization of not only non-safety but also safety-grade I &C systems with full computerized Main Control Room (MCR) is the recent trend of I&C systems of nuclear power plants (NPP) around the world, while plant maintenance has been shifting from traditional time based maintenance to condition based maintenance. In order to cope with the new trend of operation and maintenance in NPP, a concept of online distributed diagnostic system for both plant operation and maintenance has been proposed in order to further improve both the plant efficiency and the work environment of plant operation staff members by organizational learning. In this respect, the research subjects of human machine interface (HMI) for the online distributed diagnostic system are also discussed for supporting the plant personnel at both MCR and local working places in the plant by the application of advanced ICT (Information and Communication Technologies).

A Review on Advanced Methodologies to Identify the Breast Cancer Classification using the Deep Learning Techniques

  • Bandaru, Satish Babu;Babu, G. Rama Mohan
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.4
    • /
    • pp.420-426
    • /
    • 2022
  • Breast cancer is among the cancers that may be healed as the disease diagnosed at early times before it is distributed through all the areas of the body. The Automatic Analysis of Diagnostic Tests (AAT) is an automated assistance for physicians that can deliver reliable findings to analyze the critically endangered diseases. Deep learning, a family of machine learning methods, has grown at an astonishing pace in recent years. It is used to search and render diagnoses in fields from banking to medicine to machine learning. We attempt to create a deep learning algorithm that can reliably diagnose the breast cancer in the mammogram. We want the algorithm to identify it as cancer, or this image is not cancer, allowing use of a full testing dataset of either strong clinical annotations in training data or the cancer status only, in which a few images of either cancers or noncancer were annotated. Even with this technique, the photographs would be annotated with the condition; an optional portion of the annotated image will then act as the mark. The final stage of the suggested system doesn't need any based labels to be accessible during model training. Furthermore, the results of the review process suggest that deep learning approaches have surpassed the extent of the level of state-of-of-the-the-the-art in tumor identification, feature extraction, and classification. in these three ways, the paper explains why learning algorithms were applied: train the network from scratch, transplanting certain deep learning concepts and constraints into a network, and (another way) reducing the amount of parameters in the trained nets, are two functions that help expand the scope of the networks. Researchers in economically developing countries have applied deep learning imaging devices to cancer detection; on the other hand, cancer chances have gone through the roof in Africa. Convolutional Neural Network (CNN) is a sort of deep learning that can aid you with a variety of other activities, such as speech recognition, image recognition, and classification. To accomplish this goal in this article, we will use CNN to categorize and identify breast cancer photographs from the available databases from the US Centers for Disease Control and Prevention.

Distributed Support Vector Machines for Localization on a Sensor Newtork (센서 네트워크에서 위치 측정을 위한 분산 지지 벡터 머신)

  • Moon, Sangook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2014.10a
    • /
    • pp.944-946
    • /
    • 2014
  • Localization of a sensor network node using machine learning has been recently studied. It is easy for Support vector machines algorithm to implement in high level language enabling parallelism. In this paper, we realized Support vector machine using python language and built a sensor network cluster with 5 Pi's. We also established a Hadoop software framework to employ MapReduce mechanism. We modified the existing Support vector machine algorithm to fit into the distributed hadoop architecture system for localization of a sensor node. In our experiment, we implemented the test sensor network with a variety of parameters and examined based on proficiency, resource evaluation, and processing time.

  • PDF

Construction of Incremental Federated Learning System using Flower (Flower을 사용한 점진적 연합학습시스템 구성)

  • Yun-Hee Kang;Myungju Kang
    • Journal of Platform Technology
    • /
    • v.11 no.4
    • /
    • pp.80-88
    • /
    • 2023
  • To construct a learning model in the field of artificial intelligence, a dataset should be collected and be delivered to the central server where the learning model is constructed. Federated learning is a machine learning method building a global learning model without transmitting data located in a client side in a collaborative manner. It can be used to protect privacy, and after constructing a local trained model on individual clients, the parameters of the local model are aggregated centrally to update the global model. In this paper, we reuse the existing learning parameter to improve federated learning, describe incremental federated learning. For this work, we do experiments using the federated learning framework named Flower, and evaluate the experiment results with regard to elapsed time and precision when executing optimization algorithms.

  • PDF

A Distributed Scheduling Algorithm based on Deep Reinforcement Learning for Device-to-Device communication networks (단말간 직접 통신 네트워크를 위한 심층 강화학습 기반 분산적 스케쥴링 알고리즘)

  • Jeong, Moo-Woong;Kim, Lyun Woo;Ban, Tae-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.11
    • /
    • pp.1500-1506
    • /
    • 2020
  • In this paper, we study a scheduling problem based on reinforcement learning for overlay device-to-device (D2D) communication networks. Even though various technologies for D2D communication networks using Q-learning, which is one of reinforcement learning models, have been studied, Q-learning causes a tremendous complexity as the number of states and actions increases. In order to solve this problem, D2D communication technologies based on Deep Q Network (DQN) have been studied. In this paper, we thus design a DQN model by considering the characteristics of wireless communication systems, and propose a distributed scheduling scheme based on the DQN model that can reduce feedback and signaling overhead. The proposed model trains all parameters in a centralized manner, and transfers the final trained parameters to all mobiles. All mobiles individually determine their actions by using the transferred parameters. We analyze the performance of the proposed scheme by computer simulation and compare it with optimal scheme, opportunistic selection scheme and full transmission scheme.

Design and Utilization of Connected Data Architecture-based AI Service of Mass Distributed Abyss Storage (대용량 분산 Abyss 스토리지의 CDA (Connected Data Architecture) 기반 AI 서비스의 설계 및 활용)

  • Cha, ByungRae;Park, Sun;Seo, JaeHyun;Kim, JongWon;Shin, Byeong-Chun
    • Smart Media Journal
    • /
    • v.10 no.1
    • /
    • pp.99-107
    • /
    • 2021
  • In addition to the 4th Industrial Revolution and Industry 4.0, the recent megatrends in the ICT field are Big-data, IoT, Cloud Computing, and Artificial Intelligence. Therefore, rapid digital transformation according to the convergence of various industrial areas and ICT fields is an ongoing trend that is due to the development of technology of AI services suitable for the era of the 4th industrial revolution and the development of subdivided technologies such as (Business Intelligence), IA (Intelligent Analytics, BI + AI), AIoT (Artificial Intelligence of Things), AIOPS (Artificial Intelligence for IT Operations), and RPA 2.0 (Robotic Process Automation + AI). This study aims to integrate and advance various machine learning services of infrastructure-side GPU, CDA (Connected Data Architecture) framework, and AI based on mass distributed Abyss storage in accordance with these technical situations. Also, we want to utilize AI business revenue model in various industries.

A Study of Big data-based Machine Learning Techniques for Wheel and Bearing Fault Diagnosis (차륜 및 차축베어링 고장진단을 위한 빅데이터 기반 머신러닝 기법 연구)

  • Jung, Hoon;Park, Moonsung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.1
    • /
    • pp.75-84
    • /
    • 2018
  • Increasing the operation rate of components and stabilizing the operation through timely management of the core parts are crucial for improving the efficiency of the railroad maintenance industry. The demand for diagnosis technology to assess the condition of rolling stock components, which employs history management and automated big data analysis, has increased to satisfy both aspects of increasing reliability and reducing the maintenance cost of the core components to cope with the trend of rapid maintenance. This study developed a big data platform-based system to manage the rolling stock component condition to acquire, process, and analyze the big data generated at onboard and wayside devices of railroad cars in real time. The system can monitor the conditions of the railroad car component and system resources in real time. The study also proposed a machine learning technique that enabled the distributed and parallel processing of the acquired big data and automatic component fault diagnosis. The test, which used the virtual instance generation system of the Amazon Web Service, proved that the algorithm applying the distributed and parallel technology decreased the runtime and confirmed the fault diagnosis model utilizing the random forest machine learning for predicting the condition of the bearing and wheel parts with 83% accuracy.

Phishing Attack Detection Using Deep Learning

  • Alzahrani, Sabah M.
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.12
    • /
    • pp.213-218
    • /
    • 2021
  • This paper proposes a technique for detecting a significant threat that attempts to get sensitive and confidential information such as usernames, passwords, credit card information, and more to target an individual or organization. By definition, a phishing attack happens when malicious people pose as trusted entities to fraudulently obtain user data. Phishing is classified as a type of social engineering attack. For a phishing attack to happen, a victim must be convinced to open an email or a direct message [1]. The email or direct message will contain a link that the victim will be required to click on. The aim of the attack is usually to install malicious software or to freeze a system. In other instances, the attackers will threaten to reveal sensitive information obtained from the victim. Phishing attacks can have devastating effects on the victim. Sensitive and confidential information can find its way into the hands of malicious people. Another devastating effect of phishing attacks is identity theft [1]. Attackers may impersonate the victim to make unauthorized purchases. Victims also complain of loss of funds when attackers access their credit card information. The proposed method has two major subsystems: (1) Data collection: different websites have been collected as a big data corresponding to normal and phishing dataset, and (2) distributed detection system: different artificial algorithms are used: a neural network algorithm and machine learning. The Amazon cloud was used for running the cluster with different cores of machines. The experiment results of the proposed system achieved very good accuracy and detection rate as well.