• Title/Summary/Keyword: Distributed Machine Learning


The study of a full cycle semi-automated business process re-engineering: A comprehensive framework

  • Lee, Sanghwa;Sutrisnowati, Riska A.;Won, Seokrae;Woo, Jong Seong;Bae, Hyerim
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.103-109 / 2018
  • This paper presents an idea and a framework for automating the full cycle of business process management and re-engineering by integrating traditional business process management systems, process mining, data mining, machine learning, and simulation. We build our framework on a cloud-based platform so that various data sources can be incorporated. We design our systems to be extensible so that they benefit not only BPM practitioners but also researchers, who can use the framework as a test bed without the complication of system integration. The two main contributions of this study are the automation of the redesign phase and of the selection of a baseline process model for deployment. In the redesign phase, we handle both the analysis of the existing process model and the what-if analysis of how to improve the process at the same time. Additionally, business process improvement is applied on a case-by-case basis, which requires a lot of trial and error and a large amount of data. In selecting the baseline process model, we need to compare many probable routes of business execution and calculate the most efficient one with respect to production cost and execution time. We also discuss the challenges and limitations of the framework, including system adoptability, technical difficulties, and human factors.

A Load Balancing Scheme for Distributed SDN Based on Harmony Search with K-means Clustering (K-means 군집화 및 Harmony Search 알고리즘을 이용한 분산 SDN의 부하 분산 기법)

  • Kim, Se-Jun;Yoo, Seung-Eon;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.29-30 / 2019
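  • This paper proposes a scheme that uses K-means clustering and Harmony Search (HS) to select the switches to migrate, in order to reduce the load on controllers that are overloaded by excessive control messages in a distributed SDN environment with multiple controllers. A scheme that selects the switches to migrate using HS has been proposed previously, but its accuracy is low relative to the time it consumes. It also consumes a large amount of memory to build the Harmony Memory (HM). To address this, this paper uses K-means clustering based on Euclidean distance to pick out the switches to migrate, which reduces the size of the HM and improves migration efficiency.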
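
As a rough illustration of the pre-filtering idea in this abstract, the sketch below clusters switches with Euclidean-distance K-means and keeps only the members of the most heavily loaded cluster as candidates for a later search step. The abstract does not say which features are clustered or how the cluster is chosen, so the feature layout (control-message rate first, hop distance second), the choice of the highest-load cluster, and the names `kmeans` and `candidate_switches` are all assumptions made for this example. The Harmony Search step itself is not shown.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with Euclidean distance.

    Centroids start at the first k points so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids

def candidate_switches(features, k=2):
    """Return indices of switches in the cluster with the highest mean load.

    Assumption: features[i][0] is the (hypothetical) control-message load
    of switch i; the remaining entries are other illustrative features.
    """
    labels, centroids = kmeans(features, k)
    hot = max(range(k), key=lambda c: centroids[c][0])
    return [i for i, lab in enumerate(labels) if lab == hot]

# Hypothetical (load, hop distance) pairs for four switches.
features = [(0.9, 1.0), (0.1, 5.0), (0.8, 1.5), (0.2, 4.5)]
print(candidate_switches(features))  # [0, 2]
```

Shrinking the candidate set this way is what would let a subsequent metaheuristic keep a smaller memory of solutions; how the paper actually couples the two steps is not stated in the abstract.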


The Singular Economy: End of the Digital/Physical Divide

  • Meceda, Ann M.;Vonortas, Nicholas S.
    • STI Policy Review / v.9 no.1 / pp.133-157 / 2018
  • The divide between the "digital" economy and the traditional "physical" economy is outdated. In fact, we are in a transition to a singular economy. This paper classifies economic objects (including actors) as either physical or virtual and argues that, due to emerging technologies, these objects are interacting with each other in both physical and increasingly digital spheres in tandem. This paper recognizes the elemental difference between atoms and bytes but argues that physical and digital economic activities are becoming inseparably intertwined. Furthermore, arbitrarily dividing the economy into two categories - one "physical" and the other "digital" - distorts the overall view of the actual execution of economic activity. A wide range of innovations emerging concurrently is fueling the transition to a singular economy. Four emerging technological areas, often referred to as the elements of the Fourth Industrial Revolution (4IR), are reviewed here: distributed ledger technology, artificial intelligence/machine learning/data sciences, biometrics and remote sensor technologies, and access infrastructure (universal internet access/electricity/cloud computing). The financial services sector is presented as a case study for the potential impact of these 4IR technologies and the blurring physical/digital line. Realizing the potential of these innovations and a truly singular economy requires the concurrent development of social, organizational, and regulatory innovations, which have so far lagged behind technological progress.

A Study on a Smart Home Access Control using Lightweight Proof of Work (경량 작업증명시스템을 이용한 스마트 홈 접근제어 연구)

  • Kim, DaeYoub
    • Journal of IKEEE / v.24 no.4 / pp.931-941 / 2020
  • As natural language processing technology based on machine learning develops, Smart Home Network Service (SHNS) is drawing attention again. However, it is difficult to apply a standardized authentication scheme to SHNS because of the diversity of its components and the variability of its users. Blockchain has been proposed for data authentication in distributed environments, but its applicability to SHNS is limited by the computational overhead of implementing a proof-of-work system. This paper proposes a lightweight proof-of-work system that manages block generation by controlling the work authority of the device. In addition, this paper proposes an access control scheme for SHNS.
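
To make the overhead concern concrete, here is a minimal sketch of a generic hash-based proof of work with a tunable difficulty. It is not the scheme proposed in the paper, whose mechanism for controlling device work authority is not described in the abstract. The function names, the 8-byte nonce encoding, and the example payload are assumptions for illustration only.

```python
import hashlib

def _digest_value(block_data: bytes, nonce: int) -> int:
    """SHA-256 of the data plus an 8-byte big-endian nonce, as an integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def proof_of_work(block_data: bytes, difficulty_bits: int) -> int:
    """Find the smallest nonce whose digest has `difficulty_bits` leading zero bits.

    Expected work grows as 2**difficulty_bits, which is why a constrained
    smart-home device would need a small value here.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _digest_value(block_data, nonce) >= target:
        nonce += 1
    return nonce

def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs a single hash, regardless of difficulty."""
    return _digest_value(block_data, nonce) < (1 << (256 - difficulty_bits))

# Hypothetical access request from a device, with a low difficulty.
request = b"unlock-front-door"
nonce = proof_of_work(request, difficulty_bits=10)
print(verify(request, nonce, difficulty_bits=10))  # True
```

The asymmetry between finding a nonce (many hashes) and checking it (one hash) is the property any lightweight variant would need to preserve while lowering the cost of the search.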

An IPSO-KELM based malicious behaviour detection and SHA256-RSA based secure data transmission in the cloud paradigm

  • Ponnuviji, N.P.;Prem, M. Vigilson
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.4011-4027 / 2021
  • Cloud computing has emerged as an extensively used technology, not only in the IT sector but in almost all sectors. Because the cloud is distributed and dynamic, and because current implementations of virtualization carry risks, numerous security threats and attacks have been reported. Considering the architecture and the system complexity, it is indispensable to adopt sound fundamentals. This paper proposes a secure authentication and data sharing scheme for protecting cloud data. An efficient IPSO-KELM is proposed for detecting malicious user behaviour. The proposed method starts with the authentication phase of the data sender. After authentication, the sender sends the data to the cloud, and the IPSO-KELM identifies whether the received data is attacked or normal, i.e., whether it comes from a malicious sender or an authenticated sender. If the data received from the sender is identified as normal, it is securely shared with the data receiver using the SHA256-RSA algorithm. The results of the proposed method are examined by comparing them with other existing techniques to confirm that the proposed IPSO-KELM and SHA256-RSA work well for malicious user detection and secure data sharing in the cloud.

Prediction of Global Industrial Water Demand using Machine Learning

  • Panda, Manas Ranjan;Kim, Yeonjoo
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.156-156 / 2022
  • Spatially explicit and reliable data on industrial water demand are important for both policy makers and researchers in order to carry out region-specific analyses of water resources management. However, such data remain scarce, particularly in underdeveloped and developing countries. Current research makes only limited use of the spatially available socio-economic, climate, and geographical data from different sources to predict industrial water demand at finer resolution. This study proposes a random forest regression (RFR) model to predict industrial water demand at 0.50° × 0.50° spatial resolution by combining various features extracted from multiple data sources. The datasets used here include National Polar-orbiting Partnership (NPP)/Visible Infrared Imaging Radiometer Suite (VIIRS) night-time light (NTL), the Global Power Plant database, AQUASTAT country-wise industrial water use data, elevation, Gross Domestic Product (GDP), road density, cropland, population, precipitation, temperature, and aridity. Compared with traditional regression algorithms, RF offers high prediction accuracy, does not require assumptions about a prior probability distribution, and can analyze variable importance. The final RF model was fitted with the parameter settings ntree = 300 and mtry = 2. As a result, a coefficient of determination of 0.547 was achieved. The variable importance analysis shows that independent variables such as night-time light, elevation, GDP, and population play the major role in predicting industrial water demand.


A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.521-526 / 2016
  • Because online informal text data are massive in volume and unstructured in nature, there are limitations in applying traditional relational data model technologies to data storage and data analysis jobs. Moreover, real-time analysis of social users' reactions is hard to accomplish with massive, dynamically generated social data. In this paper, to easily capture the semantics of massive and informal online documents with an unsupervised learning mechanism, we design and implement an automatic topic extraction system based on the mass of the words that make up a document. The input data set for the proposed system is generated first, using an N-gram algorithm to build multi-word terms that capture the meaning of sentences precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiment phase, terabyte-scale input data are preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system performs well in extracting meaningful topics in a timely manner because intermediate results are read directly from main memory rather than from an HDD.
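
The N-gram step mentioned in this abstract can be sketched in a few lines. This is a single-machine illustration only, assuming whitespace tokenization and lowercasing, neither of which is specified in the abstract; the function name `word_ngrams` is hypothetical, and the paper runs its pipeline on Hadoop and Spark rather than in plain Python.

```python
from collections import Counter

def word_ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous n-word terms in `text` (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Bigrams keep multi-word terms such as "topic model" together.
docs = ["Spark runs the topic model", "The topic model runs in memory"]
counts = Counter(term for doc in docs for term in word_ngrams(doc, 2))
print(counts["topic model"])  # 2
```

Counting multi-word terms like this yields the kind of term frequencies a topic model would take as input.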

Color Analyses on Digital Photos Using Machine Learning and KSCA - Focusing on Korean Natural Daytime/nighttime Scenery - (머신러닝과 KSCA를 활용한 디지털 사진의 색 분석 -한국 자연 풍경 낮과 밤 사진을 중심으로-)

  • Gwon, Huieun;KOO, Ja Joon
    • Trans- / v.12 / pp.51-79 / 2022
  • This study investigates methods for deriving colors that can serve as a reference for users, such as designers or content creators, who search web portal sites for online images using specific words for color planning and similar purposes. Two experiments were conducted to accomplish this. Digital scenery photos within the geographic scope of Korea were downloaded from web portal sites and studied to find out what colors were used to describe daytime and nighttime. Machine learning was used to classify colors as daytime or nighttime, and KSCA was used to derive the color frequency of daytime and nighttime photos; the two results were then compared and analyzed. The results of classifying the colors of daytime and nighttime photos using machine learning show that, when classifying the colors by 51-100%, the area of daytime colors was approximately 2.45 times greater than that of nighttime colors. The colors of the daytime class were distributed by brightness with white as their center, while those of the nighttime class were distributed with black as their center. There were 647 colors that fell over 70% into the daytime class, 252 that fell over 70% into the nighttime class, and 101 in between (31-69%). The number of colors in the middle area was low, while the other colors were classified relatively clearly into day and night. The resulting color distributions of the daytime and nighttime classes provided the borderline color values between the two classes, which are separated by brightness. The frequency analysis of the digital photos using KSCA showed that colors around yellow appeared in the generally bright daytime photos, while colors around blue appeared in the dark nighttime photos. In terms of the frequency of daytime photos, colors in the upper 40% had low chroma and were almost achromatic.
Also, colors close to white and black showed the highest frequency, indicating a large difference in brightness. Meanwhile, among the colors ranked 5th to 10th in frequency, yellow green was expressed darkly and navy blue brightly, partially composing a complex harmony. The color band showed a variety of hues, brightness levels, and chroma, including light blue, achromatic colors, and warm colors, and did not form a generally harmonious arrangement of colors. For the nighttime photos, colors in approximately the upper 50% of frequency were dark colors with a brightness value of 2 (Munsell signal). In comparison, the brightness of the middle frequency range (50-80%) was relatively higher (brightness values of 3-4), and the brightness differences among colors were large in the lower 20%. Colors that are not cool colors appeared intermittently in the lower 8% of frequency. The color band showed a generally harmonious arrangement of colors centered on navy blue. The two methods used in this study led to the following findings: machine learning could classify colors into two or more classes and could evaluate how close an image with certain colors was to a certain class. This method cannot be used if an image cannot be classified into a certain class. The resulting color distribution can serve as a reference for determining how close a certain color is to one of the two classes when it is used as the dominant color in the base or background of a design. Also, when the analyzed images are divided into several classes, even colors that were not used in the analyzed images can be evaluated for how close they are to a certain class according to the color distribution properties of each class. Nevertheless, the results cannot be used to find out whether a specific color was used in a class or how much it was used.
To investigate this issue, frequency analysis was conducted using KSCA. The color frequency could be measured within the range of images used in the experiment. The color distribution and frequency values from this study can serve as references for the color planning of digital designs involving natural scenery within the geographic scope of Korea. The two experiments are also meaningful attempts to find methods for deriving colors that, out of numerous images, can serve as a useful reference for content creators in the relevant field.

Performance Optimization Strategies for Fully Utilizing Apache Spark (아파치 스파크 활용 극대화를 위한 성능 최적화 기법)

  • Myung, Rohyoung;Yu, Heonchang;Choi, Sukyong
    • KIPS Transactions on Computer and Communication Systems / v.7 no.1 / pp.9-18 / 2018
  • Enhancing the performance of big data analytics in distributed environments has become an important issue because most big data applications, such as machine learning techniques and streaming services, rely on distributed computing frameworks. Consequently, optimizing the performance of such applications on Spark has been actively researched. Optimizing applications in a distributed environment is challenging because it requires not only optimizing the applications themselves but also tuning the configuration parameters of the distributed system. Although prior research has put considerable effort into improving execution performance, most of it focused on only one of three performance optimization aspects: application design, system tuning, or hardware utilization. As a result, it could not orchestrate those aspects together. In this paper, we analyze and model Spark's application processing procedure in depth. Based on the analysis, we propose performance optimization schemes for each step of the procedure: the inner stage and the outer stage. We also propose an appropriate partitioning mechanism by analyzing the relationship between partitioning parallelism and application performance. We applied these three performance optimization schemes to WordCount, PageRank, and K-means, which are basic big data analytics workloads, and found a performance improvement of nearly 50% when all of the schemes were applied.

Animal Infectious Diseases Prevention through Big Data and Deep Learning (빅데이터와 딥러닝을 활용한 동물 감염병 확산 차단)

  • Kim, Sung Hyun;Choi, Joon Ki;Kim, Jae Seok;Jang, Ah Reum;Lee, Jae Ho;Cha, Kyung Jin;Lee, Sang Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.137-154 / 2018
  • Animal infectious diseases, such as avian influenza and foot and mouth disease, occur almost every year and cause huge economic and social damage to the country. To prevent this, the quarantine authorities have devoted considerable human and material resources, but the diseases have continued to occur. Avian influenza is known to have first emerged in 1878, and it became a national issue due to its high lethality. Foot and mouth disease is considered the most critical animal infectious disease internationally. In a nation where this disease has not spread, foot and mouth disease is regarded as an economic or political disease because it restricts international trade by complicating the import of processed and unprocessed livestock, and because quarantine is costly. In a society where the whole nation is connected as a single zone of life, there is no way to fully prevent the spread of infectious disease. Hence, there is a need to detect the occurrence of the disease and take action before it spreads. For both human and animal infectious diseases, once a case is confirmed, an epidemiological investigation of the definitively diagnosed subject is carried out and measures are taken to prevent the spread of the disease according to the investigation results. The foundation of epidemiological investigation is figuring out where one has been and whom one has met. From a data perspective, this can be defined as an action taken to predict the cause of a disease outbreak, the outbreak location, and future infection by collecting and analyzing geographic data and relation data. Recently, attempts have been made to develop prediction models of infectious disease using Big Data and deep learning technology, but research on model building and case reports remains scarce.
KT and the Ministry of Science and ICT have been carrying out big data projects since 2014 as part of national R&D projects to analyze and predict the routes of livestock-related vehicles. To prevent animal infectious diseases, the researchers first developed a prediction model based on regression analysis of vehicle movement data. After that, a more accurate prediction model was constructed using machine learning algorithms such as Logistic Regression, Lasso, Support Vector Machine, and Random Forest. In particular, the 2017 prediction model added the risk of diffusion to facilities, and its performance was improved by exploring the modeling hyper-parameters in various ways. The confusion matrix and ROC curve show that the model constructed in 2017 is superior to the earlier machine learning model. The difference between the 2016 model and the 2017 model is that the later model also analyzes visit information for facilities such as feed factories and slaughterhouses, as well as information on poultry, which was previously limited to chickens and ducks and now extends to geese and quail. In addition, in 2017 an explanation of the results was added to help the authorities make decisions and to establish a basis for persuading stakeholders. This study reports an animal infectious disease prevention system built on Big Data about hazardous vehicle movement, farms, and the environment. The significance of this study is that it describes the evolution of a prediction model based on Big Data that is used in the field; the model is expected to become more complete if the form of the virus is taken into consideration. This will contribute to data utilization and the development of analysis models in related fields. In addition, we expect that the system constructed in this study will enable more proactive and effective prevention.