• Title/Summary/Keyword: Generate Data

Centralized Machine Learning Versus Federated Averaging: A Comparison using MNIST Dataset

  • Peng, Sony;Yang, Yixuan;Mao, Makara;Park, Doo-Soon
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.2 / pp.742-756 / 2022
  • The rise of the internet and digital devices in the era of the fourth industrial revolution has produced a flood of information. Every millisecond, massive amounts of structured and unstructured data are generated; smartphones, wearable devices, sensors, and self-driving cars are just a few examples of devices that generate massive amounts of data in our daily lives. Machine learning has been adopted to recognize patterns in data in many sectors, including healthcare, government, banking, and the military. However, a conventional machine learning model requires data owners to upload their information to one central location for model training. This has caused data owners to worry about the risks of transferring private information, because traditional machine learning requires them to push their data to the cloud. Furthermore, training machine learning and deep learning models requires massive computing resources. Thus, many researchers have turned to a new approach known as "Federated Learning". Federated learning trains Artificial Intelligence models over distributed clients and protects the privacy of the data owner's information. Hence, this paper implements Federated Averaging with a Deep Neural Network to classify handwritten digit images while protecting sensitive data. Moreover, we compare the centralized machine learning model with federated averaging. The results show that the centralized machine learning model outperforms federated learning in terms of accuracy, but the centralized model carries other risks, such as privacy concerns, because the data is stored in a data center. The MNIST dataset was used in this experiment.
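
The aggregation step of Federated Averaging can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `federated_average`, the dictionary-of-lists weight layout, and the sample counts are assumptions made for illustration. Each client trains locally, and the server averages the returned weights in proportion to each client's number of local samples.

```python
from typing import Dict, List, Tuple

# Hypothetical weight layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    if not client_updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros shaped like the first client's weights.
    first_weights, _ = client_updates[0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, n_samples in client_updates:
        share = n_samples / total_samples  # this client's share of the data
        for name, vals in weights.items():
            for i, v in enumerate(vals):
                averaged[name][i] += share * v
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 local samples.
    updates = [({"w": [1.0, 2.0]}, 1), ({"w": [3.0, 4.0]}, 3)]
    print(federated_average(updates))  # {'w': [2.5, 3.5]}
```

Only the weights and sample counts leave the clients in this scheme; the raw training data stays local, which is the privacy property the abstract refers to.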

Bio-signal Data Augumentation Technique for CNN based Human Activity Recognition (CNN 기반 인간 동작 인식을 위한 생체신호 데이터의 증강 기법)

  • Gerelbat BatGerel;Chun-Ki Kwon
    • Journal of the Institute of Convergence Signal Processing / v.24 no.2 / pp.90-96 / 2023
  • Securing large amounts of training data is important for deep neural networks, including convolutional neural networks, both to avoid overfitting and to achieve good performance. In practice, however, labeled training data is scarce. To overcome this, several augmentation methods have been proposed in the literature that generate large amounts of additional training data by transforming or manipulating already acquired training data. However, unlike for images and text, it is hard to find an augmentation method in the literature that generates additional bio-signal training data for convolutional neural network based human activity recognition. Thus, this study proposes a simple but effective augmentation method for bio-signal training data for convolutional neural network based human activity recognition. The usefulness of the proposed method is validated by showing that a convolutional neural network trained on the augmented bio-signal data recognizes human activity with high accuracy.
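
The abstract does not describe the proposed augmentation method itself. As a generic illustration of what "transformation or manipulation" of acquired signal data can look like, the sketch below applies two commonly used time-series transformations, amplitude scaling and additive Gaussian jitter. The function names, the scaling range, and the noise level are all assumptions, and this should not be read as the method proposed in the paper.

```python
import random
from typing import List, Sequence, Tuple


def scale(window: Sequence[float], factor: float) -> List[float]:
    """Multiply every sample in the window by a constant factor."""
    return [v * factor for v in window]


def jitter(window: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation sigma to each sample."""
    return [v + rng.gauss(0.0, sigma) for v in window]


def augment(
    windows: Sequence[Sequence[float]],
    labels: Sequence[int],
    copies: int,
    sigma: float = 0.05,   # assumed noise level
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Return the original windows plus `copies` perturbed variants of each."""
    rng = random.Random(seed)
    out_windows = [list(w) for w in windows]
    out_labels = list(labels)
    for window, label in zip(windows, labels):
        for _ in range(copies):
            factor = rng.uniform(0.9, 1.1)  # assumed scaling range
            out_windows.append(jitter(scale(window, factor), sigma, rng))
            out_labels.append(label)        # augmented copy keeps its label
    return out_windows, out_labels
```

Each augmented window keeps the activity label of the window it was derived from, so the enlarged set can be fed to the same classifier without relabeling.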

Design of an Aquaculture Decision Support Model for Improving Profitability of Land-based Fish Farm Based on Statistical Data

  • Jaeho Lee;Wongi Jeon;Juhyoung Sung;Kiwon Kwon;Yangseob Kim;Kyungwon Park;Jongho Paik;Sungyoon Cho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.8 / pp.2431-2449 / 2024
  • As problems such as water pollution and fish species depletion have become serious, land-based fish farming is receiving great attention as a way to ensure stable productivity. In fish farming, the timing of shipments is one of the key factors in increasing net profit. In this paper, we propose a system that predicts net profit from fish farming-related statistical data to support decisions on shipment timing. The prediction system consists of growth and farm-gate price prediction models, a cost statistics table, and a net profit estimation algorithm. A Gaussian process regression (GPR) model is used for weight prediction, based on an analysis that characterizes the weight data of cultured fish under the assumption of a Gaussian process. A long short-term memory (LSTM) model is applied to the farm-gate price data, considering its simple time series characteristics. The GPR model makes it possible to cope with missing weight data collected from the fish farm in the time and temperature domains. Because the data acquired from the fish farm is aperiodic and small in amount, we generate additional data with a data augmentation method based on the Gaussian model. Finally, net profit is estimated by combining the weight, price, and cost predictions. The performance of the proposed system is analyzed by applying it to Korean flounder data.

An Automated Test Data Generator for Debugging Esterel Programs (에스테렐 프로그램 디버깅을 위한 테스트 데이터 자동 생성)

  • Yun, Jeong-Han;Cho, Min-Kyung;Seo, Sun-Ae;Han, Tai-Sook
    • Journal of KIISE:Software and Applications / v.36 no.10 / pp.793-799 / 2009
  • Esterel is an imperative synchronous language that is well suited to specifying reactive systems. Programmers sometimes want simple validations that can be applied while the system is under development. Since a reactive system reacts to changes in its environment, test data takes the form of a sequence of input events. Generating proper test data by hand is complex and error-prone. Although several test data generators exist, they are hard to learn and use. In most cases, system designers need test data that drives a target program to a specific state. In this paper, we develop a test data generator that produces test input sequences for debugging Esterel programs. Our tool focuses on ease of use; users can describe test data properties with simple specifications. We present a case study in which the test data generator is used in a practical development process.

H-Anim-based Definition of Character Animation Data (캐릭터 애니메이션 데이터의 H-Anim 기반 정의)

  • Lee, Jae-Wook;Lee, Myeong-Won
    • Journal of KIISE:Computing Practices and Letters / v.15 no.10 / pp.796-800 / 2009
  • Thanks to advances in computer graphics technology, many software tools can now generate 3D human figure models and animations. However, human data models are still not interoperable across applications because no common data model exists. To address this issue, the Web3D Consortium and ISO/IEC JTC1 SC24 WG6 have developed the H-Anim standard. However, although H-Anim defines the structure of a human figure, it does not include human motion data formats. This research aims to achieve interoperable human animation by defining motion data for H-Anim figures. In this paper, we describe a syntactic method for defining motion data for an H-Anim figure, together with its implementation. In addition, we describe a method of specifying the motion parameters needed to generate animations using an arbitrary character model data set created with a general graphics tool.

Trip Generation Analysis Using Mobile Phone Data (무선통신 자료를 활용한 통행발생량 분석)

  • Kim, Kyoungtae;Lee, Inmook;Min, Jae Hong;Kwak, Ho-Chan
    • Journal of the Korean Society for Railway / v.18 no.5 / pp.481-488 / 2015
  • The recent trend in transportation planning information is to reduce traffic survey costs and improve accuracy by using and combining various external data sources. In Korea, mobile phone data can help generate useful transportation planning information thanks to the universal use of mobile phones, which outnumber the population. This paper presents a way to derive trip generation information from mobile phone data and verifies its practical value through correlation analysis with KTDB trip generation data. The results show that trip generation information derived from mobile phone data correlates with the existing KTDB trip generation data.
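
The abstract does not say which correlation statistic was used. Assuming the common choice of the Pearson coefficient, a correlation check between two trip generation series could be sketched as follows. The function name `pearson` and the per-zone trip counts are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-zone trip counts, invented for illustration only.
    mobile_trips = [120.0, 340.0, 210.0, 560.0]
    survey_trips = [100.0, 360.0, 190.0, 600.0]
    print(round(pearson(mobile_trips, survey_trips), 3))
```

A coefficient close to 1 would indicate that zones with more trips in one source also tend to have more trips in the other.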

Synthetic data augmentation for pixel-wise steel fatigue crack identification using fully convolutional networks

  • Zhai, Guanghao;Narazaki, Yasutaka;Wang, Shuo;Shajihan, Shaik Althaf V.;Spencer, Billie F. Jr.
    • Smart Structures and Systems / v.29 no.1 / pp.237-250 / 2022
  • Structural health monitoring (SHM) plays an important role in ensuring the safety and functionality of critical civil infrastructure. In recent years, numerous researchers have developed computer vision and machine learning techniques for SHM, offering the potential to make field inspections less laborious and more effective. However, high-quality vision data from various types of damaged structures is relatively difficult to obtain, because damaged structures occur rarely. The lack of data is particularly acute for fatigue cracks in steel bridge girders. As a result, the lack of training data is one of the main issues hindering wider application of these powerful techniques for SHM. To address this problem, this article proposes using synthetic data to augment the real-world datasets used to train neural networks that identify fatigue cracks in steel structures. First, random textures representing the surface of steel structures with fatigue cracks are created and mapped onto a 3D graphics model. Subsequently, this model is used to generate synthetic images for various lighting conditions and camera angles. A fully convolutional network is then trained for two cases: (1) using only real-world data, and (2) using both synthetic and real-world data. With synthetic data augmentation in the training process, the crack identification performance of the neural network on the test dataset improves from 35% to 40% for intersection over union (IoU) and from 49% to 62% for precision, demonstrating the efficacy of the proposed approach.
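
The two reported metrics, intersection over union (IoU) and precision, can be computed pixel by pixel from a predicted crack mask and a ground-truth mask. The sketch below is a generic illustration with hypothetical masks, not the evaluation code from the article. The return values for empty masks (IoU of 1.0 when both masks are empty, precision of 0.0 when nothing is predicted) are conventions assumed here.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 1 = crack pixel, 0 = background


def iou_and_precision(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (IoU, precision) for two binary masks of the same shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # crack pixel correctly predicted
            elif p and not t:
                fp += 1      # predicted crack where there is none
            elif t and not p:
                fn += 1      # missed crack pixel
    union = tp + fp + fn
    iou = tp / union if union else 1.0          # assumed convention
    predicted = tp + fp
    precision = tp / predicted if predicted else 0.0  # assumed convention
    return iou, precision


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    truth = [[1, 0, 0],
             [0, 1, 1]]
    iou, precision = iou_and_precision(pred, truth)
    print(iou)        # 0.5  (2 shared pixels out of 4 in the union)
    print(precision)  # 2 of the 3 predicted pixels are correct
```

IoU penalizes both missed and spurious crack pixels, while precision only penalizes spurious ones, which is why the two figures in the abstract differ.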

SHM data anomaly classification using machine learning strategies: A comparative study

  • Chou, Jau-Yu;Fu, Yuguang;Huang, Shieh-Kung;Chang, Chia-Ming
    • Smart Structures and Systems / v.29 no.1 / pp.77-91 / 2022
  • Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, in which anomalies are not unusual and can pose unique challenges for structural health monitoring applications such as system identification and damage detection. Efficient techniques for recognizing anomalies in monitoring data are therefore essential. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset consisting of one month of acceleration data from a long-span cable-stayed bridge in China is used to examine the machine learning techniques for automated data anomaly detection. These techniques include the statistic-based pattern recognition network, spectrogram-based convolutional neural network, image-based time history convolutional neural network, image-based time-frequency hybrid convolutional neural network (GoogLeNet), and the proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both the image-based time history convolutional neural network and GoogLeNet are further investigated for autonomous online anomaly classification and are found to classify anomalies effectively with decent performance. In terms of accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model on a blind test dataset. The results show that the ensemble model is effective for data anomaly detection and applicable to signal characteristics that change over time.

Research on Mining Technology for Explainable Decision Making (설명가능한 의사결정을 위한 마이닝 기술)

  • Kyungyong Chung
    • Journal of the Institute of Convergence Signal Processing / v.24 no.4 / pp.186-191 / 2023
  • Data processing techniques play a critical role in decision-making, including the handling of missing and outlier data and the building of prediction and recommendation models. This requires a clear explanation of the validity, reliability, and accuracy of all processes and results. In addition, it is necessary to solve data problems through explainable models using decision trees, inference, etc., and to make models lightweight by considering various types of learning. The multi-layer mining classification method that applies the sixth principle discovers multidimensional relationships between variables and attributes that occur frequently in transactions after data preprocessing. This paper explains how to discover significant relationships by mining transactions and how to model the data through regression analysis. It develops scalable models and logistic regression models and proposes mining techniques that generate class labels through data cleansing, relevance analysis, data transformation, and data augmentation in order to support explainable decisions.

Use of Tree Traversal Algorithms for Chain Formation in the PEGASIS Data Gathering Protocol for Wireless Sensor Networks

  • Meghanathan, Natarajan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.6 / pp.612-627 / 2009
  • The high-level contribution of this paper is to illustrate the effectiveness of using graph-theoretic tree traversal algorithms (pre-order, in-order and post-order traversals) to generate the chain of sensor nodes in the classical Power Efficient-Gathering in Sensor Information Systems (PEGASIS) data aggregation protocol for wireless sensor networks. We first construct an undirected minimum-weight spanning tree (ud-MST) on a complete sensor network graph, wherein the weight of each edge is the Euclidean distance between the constituent nodes of the edge. A Breadth-First-Search of the ud-MST, starting with the node located closest to the center of the network, is then conducted to iteratively construct a rooted directed minimum-weight spanning tree (rd-MST). The three tree traversal algorithms are then executed on the rd-MST, and the node sequence resulting from each traversal is used as the chain of nodes for the PEGASIS protocol. Simulation studies on PEGASIS conducted for both TDMA and CDMA systems illustrate that, with the chain of nodes generated by the tree traversal algorithms, the node lifetime can improve by as much as 19%-30% and, at the same time, the energy loss per node can be 19%-35% lower than that obtained with the currently used distance-based greedy heuristic.
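
The chain construction described above (ud-MST, Breadth-First-Search rooting, then a tree traversal) can be sketched as follows for the pre-order case. This is an illustrative reading of the abstract, not the author's code. The function names are hypothetical, Prim's algorithm is one assumed choice for building the MST, and the "center of the network" is approximated here by the centroid of the node coordinates.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def build_mst(points: Sequence[Point]) -> Dict[int, List[int]]:
    """Prim's algorithm on the complete Euclidean graph; returns an adjacency list."""
    n = len(points)
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    if n == 0:
        return adj
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def root_tree(adj: Dict[int, List[int]], root: int) -> Dict[int, List[int]]:
    """Breadth-first search from root; returns each node's list of children."""
    children: Dict[int, List[int]] = {u: [] for u in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                children[u].append(v)
                queue.append(v)
    return children


def preorder(children: Dict[int, List[int]], root: int) -> List[int]:
    """Visit a node first, then its children in order (iterative pre-order)."""
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


def pegasis_chain(points: Sequence[Point]) -> List[int]:
    """Node chain from a pre-order traversal of the rooted MST."""
    # Assumption: the network center is the centroid of the node positions.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    root = min(range(len(points)), key=lambda i: math.dist(points[i], (cx, cy)))
    return preorder(root_tree(build_mst(points), root), root)


if __name__ == "__main__":
    sensors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
    print(pegasis_chain(sensors))  # [2, 1, 0, 3, 4]
```

In the five-node example the middle node is closest to the centroid, so it becomes the root and the chain visits one side of the line before the other. Swapping `preorder` for a post-order traversal would give another of the chains the paper compares.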