Big Data-based Sensor Data Processing and Analysis for IoT Environment (IoT 환경을 위한 빅데이터 기반 센서 데이터 처리 및 분석)

  • Shin, Dong-Jin;Park, Ji-Hun;Kim, Ju-Ho;Kwak, Kwang-Jin;Park, Jeong-Min;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.117-126 / 2019
  • The data generated in the IoT environment are very diverse. In particular, the progress of the fourth industrial revolution has increased the volume of structured and unstructured data generated in manufacturing facilities such as smart factories. Big Data solutions make it possible to collect, store, process, analyze, and visualize large volumes of varied data quickly and accurately. In this paper, we therefore generate data directly with a Raspberry Pi, a device commonly used in IoT environments, and analyze the data with several Big Data solutions. Data stored in a database are collected into HDFS with Sqoop and then processed with Hive, which provides parallel processing in conjunction with Hadoop. Finally, the processed data are analyzed and visualized with R, a widely used programming language, to complete the verification.

Normal data based rotating machine anomaly detection using CNN with self-labeling

  • Bae, Jaewoong;Jung, Wonho;Park, Yong-Hwa
    • Smart Structures and Systems / v.29 no.6 / pp.757-766 / 2022
  • Training deep learning algorithms requires a sufficient amount of data. However, in most engineering systems, fault data are difficult or sometimes infeasible to acquire, while normal data are readily available. The dearth of data is one of the major challenges in developing deep learning models, and fault diagnosis in particular cannot be performed in the absence of fault data. In this context, this paper proposes an anomaly detection methodology for rotating machines that uses only normal data with self-labeling. Since only normal data are used for anomaly detection, a self-labeling method is used to generate a new labeled dataset. The overall procedure includes the following three steps: (1) transformation of normal data to self-labeled data based on a pretext task, (2) training of a convolutional neural network (CNN), and (3) anomaly detection using an anomaly score defined on the softmax output of the trained CNN. The softmax values of abnormal samples behave differently from those of normal samples. To verify the proposed method, four case studies were conducted on the Case Western Reserve University (CWRU) bearing dataset, the IEEE PHM 2012 data challenge dataset, the PHMAP 2021 data challenge dataset, and a laboratory bearing testbed, and the results were compared to those of existing machine learning and deep learning methods. The results showed that the proposed algorithm could detect faults in the bearing testbed and compressor with over 99.7% accuracy. In particular, it was possible to detect not only bearing faults but also structural faults such as unbalance and belt looseness with very high accuracy. Compared with existing GAN-based and autoencoder-based anomaly detection algorithms, the proposed method showed high anomaly detection performance.
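
The third step above, scoring a sample from the softmax output, can be sketched as follows. This is only an illustration: the abstract does not give the paper's score definition, so the `anomaly_score` function (one minus the softmax probability of the self-assigned label) and the logits are assumptions made up for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anomaly_score(logits, self_label):
    """Hypothetical score: 1 minus the softmax probability of the self-label.

    A network trained only on normal data is expected to predict the pretext
    label confidently for normal samples (score near 0) and less confidently
    for faulty samples (score closer to 1).
    """
    return 1.0 - softmax(logits)[self_label]


# Made-up logits for a 4-class pretext task whose self-assigned label is class 2.
normal_logits = [0.1, -0.3, 6.0, 0.2]    # confident prediction of the self-label
abnormal_logits = [1.2, 0.9, 1.0, 1.1]   # diffuse, uncertain prediction

normal_score = anomaly_score(normal_logits, 2)
abnormal_score = anomaly_score(abnormal_logits, 2)
```

With a score of this kind, a threshold chosen on held-out normal data would separate the two cases; how the paper sets its threshold is not stated in the abstract.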

A Study of Pattern Defect Data Augmentation with Image Generation Model (이미지 생성 모델을 이용한 패턴 결함 데이터 증강에 대한 연구)

  • Byungjoon Kim;Yongduek Seo
    • Journal of the Korea Computer Graphics Society / v.29 no.3 / pp.79-84 / 2023
  • Image generation models have been applied in various fields to overcome data sparsity and time and cost constraints. However, they have limitations in generating images with regular patterns and in supporting defect detection on such data. In this paper, we verify the feasibility of using an image generation model to generate pattern images and apply it to data augmentation for defect detection in OLED panels. The data required to train an OLED defect detection model are difficult to obtain because of the high cost of OLED panels. Moreover, even when a data set is obtained, the various defect types must still be defined and classified. This paper introduces an OLED panel defect data acquisition system that acquires a hypothetical data set and augments the data with an image generation model. In addition, we identify the difficulty of generating pattern images with a diffusion model and propose a possible approach, and we address the limitations of data augmentation with the image generation model for defect detection.

Real-Time Indexing Performance Optimization of Search Platform Based on Big Data Cluster (빅데이터 클러스터 기반 검색 플랫폼의 실시간 인덱싱 성능 최적화)

  • Nayeon Keum;Dongchul Park
    • Journal of Platform Technology / v.11 no.6 / pp.89-105 / 2023
  • With the development of information technology, most information has been converted into digital form, ushering in the Big Data era. Demand for search platforms has increased in order to enhance the accessibility and usability of information in databases. Big data search software platforms consist of two main components: (1) an indexing component that generates and stores data indices for fast and efficient data search and (2) a searching component that looks up the given data quickly. As the amount of data has increased explosively, data indexing performance has become a key bottleneck of big data search platforms. Although many companies have adopted big data search platforms, relatively little research has been done to improve indexing performance. This study employs Elasticsearch, one of the best-known enterprise big data search platforms, and builds physical clusters of 3 nodes to investigate optimal indexing performance configurations. Our comprehensive experiments demonstrate that the proposed Elasticsearch configuration improves indexing performance by an average of 3.13 times.

Data Cleaning System using XMDR-DAI in Cloud (클라우드 환경에서 XMDR-DAI를 이용한 데이터 정제 시스템)

  • Moon, Seok-Jae;Jeong, Kye-Dong;Lee, Jong-Yong;Cho, Young-Keun
    • Journal of Digital Convergence / v.12 no.2 / pp.263-270 / 2014
  • In a cloud environment, a business intelligence data warehouse is used for decision making and enterprise policy. However, when a new system is added to the cloud environment, data integration requires considerable cost and time because of the heterogeneous characteristics of the systems. This paper proposes a data cleaning system for business intelligence in a cloud environment. The proposed system uses XMDR-DAI to minimize the impact on local systems when integrating distributed systems. It also provides standardized information so that data warehouse information can be generated in real time. In addition, the proposed system saves cost and time by integrating data without changing existing systems, and it can improve information quality by generating coherent information through real-time data extraction and cleaning.

Design of Integrated Database Schema for Improving Usability of Rural Information (농촌정보 활용성 증대를 위한 통합데이터베이스 설계)

  • Lee, Ji-Min;Kyo, Suh;Kim, Han-Joong;Lee, Jeong-Jae
    • Journal of Korean Society of Rural Planning / v.11 no.2 s.27 / pp.43-49 / 2005
  • As public attention to information has grown, information storage and information usability have both become important. Rural information is produced in many fields and by many institutions, but it is difficult to use comprehensively. Because management formats vary, it is difficult to establish a unified framework. In this research, a database schema for integrating rural data is designed using dimensional modeling to improve usability. First, rural data are analyzed in order to design the integrated rural database schema. The rural data used are the 'National Agricultural Statistics' and the 'Gun annual statistical report'. The analysis identifies three considerations: administrative district, time dependency, and classification of data. Taking these three requisites into account, we designed the database schema using dimensional modeling, which was chosen to improve usability and effectiveness. If the database were designed using ER modeling, many tables would have to be joined for every search. Separately from the integrated rural database schema, a user database schema is designed with usability in mind. Through the user database, users can modify data or generate new data and save these processes, which makes it possible to reuse the generated data. We evaluate the usability, contribution, and effectiveness of data manipulation on the integrated rural database. We propose an integrated rural database structure that improves the accessibility and usability of rural data and information, and we verify the data model with a practical example.

KISTI-ML Platform: A Community-based Rapid AI Model Development Tool for Scientific Data (KISTI-ML 플랫폼: 과학기술 데이터를 위한 커뮤니티 기반 AI 모델 개발 도구)

  • Lee, Jeongcheol;Ahn, Sunil
    • Journal of Internet Computing and Services / v.20 no.6 / pp.73-84 / 2019
  • Machine learning as a service, the so-called MLaaS, has recently attracted much attention in almost all industries and research groups. The main reason is that building a productive service model requires nothing but the data itself: no network servers, no storage, and not even data scientists. However, machine learning is often very difficult for most developers, especially in the traditional sciences, because well-structured big data for scientific data are lacking. Experimental and applied researchers rarely share the results of their experiments with other researchers, so creating big data in specific research areas is also a major challenge. In this paper, we introduce the KISTI-ML platform, a community-based rapid AI model development tool for scientific data. It provides a user-friendly online development environment in which machine learning beginners can use their own data to generate code automatically. Users can share datasets and their Jupyter interactive notebooks among authorized community members, including know-how such as data preprocessing for feature extraction, hidden network design, and other engineering techniques.

Convolutional Neural Network and Data Mutation for Time Series Pattern Recognition (컨벌루션 신경망과 변종데이터를 이용한 시계열 패턴 인식)

  • Ahn, Myong-ho;Ryoo, Mi-hyeon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.727-730 / 2016
  • Time series classification (TSC) means classifying time series data based on their patterns. Time series data are a common data type with high potential in many fields, so the data mining and machine learning communities have paid attention to them for a long time. In the traditional approach, distance-based and dictionary-based methods are popular, but they have clear limitations because of time-scale and random-noise problems. In this paper, we propose a novel approach that addresses these problems with a CNN and data mutation. The CNN is regarded as a proven neural network model for image recognition and can be applied to time series pattern recognition by extracting patterns. Data mutation generates mutated data by various methods to make the CNN more robust. The proposed method shows better performance than the traditional approach.

Multiple imputation and synthetic data (다중대체와 재현자료 작성)

  • Kim, Joungyoun;Park, Min-Jeong
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.83-97 / 2019
  • As society develops, the dissemination of microdata has increased in response to the diverse analytical needs of users. Analysis of microdata for policy making, academic purposes, and similar uses is highly desirable in terms of value creation. However, providing microdata that remain useful carries a risk of exposing personal information. Several methods have been considered to protect personal information while preserving the usefulness of the data. One such method is to generate and use synthetic data. This paper aims to build an understanding of synthetic data by exploring the related methodologies and precautions. To this end, we first explain multiple imputation, the Bayesian predictive model, and the Bayesian bootstrap, which are the basic foundations of synthetic data. We then link these concepts to the construction of fully and partially synthetic data. To illustrate the creation of synthetic data, we review a real longitudinal synthetic data example based on sequential regression multivariate imputation.
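
Of the foundations named above, the Bayesian bootstrap is compact enough to sketch. The example below is a minimal illustration rather than the paper's procedure: it draws Dirichlet(1, ..., 1) weights over the observed records by normalizing Exp(1) draws and uses them to obtain posterior draws of a mean. The function names and the `incomes` values are made up for this example.

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """Draw one Dirichlet(1, ..., 1) weight vector by normalizing Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def posterior_mean_draws(values, num_draws, seed=0):
    """Return Bayesian-bootstrap draws of the mean of `values`."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_draws):
        weights = bayesian_bootstrap_weights(len(values), rng)
        # Each draw is a weighted average of the observed values.
        results.append(sum(w * v for w, v in zip(weights, values)))
    return results


incomes = [21.0, 35.5, 18.2, 40.1, 27.3]  # made-up microdata values
draws = posterior_mean_draws(incomes, num_draws=1000)
```

Unlike the classical bootstrap, which resamples records with integer counts, each draw here reweights all records smoothly; the spread of `draws` approximates the posterior uncertainty of the mean.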

Development and Accuracy Analysis of the Discharge-Supply System to Generate Hydrographs for Unsteady Flow in the Open Channel (개수로에서의 부정류 수문곡선 재현을 위한 유량공급장치의 개발 및 정확도 분석)

  • Kim, Seo-Jun;Kim, Sang-Hyuk;Yoon, Byung-Man;Ji, Un
    • Journal of Korea Water Resources Association / v.45 no.8 / pp.783-794 / 2012
  • Unsteady flow analysis is necessary to design hydraulic structures affected by changes in water level and discharge over time. Numerical models are generally used for unsteady flow analysis; however, it is difficult to acquire field data to calibrate and validate them. Even when field data can be collected, doing so requires high cost and labor, and the reliability of the measured data is sometimes considered very low. In such cases, experimental data for unsteady flow can be used as an alternative to calibrate and validate the numerical model. Therefore, this study developed a discharge-supply system that can generate various types of unsteady flow hydrographs. The accuracy of the hydrographs generated by the developed discharge-supply system in the experiment was evaluated by comparison with target hydrographs. Accuracy errors and the Root Mean Square Error (RMSE) were analyzed for a rectangular-type hydrograph with sudden changes of flow, a triangular-type hydrograph with a short peak time, and a bell-type flood hydrograph. As a result, the generating error of the discharge-supply system for the rectangular-type hydrograph was about 59%, the maximum error among the hydrograph types. The RMSE for the triangular-type hydrographs with single and double peaks was approximately 10%, whereas the RMSE for the bell-type flood hydrograph was lower than 2%.
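
The comparison between a generated and a target hydrograph can be sketched as below. The abstract reports RMSE as a percentage but does not say how it was normalized, so dividing by the target peak discharge in `relative_rmse_percent` is an assumption, and the discharge values are made up for illustration.

```python
import math


def rmse(target, generated):
    """Root mean square error between two equally sampled discharge series."""
    if len(target) != len(generated):
        raise ValueError("series must have the same length")
    squared = [(t - g) ** 2 for t, g in zip(target, generated)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse_percent(target, generated):
    """RMSE as a percentage of the target peak discharge (assumed normalization)."""
    return 100.0 * rmse(target, generated) / max(target)


# Made-up triangular hydrograph (discharge per time step) and a noisy reproduction.
target = [10.0, 20.0, 30.0, 20.0, 10.0]
generated = [11.0, 19.0, 31.0, 21.0, 9.0]

error = rmse(target, generated)
error_percent = relative_rmse_percent(target, generated)
```

Other normalizations, such as dividing by the mean discharge, would give different percentages for the same series, so reported values are comparable only under the same convention.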