• Title/Abstract/Keywords: Preprocessed data

188 search results (processing time: 0.033 s)

Design of Anomaly Detection System Based on Big Data in Internet of Things

  • 나성일;김형중
    • 디지털콘텐츠학회 논문지 / Vol. 19, No. 2 / pp. 377-383 / 2018
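  • Internet of Things (IoT) services are producing a wide variety of data as smart environments advance. This data is used as important input for judging the user's situation. It is therefore important to monitor abnormal sensor states in real time and to detect anomalous data. However, because data structures and protocols vary, the data must first be converted into a standardized structure. Doing so ensures data quality and, through accurate analysis, can be expected to improve service quality as well. This paper proposes a big-data-based anomaly detection system for detecting anomalies in collected sensor data. For anomaly detection, the proposed system applies data-standardization preprocessing and an SVM (Support Vector Machine) model, which performs well at time-series-based anomaly detection. In the experiments, models were trained on preprocessed and on non-preprocessed data and then compared. As a result, the preprocessed data detected and predicted anomalies accurately.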

A Study on the Development of a Problem Bank in an Automated Assessment Module for Data Visualization Based on Public Data

  • HakNeung Go;Sangsu Jeong;Youngjun Lee
    • 한국컴퓨터정보학회논문지 / Vol. 29, No. 5 / pp. 203-211 / 2024
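  • Data visualization using a programming language can improve efficiency and effectiveness in terms of the amount of data processed, processing time, and flexibility, but practice is needed to become familiar with programming. In this study, we therefore developed a public-data-based problem bank for practicing data visualization in an automated programming assessment system. Public data were collected on topics presented in the curriculum and processed into a form suitable for learners to visualize. The problems were developed in connection with the mathematics curriculum so that learners could study a variety of data visualization methods. The developed problems underwent expert review and a pilot test, which confirmed the level of the items and the feasibility of mathematics education through data visualization. However, feedback indicated that the topics were of little interest to students; to address this, additional items were developed using student-centered data. The developed problem bank is expected to be useful for teaching data visualization to elementary school students gifted in informatics and to students in middle school or above who have experience learning Python.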

Increasing Splicing Site Prediction by Training Gene Set Based on Species

  • Ahn, Beunguk;Abbas, Elbashir;Park, Jin-Ah;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 6, No. 11 / pp. 2784-2799 / 2012
  • Biological data have increased exponentially in recent years, and analyzing these data using data mining tools has become one of the major issues in the bioinformatics research community. This paper focuses on the protein construction process in higher organisms, in which the deoxyribonucleic acid (DNA) sequence is filtered. In the process, "unmeaningful" DNA sub-sequences (called introns) are removed, and their meaningful counterparts (called exons) are retained. Accurate recognition of the boundaries between these two classes of sub-sequences, however, is known to be a difficult problem. Conventional approaches to recognizing these boundaries have focused solely on enhancing machine learning techniques, while the inherent nature of the data themselves has been overlooked. In this paper we present an approach that makes use of the data attributes inherent to each species in order to increase the accuracy of boundary recognition. For experimentation, we took the data sets for four different species from the University of California Santa Cruz (UCSC) data repository, divided the data sets by species, and then trained neural network (NN)-based and support vector machine (SVM)-based classifiers on a preprocessed version of the data sets. As a result, we observed that each species has its own specific features related to the splice sites, which implies that there are related distances among species. To conclude, dividing the training data set by species would increase the accuracy of splice junction prediction and offer new insight for biological research.

Energy-Aware Data-Preprocessing Scheme for Efficient Audio Deep Learning in Solar-Powered IoT Edge Computing Environments

  • 유연태;노동건
    • 대한임베디드공학회논문지 / Vol. 18, No. 4 / pp. 159-164 / 2023
  • Solar energy harvesting IoT devices prioritize maximizing the utilization of collected energy, rather than minimizing energy consumption, because solar energy is recharged periodically. Meanwhile, research on edge AI, which performs machine learning near the data source instead of in the cloud, is actively conducted for reasons such as data confidentiality and privacy, response time, and cost. One such research area involves running various audio AI applications on audio data collected from multiple IoT devices in an IoT edge computing environment. However, in most studies, IoT devices only transmit sensing data to the edge server, and all processes, including data preprocessing, are performed on the edge server. This not only overloads the edge server but also causes network congestion, because data that is unnecessary for learning is transmitted. On the other hand, if data preprocessing is delegated to each IoT device to address this issue, it leads to another problem: increased blackout time due to energy shortages in the devices. In this paper, we aim to alleviate the problem of increased blackout time in devices while mitigating the issues of server-centric edge AI environments by determining where the data is preprocessed based on the energy state of each IoT device. In the proposed method, an IoT device performs the preprocessing, which includes sound discrimination and noise removal, before transmitting to the server only if it has more energy available than the threshold required for the device's basic operation.
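
The placement rule in the last sentence of this abstract can be sketched as a single comparison. This is a minimal illustration, not the authors' implementation: the `Device` class, its field names, the joule units, and the function `preprocessing_site` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical view of one solar-powered IoT node."""
    name: str
    energy_j: float           # energy currently available (assumed unit: joules)
    basic_threshold_j: float  # energy reserved for the device's basic operation


def preprocessing_site(device: Device) -> str:
    """Return where audio preprocessing should run for this device.

    The device preprocesses locally only when it holds more energy than
    the threshold needed for basic operation; otherwise the raw data is
    sent on and the edge server does the preprocessing.
    """
    if device.energy_j > device.basic_threshold_j:
        return "device"
    return "server"


# Illustrative values only.
nodes = [Device("node-a", 5.0, 3.0), Device("node-b", 2.0, 3.0)]
placement = {d.name: preprocessing_site(d) for d in nodes}
```

With these sample values, `node-a` preprocesses locally and `node-b` defers to the server. A device sitting exactly at the threshold also defers, since the abstract asks for more energy than the threshold.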

Comparison of Solar Power Generation Forecasting Performance in Daejeon and Busan Based on Preprocessing Methods and Artificial Intelligence Techniques: Using Meteorological Observation and Forecast Data

  • 심채연;백경민;박현수;박종연
    • 대기 / Vol. 34, No. 2 / pp. 177-185 / 2024
  • As global interest in renewable energy grows due to the ongoing climate crisis, there is a growing need for efficient technologies to manage such resources. This study focuses on the predictive skill of daily solar power generation using weather observation and forecast data. Meteorological data from the Korea Meteorological Administration and solar power generation data from the Korea Power Exchange were utilized for the period from January 2017 to May 2023, considering both inland (Daejeon) and coastal (Busan) regions. Temperature, wind speed, relative humidity, and precipitation were selected as relevant meteorological variables for solar power prediction. All data were preprocessed by removing their systematic components so that only the residuals were used, and the residuals of the solar data were further processed with weighted adjustments for homoscedasticity. Four models, MLR (Multiple Linear Regression), RF (Random Forest), DNN (Deep Neural Network), and RNN (Recurrent Neural Network), were employed for solar power prediction, and their performances were evaluated based on predicted values utilizing observed meteorological data (used as a reference), 1-day-ahead forecast data (referred to as fore1), and 2-day-ahead forecast data (fore2). The DNN-based prediction model exhibits superior performance in both regions, with RNN performing the least effectively. However, MLR and RF demonstrate performance comparable to DNN. The disparities in the performance of the four different models are less pronounced than anticipated, underscoring the pivotal role of fitting models using residuals. This emphasizes that the preprocessing approach used, specifically leveraging residuals, is poised to play a crucial role in the future of solar power generation forecasting.

Detection of epileptiform activities in the EEG using wavelet and neural network

  • 박현석;이두수;김선일
    • 전자공학회논문지S / Vol. 35S, No. 2 / pp. 70-78 / 1998
  • Spike detection in long-term EEG monitoring for epilepsy by wavelet transform (WT), artificial neural network (ANN), and an expert system is presented. First, a small set of wavelet coefficients is used to represent the characteristics of single-channel epileptic spikes and normal activities. In this stage, two parameters are also extracted from the relation between EEG activities before the spike event and EEG activities with the spike. Then, a three-layer feed-forward network employing the error back-propagation algorithm is trained and tested using the parameters obtained from the first stage. Spikes are identified in individual EEG channels by 16 identical neural networks. Finally, a 16-channel expert system based on the context information of adjacent channels is introduced to yield more reliable results and reject artifacts. In this study, epileptic spikes and normal activities were selected from the EEGs of 32 patients by consensus among experts. The results showed that the WT reduced the input data size and that the ANN fed with preprocessed data was more accurate than an ANN given raw data of the same input size. In a clinical test, our expert rule system was capable of rejecting artifacts commonly found in EEG recordings.

Development of Prediction Model to Estimate the Storage Days of Tomato Using Transmittance Spectrum

  • 김영태;서상룡
    • Journal of Biosystems Engineering / Vol. 33, No. 5 / pp. 309-316 / 2008
  • The goal of this study was to develop prediction models to estimate the storage days of tomato. The transmittance spectral data measured on tomatoes were preprocessed through normalization, SNV, Savitzky-Golay, and Norris Gap and were then used to build the prediction models with the partial least squares (PLS) method. For the experiments, tomato samples of different varieties were collected at different harvest times. The samples were taken from the field right after harvest and were then stored in a low-temperature storage room maintained at $10^{\circ}C$. The transmittance spectral data of the tomato samples were measured at three-day intervals for 16 days. The performance of the prediction models was affected by the preprocessing techniques as well as by the varieties and harvest times of the tomatoes. The best model was obtained when SNV was applied, with an accuracy of 90.2%. It can be concluded that transmittance spectra provide useful information for predicting the storage period of tomato.
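
The abstract names SNV as the best-performing preprocessing step but does not spell out the transform. As a minimal sketch, assuming the usual definition of standard normal variate (each spectrum is centred on its own mean and scaled by its own sample standard deviation), it could look like the following. The function name `snv` and the sample values are illustrative and are not taken from the paper.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate for one spectrum.

    Subtracts the spectrum's own mean and divides by its own sample
    standard deviation, which removes per-sample offset and scale
    differences before a model such as PLS is fitted.
    """
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("SNV is undefined for a flat spectrum")
    return [(x - m) / s for x in spectrum]


# Illustrative transmittance readings: the second spectrum is the first
# one multiplied by 10, as might happen with a brighter light path.
spectra = [
    [0.10, 0.20, 0.30, 0.40],
    [1.00, 2.00, 3.00, 4.00],
]
corrected = [snv(s) for s in spectra]
```

After the transform, both sample spectra map to the same shape, which is the property that makes SNV useful when samples differ mainly in overall intensity.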

A Study on Prediction Model Performance of Scaffold Pore Size Using Machine Learning Regression Method

  • 이송연;허용정
    • 반도체디스플레이기술학회지 / Vol. 19, No. 1 / pp. 36-41 / 2020
  • Printing a scaffold with a 400 μm pore size on an FDM 3D printer requires varying all of the print factors. The number of prints needed is therefore on the order of 10 billion, which makes it difficult to carry out in the workplace. To solve this problem, we used a prediction model based on machine learning regression. We preprocessed the print condition data we had secured and trained on it, producing several kinds of prediction models. Using these models, we predicted the pore size of scaffolds for new print conditions that had not been secured. From the predicted results, we derived the print conditions that satisfy a pore size of 400 μm. We printed the scaffold five times under those conditions, measured the pore size of the printed scaffolds, and compared the average pore size with the predicted pore size. We confirmed that the error was less than 1% and identified the model with the highest scaffold pore-size prediction performance.

Malaria Epidemic Prediction Model by Using Twitter Data and Precipitation Volume in Nigeria

  • Nduwayezu, Maurice;Satyabrata, Aicha;Han, Suk Young;Kim, Jung Eon;Kim, Hoon;Park, Junseok;Hwang, Won-Joo
    • 한국멀티미디어학회논문지 / Vol. 22, No. 5 / pp. 588-600 / 2019
  • Each year Malaria affects over 200 million people worldwide. The African continent is hit particularly hard by this disease. According to many studies, the continent is an ideal environment for Anopheles mosquitoes, which transmit Malaria parasites, to thrive. Rainfall volume is one of the major factors favoring the development of these Anopheles mosquitoes in tropical Sub-Saharan Africa (SSA). However, the surveillance, monitoring, and reporting of this epidemic remain poor and purely bureaucratic. In our paper, we propose a method to rapidly monitor and report Malaria instances by using Social Network Systems (SNS) and precipitation volume in Nigeria. We used the Twitter search Application Programming Interface (API) to live-stream Twitter messages mentioning Malaria, preprocessed those Tweets, classified them into Malaria cases in Nigeria using the Support Vector Machine (SVM) classification algorithm, and compared those Malaria cases with the average precipitation volume. The comparison yielded a correlation of 0.75 between the Malaria cases recorded through Twitter and the average precipitation in Nigeria. To ensure the reliability of our classification algorithm, we used an oversampling technique to eliminate the imbalance in our training Tweets.
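
The abstract mentions oversampling to remove class imbalance in the training Tweets but does not say which technique was used. As one hedged illustration, the sketch below uses plain random oversampling (duplicating minority-class samples until every class matches the largest one). The function name `random_oversample`, the seed, and the toy Tweets are assumptions made for this example only.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Balance classes by randomly duplicating minority-class samples.

    Returns new sample and label lists in which every class has as many
    items as the largest class had originally.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_label.values())

    out_samples, out_labels = [], []
    for label, items in by_label.items():
        # Draw duplicates with replacement to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy training set: 4 non-case Tweets versus 1 case Tweet.
tweets = ["t1", "t2", "t3", "t4", "t5"]
classes = ["other", "other", "other", "other", "malaria_case"]
balanced_tweets, balanced_classes = random_oversample(tweets, classes)
counts = Counter(balanced_classes)
```

Here `counts` ends up with four items in each class. Duplication is the simplest option; synthetic approaches exist too, and the paper may well have used one of those instead.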

A Study on Evaluation of e-learners' Concentration by using Machine Learning

  • 정영상;주민성;조남욱
    • 디지털산업정보학회논문지 / Vol. 18, No. 4 / pp. 67-75 / 2022
  • Recently, e-learning has been attracting significant attention due to COVID-19. However, while e-learning has many advantages, it has disadvantages as well. One of the main disadvantages of e-learning is that it is difficult for teachers to continuously and systematically monitor learners. Although services such as personalized e-learning are provided to compensate for this shortcoming, systematic monitoring of learners' concentration is insufficient. This study suggests a method to evaluate the learner's concentration by applying machine learning techniques. In this study, emotion and gaze data were extracted from 184 videos of 92 participants. First, the learners' concentration was labeled by experts. Then, statistics-based status indicators were derived from the data through preprocessing. Random Forests (RF), Support Vector Machines (SVMs), Multilayer Perceptron (MLP), and an ensemble model were used in the experiment. Long Short-Term Memory (LSTM) was also used for comparison. As a result, it was possible to predict e-learners' concentration with an accuracy of 90.54%. This study is expected to improve learners' immersion by providing a customized educational curriculum according to the learner's concentration level.