• Title/Summary/Keyword: time series cross-validation

Hierarchical Smoothing Technique by Empirical Mode Decomposition (경험적 모드분해법에 기초한 계층적 평활방법)

  • Kim Dong-Hoh;Oh Hee-Seok
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.319-330 / 2006
  • A real-world signal is usually composed of multiple signals with different frequency scales. For example, sunspot data fluctuate over 11-year and 85-year cycles. Economic data are assumed to be a compound of a seasonal component, a cyclic component, and a long-term trend. Decomposition of a signal is one of the main topics in time series analysis. However, when the signal is nonstationary, traditional time series methods such as spectral analysis are not suitable. Huang et al. (1998) proposed a data-adaptive method called empirical mode decomposition (EMD). Because of its robustness to nonstationarity, EMD has been applied in various fields. Huang et al., however, did not consider denoising when the data are contaminated by error. In this paper, we propose an efficient denoising method that utilizes cross-validation.

Fault Diagnosis of Bearing Based on Convolutional Neural Network Using Multi-Domain Features

  • Shao, Xiaorui;Wang, Lijiang;Kim, Chang Soo;Ra, Ilkyeun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1610-1629 / 2021
  • Failures frequently occur in manufacturing machines because of complex and changeable manufacturing environments, increasing downtime and maintenance costs. This manuscript develops a novel deep learning-based method named Multi-Domain Convolutional Neural Network (MDCNN) to address this challenging task using vibration signals. The proposed MDCNN consists of time-domain, frequency-domain, and statistical-domain feature channels. The time-domain channel models the hidden patterns of signals in the time domain. The frequency-domain channel uses the Discrete Wavelet Transformation (DWT) to obtain rich feature representations of signals in the frequency domain. The statistical-domain channel contains six statistical variables that reflect the signals' macro statistical-domain features. First, in the proposed MDCNN, the time-domain and frequency-domain channels are processed individually by CNNs with various filters. Second, the CNN-extracted features from the time and frequency domains are merged into time-frequency features. Last, the time-frequency features are fused with the six statistical variables to form the comprehensive features used to identify the fault. The proposed method can thereby make full use of the three feature domains for fault diagnosis while keeping high distinguishability owing to the use of CNNs. The authors designed extensive experiments with 10-fold cross-validation to validate the proposed method's effectiveness on the CWRU bearing data set. The experimental results are reported as accuracy averaged over ten runs. They confirm that the proposed MDCNN can detect faults intelligently, accurately, and promptly under complex manufacturing environments, with an accuracy of nearly 100%.

A SVR Based-Pseudo Modified Einstein Procedure Incorporating H-ADCP Model for Real-Time Total Sediment Discharge Monitoring (실시간 총유사량 모니터링을 위한 H-ADCP 연계 수정 아인슈타인 방법의 의사 SVR 모형)

  • Noh, Hyoseob;Son, Geunsoo;Kim, Dongsu;Park, Yong Sung
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.321-335 / 2023
  • Monitoring sediment loads in natural rivers is a key process in river engineering, but it is costly and dangerous. In practice, suspended loads are measured directly, and total loads, the sum of suspended loads and bed loads, are estimated. This study proposes a real-time sediment discharge monitoring system using a horizontal acoustic Doppler current profiler (H-ADCP) and support vector regression (SVR). The proposed system comprises two SVR models, one for suspended sediment concentration (SVR-SSC) and one for total loads (SVR-QTL). SVR-SSC estimates SSC, and SVR-QTL mimics the modified Einstein procedure. Grid search with K-fold cross-validation (Grid-CV) and recursive feature elimination (RFE) were employed to determine the SVR hyperparameters and input variables. The two SVR models showed reasonable cross-validation scores (R²) of 0.885 (SVR-SSC) and 0.860 (SVR-QTL). During the time-series sediment load monitoring period, we successfully detected various sediment transport phenomena in natural streams, such as hysteresis loops and sensitive sediment fluctuations. The newly proposed sediment monitoring system depends only on the features gauged by the H-ADCP, without additional assumptions about hydraulic variables (e.g., friction slope and suspended sediment size distribution). This method can be applied economically to any ADCP-installed discharge monitoring station and is expected to enhance temporal resolution in sediment monitoring.
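
The grid search with K-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free it swaps SVR for a one-parameter ridge regression through the origin, and every name in it (`k_fold_indices`, `fit_ridge_1d`, `cv_mse`, `grid_search`) is hypothetical. Note also that shuffled K-fold splits ignore temporal order, which may or may not suit a given time series.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Stand-in model (assumption, not SVR): slope of a no-intercept ridge fit."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_mse(xs, ys, lam, folds):
    """Average the held-out mean squared error over all folds."""
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_err = sum((ys[i] - w * xs[i]) ** 2 for i in held_out) / len(held_out)
        errors.append(fold_err)
    return sum(errors) / len(errors)


def grid_search(xs, ys, grid, k=5):
    """Return the grid value with the lowest cross-validated error, plus all scores."""
    folds = k_fold_indices(len(xs), k)
    scores = {lam: cv_mse(xs, ys, lam, folds) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data with an exact linear relation, so no regularization should win.
xs = [float(x) for x in range(1, 21)]
ys = [2.0 * x for x in xs]
best_lam, scores = grid_search(xs, ys, grid=[0.0, 10.0, 1000.0], k=5)
```

The same fold assignment is reused for every candidate value so that the scores are comparable across the grid.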

Estimating GARCH models using kernel machine learning (커널기계 기법을 이용한 일반화 이분산자기회귀모형 추정)

  • Hwang, Chang-Ha;Shin, Sa-Im
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.419-425 / 2010
  • Kernel machine learning is gaining popularity for analyzing large or high-dimensional nonlinear data. We use this technique to estimate a GARCH model for predicting the conditional volatility of stock market returns. GARCH models are usually estimated using maximum likelihood (ML) procedures, assuming that the data are normally distributed. In this paper, we show that GARCH models can be estimated using kernel machine learning and that the kernel machine has higher predictive ability than ML methods and the support vector machine when estimating the volatility of financial time series data with fat tails.

An Electric Load Forecasting Scheme with High Time Resolution Based on Artificial Neural Network (인공 신경망 기반의 고시간 해상도를 갖는 전력수요 예측기법)

  • Park, Jinwoong;Moon, Jihoon;Hwang, Eenjun
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.527-536 / 2017
  • With the recent development of the smart grid industry, the need for an efficient EMS (Energy Management System) has increased. In particular, sophisticated electric load forecasting and an efficient smart grid operation strategy are required to reduce electric load and energy cost. In this paper, for more accurate electric load forecasting, we extend the data collected at demand time to a high time resolution and construct an artificial neural network-based forecasting model appropriate for the high time resolution data. Furthermore, to improve the accuracy of electric load forecasting, time series data in sequence form are transformed into continuous data in two-dimensional space, which addresses the problem that machine learning methods cannot reflect the periodicity of time series data. In addition, to consider external factors such as temperature and humidity in accordance with the time resolution, we estimate their values at that time resolution using linear interpolation. We then apply the PCA (Principal Component Analysis) algorithm to the feature vector composed of external factors to remove data that have little correlation with the power data. Finally, we evaluate our model through 5-fold cross-validation. The results show that forecasting based on a higher time resolution improves accuracy, and the best error rate of 3.71% was achieved at the 3-min resolution.
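
Two of the preprocessing steps in this abstract can be sketched briefly. The abstract does not say how the sequence data are mapped to two-dimensional space; the sine/cosine encoding below is one common choice and is an assumption here, as are the function names (`encode_time_of_day`, `interpolate_linear`, `upsample`) and the sample values.

```python
import math


def encode_time_of_day(minute_of_day, period=1440):
    """Map a time stamp onto the unit circle so 23:59 and 00:00 end up adjacent.

    Assumed encoding: the source only says the data become continuous 2-D values.
    """
    angle = 2 * math.pi * (minute_of_day % period) / period
    return (math.sin(angle), math.cos(angle))


def interpolate_linear(t0, v0, t1, v1, t):
    """Value at time t on the straight line through (t0, v0) and (t1, v1)."""
    if t1 == t0:
        raise ValueError("t0 and t1 must differ")
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def upsample(values, step_in, step_out):
    """Linearly interpolate equally spaced readings onto a finer time step."""
    if step_in % step_out != 0:
        raise ValueError("step_in must be a multiple of step_out")
    per = step_in // step_out  # output samples per input interval
    out = []
    for i in range(len(values) - 1):
        for j in range(per):
            out.append(interpolate_linear(0, values[i], per, values[i + 1], j))
    out.append(values[-1])  # keep the final reading
    return out


# Hypothetical temperature readings 60 minutes apart, refined to 20 minutes.
fine = upsample([20.0, 23.0], step_in=60, step_out=20)
midnight = encode_time_of_day(0)
late_evening = encode_time_of_day(1437)  # 23:57
```

With a raw minute count, 23:57 and 00:00 are 1437 units apart; on the circle they are neighbors, which is the periodicity a plain sequence index fails to express.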

A ResNet based multiscale feature extraction for classifying multi-variate medical time series

  • Zhu, Junke;Sun, Le;Wang, Yilin;Subramani, Sudha;Peng, Dandan;Nicolas, Shangwe Charmant
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.5 / pp.1431-1445 / 2022
  • We construct a deep neural network model named ECGResNet. This model can diagnose eight common cardiovascular diseases from 12-lead ECG data with high accuracy. We chose the 16 blocks of ResNet50 as the main body of the model and added the Squeeze-and-Excitation (SE) module to adaptively learn the information shared between channels. As our feature extraction method, we replaced the first convolutional layer of ResNet50, which has a convolutional kernel of 7, with a superposition of convolutional kernels of 8 and 16. This allows the model to focus on the overall trend of the ECG signal while also noticing subtle changes. The model further improves the accuracy of cardiovascular and cerebrovascular disease classification by using a fully connected layer that integrates factors such as gender and age. The ECGResNet model adds Dropout layers to both the residual block and the SE module of ResNet50 to further reduce overfitting. The model was finally trained using five-fold cross-validation and the Flooding training method, reaching an accuracy of 95% on the test set and an F1-score of 0.841. We design a new deep neural network, introduce a multi-scale feature extraction method, and apply the SE module to extract features from ECG data.

Effect of Ambient Air Pollution on Years of Life Lost from Deaths due to Injury in Seoul, South Korea (대기오염물질이 손상으로 인한 손실수명연수에 미치는 영향: 서울특별시를 중심으로)

  • Sun-Woo Kang;Subin Jeong;Hyewon Lee
    • Journal of Environmental Health Sciences / v.49 no.3 / pp.149-158 / 2023
  • Background: Injury is one of the major health problems in South Korea. Few studies have evaluated both intentional and unintentional injury when investigating the association between exposure to air pollutants and injury. Objectives: We aimed to explore the association between short-term exposure to ambient air pollution and years of life lost (YLLs) due to injury. Methods: Data on daily YLLs for 2002-2019 were obtained from the Death Statistics Database of the Korean National Statistical Office. This study estimated short-term exposure to particulate matter with an aerodynamic diameter of <10 μm (PM10), particulate matter with an aerodynamic diameter of <2.5 μm (PM2.5), sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO), and ozone (O3). This time series study was conducted using a generalized additive model (GAM) assuming a Gaussian distribution. We also evaluated the delayed effect of ambient air pollution by constructing a lag structure of up to seven days. The best-fitting lag was selected based on the smallest generalized cross-validation (GCV) value. To explore effect modification by intentionality of injury (i.e., intentional injury [self-harm, assault] and unintentional injury), we conducted stratified subgroup analyses. Additionally, we stratified unintentional injury by mechanism (traffic accident, fall, etc.). Results: During the study period, the average daily YLLs due to injury was 307.5 years. For intentional injury, YLLs due to self-harm and assault showed positive associations with air pollutants. For unintentional injury, YLLs due to falls, electric current, fire, and poisoning showed positive associations with air pollutants, whereas YLLs due to traffic accidents, mechanical force, and drowning/submersion showed negative associations with air pollutants. Conclusions: Injury is recognized as preventable, and effective strategies to create a safe society are important. Therefore, we need to establish strategies to prevent injury and consider air pollutants in this regard.
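
For reference, the lag selection by GCV described in this abstract can be written out in its usual form for a Gaussian model. This is a sketch under assumptions, not the paper's stated formula: $n$ is the number of daily observations, $\hat{y}_i$ the fitted values, $\mathbf{A}$ the influence (hat) matrix of the fitted GAM, and $\mathcal{L}$ the set of candidate lags, which the abstract gives only as "up to seven days".

```latex
\mathrm{GCV} = \frac{n \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\left(n - \operatorname{tr}(\mathbf{A})\right)^2},
\qquad
\hat{\ell} = \arg\min_{\ell \in \mathcal{L}} \mathrm{GCV}(\ell)
```

The trace term is the model's effective degrees of freedom, so a lag that fits better only by using a wigglier smooth is penalized through the denominator.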

Satellite-Based Cabbage and Radish Yield Prediction Using Deep Learning in Kangwon-do (딥러닝을 활용한 위성영상 기반의 강원도 지역의 배추와 무 수확량 예측)

  • Hyebin Park;Yejin Lee;Seonyoung Park
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.1031-1042 / 2023
  • In this study, a deep learning model was developed to predict the yield of cabbage and radish, which are among the five major supply and demand management vegetables, using Landsat 8 satellite images. To predict the yield of cabbage and radish in Gangwon-do from 2015 to 2020, satellite images from June to September, the growing period of cabbage and radish, were used. The normalized difference vegetation index, enhanced vegetation index, leaf area index, and land surface temperature were employed as input data for the yield model. Crop yields can be effectively predicted using satellite images because satellites collect continuous spatiotemporal data on the global environment. Based on a model developed in a previous study, a model designed for the input data was proposed in this study. Using time series satellite images, a convolutional neural network, a deep learning model, was used to predict crop yield. Landsat 8 provides images every 16 days, but it is difficult to acquire images, especially in summer, because of weather effects such as clouds. As a result, yield prediction was conducted by splitting June to July into one part and August to September into two. Yield prediction was performed using a machine learning approach and reference models, and modeling performance was compared. The model's performance and early predictability were assessed using year-by-year cross-validation and early prediction. The findings of this study could serve as a basis for predicting the yield of field crops in Korea.
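
The year-by-year cross-validation mentioned in this abstract is sketched below as a leave-one-year-out split, which is an assumed reading of the term; the abstract does not define the exact scheme. The function name `leave_one_year_out` and the sample year labels are hypothetical.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) once per distinct year.

    Every sample from the held-out year goes to the test set, so samples from
    the same season never appear on both sides of the split.
    """
    for year in sorted(set(years)):
        test_idx = [i for i, y in enumerate(years) if y == year]
        train_idx = [i for i, y in enumerate(years) if y != year]
        yield year, train_idx, test_idx


# Hypothetical year label for each sample (for example, one field-season each).
sample_years = [2015, 2015, 2016, 2017]
splits = list(leave_one_year_out(sample_years))
```

Grouping by year avoids the leakage that a random split would allow when samples within one growing season are strongly correlated.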

Application of groundwater-level prediction models using data-based learning algorithms to National Groundwater Monitoring Network data (자료기반 학습 알고리즘을 이용한 지하수위 변동 예측 모델의 국가지하수관측망 자료 적용에 대한 비교 평가 연구)

  • Yoon, Heesung;Kim, Yongcheol;Ha, Kyoochul;Kim, Gyoo-Bum
    • The Journal of Engineering Geology / v.23 no.2 / pp.137-147 / 2013
  • For the effective management of groundwater resources, it is necessary to predict groundwater level fluctuations in response to rainfall events. In the present study, time series models using artificial neural networks (ANNs) and support vector machines (SVMs) have been developed and applied to groundwater level data from the Gasan, Shingwang, and Cheongseong stations of the National Groundwater Monitoring Network. We designed four types of model according to input structure and compared their performances. The results show that the rainfall input model is not effective, especially for the prediction of groundwater recession behavior; however, the rainfall-groundwater input model is effective for the entire prediction stage, yielding a high model accuracy. Recursive prediction models were also effective, yielding correlation coefficients of 0.75-0.95 with observed values. The prediction errors were highest for Shingwang station, where the cross-correlation coefficient is lowest among the stations. Overall, the model performance of SVM models was slightly higher than that of ANN models for all cases. Assessment of the model parameter uncertainty of the recursive prediction models, using the ratio of errors in the validation stage to that in the calibration stage, showed that the range of the ratio is much narrower for the SVM models than for the ANN models, which implies that the SVM models are more stable and effective for the present case studies.