• Title/Abstract/Keywords: time series cross-validation

29 results found (search time: 0.022 s)

Application of Time-series Cross Validation in Hyperparameter Tuning of a Predictive Model for 2,3-BDO Distillation Process

  • 안나현;최영렬;조형태;김정환
    • Korean Chemical Engineering Research / Vol. 59, No. 4 / pp. 532-541 / 2021
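  • With growing interest in artificial intelligence, research applying AI in the chemical process field has also been increasing. However, AI-based models are often not sufficiently generalized, so overfitting, in which prediction accuracy drops on new data not used for training, occurs frequently; cross-validation is one way to address overfitting. In this study, time-series cross-validation was applied to tune two hyperparameters, the number of batches and the number of iterations, of a temperature prediction model for the 2,3-BDO separation process, and it was compared with the commonly used K-fold cross-validation. Compared with K-fold cross-validation, time-series cross-validation increased MAPE by 0.61% but reduced RMSE by 9.06%, and required 198.29 s less training time.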
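A minimal sketch of the two validation schemes compared above, assuming contiguous (unshuffled) K-fold folds and an expanding-window time-series split that uses the same fold arithmetic as scikit-learn's `TimeSeriesSplit`. The function names are illustrative, not the authors' code; in a tuning loop, each candidate batch count and iteration count would be scored by averaging a validation metric such as RMSE or MAPE over the folds.

```python
import math


def kfold_splits(n, k):
    """Contiguous K-fold: each block is validated once, and the training
    indices may lie both before and after the validation block."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def time_series_splits(n, n_splits):
    """Expanding window: the validation block always follows the training
    window, so the model never trains on future observations."""
    test_size = n // (n_splits + 1)
    for i in range(n_splits):
        train_end = n - (n_splits - i) * test_size
        yield list(range(0, train_end)), list(range(train_end, train_end + test_size))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

The design difference is visible in the indices: in every time-series fold `max(train) < min(val)`, whereas contiguous K-fold also trains on observations that come after the validation block.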

A Study on Time Series Cross-Validation Techniques for Enhancing the Accuracy of Reservoir Water Level Prediction Using Automated Machine Learning TPOT

  • 배주현;박운지;이서로;박태선;박상빈;김종건;임경재
    • 한국농공학회논문집 / Vol. 66, No. 1 / pp. 1-13 / 2024
  • This study assessed the efficacy of improving the accuracy of reservoir water level prediction models by employing automated machine learning models and efficient cross-validation methods for time-series data. Considering the inherent complexity and non-linearity of time-series data related to reservoir water levels, we proposed an optimized approach for model selection and training. The performance of twelve models was evaluated for the Obong Reservoir in Gangneung, Gangwon Province, using the TPOT (Tree-based Pipeline Optimization Tool) and four cross-validation methods, which led to the determination of the optimal pipeline model. The pipeline model consisting of Extra Tree, Stacking Ridge Regression, and Simple Ridge Regression showed outstanding predictive performance for both training and test data, with an R2 (Coefficient of determination) and NSE (Nash-Sutcliffe Efficiency) exceeding 0.93. On the other hand, for predictions of water levels 12 hours later, the pipeline model selected through time-series split cross-validation accurately captured the change pattern of time-series water level data during the test period, with an NSE exceeding 0.99. The methodology proposed in this study is expected to greatly contribute to the efficient generation of reservoir water level predictions in regions with high rainfall variability.
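The abstract reports both R2 and NSE. A minimal sketch of the two metrics follows; the abstract does not give the paper's exact R2 definition, so this assumes the squared Pearson correlation that is common in hydrological studies. Function names are illustrative.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations
    of the observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def r2_pearson(observed, simulated):
    """Coefficient of determination taken as the squared Pearson correlation
    (an assumption; some papers use 1 - SSE/SST instead)."""
    n = len(observed)
    mo, ms = sum(observed) / n, sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)
```

Under this definition the two metrics can disagree: a forecast that is uniformly 1 unit too high (observed `[1, 2, 3, 4]`, simulated `[2, 3, 4, 5]`) has an R2 of 1.0 but an NSE of only 0.2, because NSE penalizes bias while correlation does not.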

Solar radiation forecasting using boosting decision tree and recurrent neural networks

  • Hyojeoung, Kim;Sujin, Park;Sahm, Kim
    • Communications for Statistical Applications and Methods / Vol. 29, No. 6 / pp. 709-719 / 2022
  • Recently, as the importance of environmental protection has emerged, interest in new and renewable energy has been increasing worldwide. In particular, solar energy accounts for the highest production share among new and renewable energy sources in Korea because of its unlimited resources, easy installation and maintenance, and eco-friendly characteristics such as low noise and few pollutants during power generation. However, although climate prediction is essential because solar power is affected by weather and climate change, solar radiation, which is closely related to solar power, is not currently forecast by the Korea Meteorological Administration. Solar radiation prediction can serve as the basis for a reasonable new and renewable energy operation plan, and it is very important because it can be used not only for solar power but also in other fields such as power consumption prediction. This study therefore aimed to improve the accuracy of solar radiation forecasts. Solar radiation was predicted from three weather variables (temperature, humidity, and cloudiness) together with extraterrestrial solar radiation, and the results of various models were compared. Among the boosting models (XGB, CatBoost) and RNN models (Simple RNN, LSTM, GRU) that were fitted and compared, CatBoost performed best. The results were further improved through time series cross-validation.
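Extraterrestrial solar radiation, one of the inputs listed above, depends only on latitude and date, so it can be computed rather than measured. The abstract does not say how the authors obtained it; this sketch assumes the daily formula from FAO Irrigation and Drainage Paper No. 56, taking latitude in degrees and the day of the year (hourly data would need the hourly variant instead).

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m^-2 min^-1, value used in FAO-56


def extraterrestrial_radiation(latitude_deg, day_of_year):
    """Daily extraterrestrial radiation Ra in MJ m^-2 day^-1 (FAO-56 daily formula)."""
    phi = math.radians(latitude_deg)
    angle = 2 * math.pi * day_of_year / 365
    dr = 1 + 0.033 * math.cos(angle)            # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)      # solar declination (rad)
    # Sunset hour angle; clamp the argument for polar day/night at high latitudes
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)
    return (24 * 60 / math.pi) * SOLAR_CONSTANT * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
```

As a check, `extraterrestrial_radiation(-20, 246)` (20°S, 3 September) returns about 32.2 MJ m^-2 day^-1, which agrees with the worked example in FAO-56.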

Power Consumption Forecasting Scheme for Educational Institutions Based on Analysis of Similar Time Series Data

  • 문지훈;박진웅;한상훈;황인준
    • 정보과학회 논문지 / Vol. 44, No. 9 / pp. 954-965 / 2017
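  • A stable power supply is critical to the maintenance and operation of power infrastructure, and this requires accurate forecasting of power consumption. University campuses consume large amounts of electricity, and their consumption varies widely with time and environment. For efficient operation of the power system, a model that can accurately forecast power consumption is therefore needed. Existing time series forecasting techniques have the drawback that the larger the gap between the training period and the forecast time, the wider the prediction interval and the worse the forecasting performance. To address this, this paper first uses a decision tree to classify power data with similar time-series patterns, taking into account the date, day of the week, holidays, and academic semester. Next, an autoregressive integrated moving average (ARIMA) model is built for each classified data set, and time series cross-validation is applied at the forecast point to forecast the daily power consumption of a university campus. To evaluate forecast accuracy, the validity of the proposed scheme was verified with performance evaluation metrics.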
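The two steps above (group days with similar consumption patterns, then evaluate each group by re-forecasting from the data available at each forecast point) can be sketched as follows. This is a simplified illustration under loud assumptions: `day_type` is a hypothetical hand-written rule standing in for the paper's decision tree, and `mean_of_last_three` is a naive stand-in for the per-group ARIMA model.

```python
from collections import defaultdict


def day_type(weekday, is_holiday, in_semester):
    """Hypothetical rule standing in for the paper's decision tree.
    weekday: 0 = Monday ... 6 = Sunday."""
    if is_holiday or weekday >= 5:
        return "off_day"
    return "semester_weekday" if in_semester else "break_weekday"


def group_by_day_type(days):
    """Split daily records into one consumption series per day type (order kept)."""
    groups = defaultdict(list)
    for d in days:
        groups[day_type(d["weekday"], d["holiday"], d["semester"])].append(d["kwh"])
    return dict(groups)


def rolling_origin_mae(series, min_train, forecast):
    """Time series cross-validation by rolling origin: forecast each point
    from all earlier points only, then average the absolute errors."""
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecast(series[:t])  # only past values are visible
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


def mean_of_last_three(history):
    """Naive stand-in forecaster; the paper fits an ARIMA model per group."""
    window = history[-3:]
    return sum(window) / len(window)
```

Each group's series would be passed to `rolling_origin_mae` separately, so a forecast for a semester weekday is only ever built from earlier days of the same type.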

The Prediction and Analysis of the Power Energy Time Series by Using the Elman Recurrent Neural Network

  • 이창용;김진호
    • 산업경영시스템학회지 / Vol. 41, No. 1 / pp. 84-93 / 2018
  • In this paper, we propose an Elman recurrent neural network to predict and analyze a time series of power energy consumption. To this end, we consider the volatility of the time series and apply sample variance analysis and detrended fluctuation analysis to the volatilities. We demonstrate that the volatility time series is correlated, which suggests that the power consumption time series contains a non-negligible amount of non-linear correlation. Based on this finding, we adopt the Elman recurrent neural network as the model for predicting power consumption. As the simplest form of recurrent network, the Elman network is designed to learn sequential or time-varying patterns and can predict learned series of values. The Elman network has a layer of "context units" in addition to a standard feedforward network. By adjusting two parameters in the model and performing cross-validation, we demonstrated that the proposed model predicts power consumption with relative errors of 2%~5% and average errors of 3kWh~8kWh. To further confirm the experimental results, we performed two types of cross-validation designed for time series data. We also support the validity of the model by analyzing multi-step forecasting. We found that the prediction errors increase with the prediction time step but tend to saturate. The results of this study can be applied to energy management systems for effectively controlling the combined use of electric and gas energy.
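A minimal forward-pass sketch of the Elman architecture described above: the context units hold a copy of the previous hidden state and feed it back into the hidden layer at the next step. Pure Python, tanh hidden units, and a single linear output are assumptions made for illustration; training (for example backpropagation through time) and the paper's two tuned parameters are not shown.

```python
import math


def elman_forward(inputs, W_in, W_ctx, b_h, w_out, b_out):
    """Run an Elman network over a sequence and return one output per step.

    inputs: list of input vectors; W_in[j][i]: weight from input i to hidden j;
    W_ctx[j][k]: weight from context unit k to hidden j; w_out[j]: hidden j to output.
    """
    hidden_size = len(b_h)
    context = [0.0] * hidden_size  # context units start at zero
    outputs = []
    for x in inputs:
        hidden = [
            math.tanh(
                b_h[j]
                + sum(W_in[j][i] * x[i] for i in range(len(x)))
                + sum(W_ctx[j][k] * context[k] for k in range(hidden_size))
            )
            for j in range(hidden_size)
        ]
        outputs.append(b_out + sum(w * h for w, h in zip(w_out, hidden)))
        context = hidden  # copy the hidden state into the context layer
    return outputs
```

With one input and one hidden unit, feeding `[[1.0], [0.0]]` gives a nonzero second output even though the second input is zero: the context unit carries the first step's hidden state forward, which is what lets the network model sequential patterns.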

Spatial-Temporal Modelling of Road Traffic Data in Seoul City

  • 이상열;안수한;박창이;전종우
    • Journal of the Korean Data and Information Science Society / Vol. 13, No. 2 / pp. 261-270 / 2002
  • Recently, demand for Intelligent Transportation Systems (ITS) has increased greatly, and real-time traffic information services delivered over the internet have become very important. When ITS companies provide real-time traffic services, they find that some traffic data are missing, and they reconstruct the missing values with the conventional method of calculating the average time trend. However, this method proved unsatisfactory, so we develop new methods based on spatial and spatial-temporal models. Cross-validation shows that the spatial-temporal model outperforms the others.
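One way to compare imputation methods by cross-validation, as described above, is to hide each observed value in turn, reconstruct it from the remaining data, and average the errors. The sketch below is a simplified illustration, not the authors' spatial-temporal model: both imputers are hypothetical stand-ins, one using only the same sensor's other readings (a crude "time trend" average) and one using neighbouring sensors at the same time.

```python
def time_mean_impute(observed, key):
    """Stand-in for the time-trend method: mean of the same sensor's other
    readings. observed maps (sensor, time) -> value."""
    sensor, _ = key
    values = [v for (s, _), v in observed.items() if s == sensor]
    return sum(values) / len(values)


def make_spatial_impute(neighbors):
    """Stand-in spatial method: mean of neighbouring sensors at the same time,
    falling back to the time mean when no neighbour was observed."""
    def impute(observed, key):
        sensor, t = key
        values = [observed[(n, t)] for n in neighbors[sensor] if (n, t) in observed]
        if not values:
            return time_mean_impute(observed, key)
        return sum(values) / len(values)
    return impute


def loo_cv_mae(observed, impute):
    """Leave-one-out cross-validation: hide each value, impute it, average |error|."""
    errors = []
    for key, value in observed.items():
        rest = {k: v for k, v in observed.items() if k != key}
        errors.append(abs(impute(rest, key) - value))
    return sum(errors) / len(errors)
```

The method with the lower `loo_cv_mae` reconstructs held-out readings more accurately; replacing the neighbour average with a fitted spatial-temporal model would follow the same evaluation loop.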

Time Series Classification of Cryptocurrency Price Trend Based on a Recurrent LSTM Neural Network

  • Kwon, Do-Hyung;Kim, Ju-Bong;Heo, Ju-Sung;Kim, Chan-Myung;Han, Youn-Hee
    • Journal of Information Processing Systems / Vol. 15, No. 3 / pp. 694-706 / 2019
  • In this study, we applied the long short-term memory (LSTM) model to classify cryptocurrency price time series. We collected historical cryptocurrency price time series data and preprocessed them into clean training and target data. After preprocessing, the price time series data were systematically encoded into a three-dimensional price tensor representing the past price changes of cryptocurrencies. We also present our LSTM model structure and show how the price tensor is used as input data for the LSTM model. In particular, a grid search-based k-fold cross-validation technique was applied to find the most suitable LSTM model parameters. Finally, a comparison of f1-scores showed that, for time series classification of cryptocurrency price trends, the LSTM model outperforms the gradient boosting (GB) model, a general machine learning model known for relatively good prediction performance. The LSTM model achieved a performance improvement of about 7% over the GB model.
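A minimal sketch of grid search with k-fold cross-validation scored by the f1-score, as described above. `fit_predict` is a hypothetical callback standing in for training and evaluating the LSTM; the folds are contiguous blocks, and a time-ordered split could be substituted for the fold generator if strict temporal ordering is required.

```python
from itertools import product


def f1_score(y_true, y_pred, positive=1):
    """Binary f1-score: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search_kfold(X, y, param_grid, k, fit_predict):
    """Return (best_params, best_mean_f1) over all grid combinations.

    fit_predict(params, X_train, y_train, X_val) -> predicted labels (hypothetical).
    X and y are lists; folds are contiguous blocks.
    """
    n = len(X)
    bounds = [i * n // k for i in range(k + 1)]
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i in range(k):
            lo, hi = bounds[i], bounds[i + 1]
            X_train, y_train = X[:lo] + X[hi:], y[:lo] + y[hi:]
            preds = fit_predict(params, X_train, y_train, X[lo:hi])
            scores.append(f1_score(y[lo:hi], preds))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

For an LSTM, `param_grid` might hold candidate hidden sizes or learning rates, and `fit_predict` would train on `X_train` and predict labels for the validation block.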

Development of Highway Traffic Information Prediction Models Using the Stacking Ensemble Technique Based on Cross-validation

  • 이요셉;오석진;김예진;박성호;윤일수
    • 한국ITS학회 논문지 / Vol. 22, No. 6 / pp. 1-16 / 2023
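  • Accurate traffic information prediction is an important function in intelligent transport systems (ITS), used for example to guide transport facility users away from congested routes. Various deep learning models have been developed for accurate traffic information prediction. Recently, ensemble techniques have been used to combine the strengths and weaknesses of different models to improve prediction accuracy and stability. In this study, therefore, traffic information prediction models were developed using various deep learning models, and their performance was improved by combining them through a stacking ensemble. The individual models showed error rates within 10% for traffic volume prediction and within 3% for speed prediction. Without cross-validation, the ensemble model showed higher accuracy than the other models. With cross-validation, the ensemble model showed more uniform error rates than the other models in long-term prediction.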
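Stacking with cross-validation typically trains the meta-learner only on out-of-fold predictions, so it learns how the base models behave on data they did not see. A minimal sketch under loud assumptions: `last_value` and `mean_of_last_three` are hypothetical stand-ins for the paper's deep learning models, the folds are expanding-window blocks, and the meta-learner is reduced to a single least-squares blend weight (the paper's meta-learner is not specified in the abstract).

```python
def last_value(history):
    """Hypothetical base model 1: persistence forecast."""
    return history[-1]


def mean_of_last_three(history):
    """Hypothetical base model 2: short moving average."""
    window = history[-3:]
    return sum(window) / len(window)


def out_of_fold_rows(y, base_models, n_splits):
    """Collect ([forecast per base model], target) pairs from the validation
    blocks of an expanding-window split only."""
    n = len(y)
    test_size = n // (n_splits + 1)
    rows = []
    for i in range(n_splits):
        train_end = n - (n_splits - i) * test_size
        for t in range(train_end, train_end + test_size):
            # These toy base models have no fitted parameters; trained models
            # would be fit on y[:train_end] before forecasting this block.
            history = y[:t]
            rows.append(([m(history) for m in base_models], y[t]))
    return rows


def fit_blend_weight(rows):
    """Meta-learner: least-squares w in w*a + (1 - w)*b, clipped to [0, 1]."""
    num = sum((target - b) * (a - b) for (a, b), target in rows)
    den = sum((a - b) ** 2 for (a, b), _ in rows)
    if den == 0:  # identical base forecasts: any weight gives the same result
        return 0.5
    return min(1.0, max(0.0, num / den))


def stacked_forecast(history, base_models, w):
    """Combine the two base forecasts with the learned weight."""
    a, b = (m(history) for m in base_models)
    return w * a + (1 - w) * b
```

Usage: `rows = out_of_fold_rows(series, [last_value, mean_of_last_three], 3)`, then `w = fit_blend_weight(rows)` and `stacked_forecast(series, [last_value, mean_of_last_three], w)` for the next step. A fuller stacking setup would fit a regression meta-model over all base models instead of one weight.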

Airline In-flight Meal Demand Forecasting with Neural Networks and Time Series Models

  • Lee, Young-Chan
    • 한국정보시스템학회:학술대회논문집 / 한국정보시스템학회 2000년도 추계학술대회 (2000 Fall Conference) / pp. 36-44 / 2000
  • The purpose of this study is to introduce a more efficient forecasting technique that could help reduce costs by eliminating the waste of airline in-flight meals. We use a neural network approach known to many researchers as the “Outstanding Forecasting Technique”. We employed a multi-layer perceptron neural network trained with the backpropagation algorithm. We also suggest using other related information to improve the forecasting performance of the neural networks. We divided the data into three sets: a training set, a cross-validation set, and a test set. In line with the general view of time series forecasting, time-lag variables are still employed in our model. We measured the accuracy of our model by the mean squared error (MSE). The suggested model performed best for economy-class in-flight meals. Forecasting the exact number of meals needed for each airline could reduce meal waste and therefore costs. Better yet, it could enhance each airline's cost competitiveness, help keep schedules on time, and lead to better service.
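The data preparation described above (time-lag input variables, a training / cross-validation / test split, and MSE as the accuracy measure) can be sketched as follows. The 60/20/20 proportions and the chronological, unshuffled split are assumptions for illustration; the abstract does not state how the three sets were formed.

```python
def make_lagged(series, n_lags):
    """Turn a series into (previous n_lags values -> next value) pairs."""
    series = list(series)
    X = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    y = series[n_lags:]
    return X, y


def chronological_split(X, y, train_frac=0.6, val_frac=0.2):
    """Split into training, cross-validation and test sets without shuffling,
    so later observations never leak into earlier sets."""
    n_train = int(len(X) * train_frac)
    n_val = int(len(X) * val_frac)
    cut = n_train + n_val
    return (X[:n_train], y[:n_train]), (X[n_train:cut], y[n_train:cut]), (X[cut:], y[cut:])


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

The cross-validation set would typically be used to decide when to stop training or which network size to keep, and the test set only for the final MSE.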

Pan Evaporation Analysis using Nonlinear Disaggregation Model

  • 김성원;김정헌;박기범
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2008년도 학술발표회 논문집 (2008 Conference Proceedings) / pp. 1147-1150 / 2008
  • The goal of this research is to apply neural network models to the disaggregation of pan evaporation (PE) data in the Republic of Korea. The neural network models are the support vector machines neural networks model (SVM-NNM) and the multilayer perceptron neural networks model (MLP-NNM). The use of the SVM-NNM in time series modeling is relatively new and more problematic than its use for classification. In this study, disaggregation means dividing yearly PE data into monthly PE data. To evaluate the performance of the neural network models, the data were divided into training, cross-validation, and testing sets. From this research, we evaluate the impact of the SVM-NNM and the MLP-NNM on the disaggregation of nonlinear time series data. Furthermore, credible monthly PE data can be constructed by disaggregating the yearly PE data, and a methodology can be suggested for irrigation and drainage network systems.