• Title/Summary/Keyword: time series cross-validation

Application of Time-series Cross Validation in Hyperparameter Tuning of a Predictive Model for 2,3-BDO Distillation Process (시계열 교차검증을 적용한 2,3-BDO 분리공정 온도예측 모델의 초매개변수 최적화)

  • An, Nahyeon; Choi, Yeongryeol; Cho, Hyungtae; Kim, Junghwan
    • Korean Chemical Engineering Research / v.59 no.4 / pp.532-541 / 2021
  • Research on applying artificial intelligence to chemical processes has recently been increasing rapidly. However, overfitting is a significant problem: it prevents a model that fits the observed training data from generalizing well to unseen test data. Cross-validation is one way to address overfitting. In this study, time-series cross-validation was applied to optimize the batch and epoch hyperparameters of a prediction model for the 2,3-BDO distillation process, and it was compared with the commonly used K-fold cross-validation. As a result, the model tuned with time-series cross-validation had a 9.06% lower RMSE and a 0.61% higher MAPE than the model tuned with K-fold cross-validation. Its computation time was also 198.29 s shorter than that of the K-fold cross-validation method.
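The abstract contrasts time-series cross-validation with K-fold cross-validation but does not spell out the split scheme. As a hedged sketch (the fold counts and the expanding-window layout are assumptions modeled on the scheme used by scikit-learn's `TimeSeriesSplit`, not details taken from the paper), the two index generators below show the key difference: K-fold trains on observations that come after the held-out fold, while time-series cross-validation always tests on a block that follows its training data.

```python
def kfold_splits(n_samples, n_splits):
    """Unshuffled K-fold: each contiguous fold is held out once; training uses all
    remaining samples, including ones that come *after* the test fold in time."""
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def time_series_splits(n_samples, n_splits):
    """Expanding-window time-series CV: the test block always follows the training
    block, so the model is never trained on observations later than its test data."""
    test_size = n_samples // (n_splits + 1)
    splits = []
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * test_size
        train = list(range(train_end))
        test = list(range(train_end, train_end + test_size))
        splits.append((train, test))
    return splits


for train, test in time_series_splits(10, 4):
    print(train, test)  # training window grows; test block moves forward
```

In a tuning loop, each candidate batch/epoch setting would be trained and scored on every (train, test) pair and the setting with the lowest mean validation error kept; that loop is not shown here.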

A Study on Time Series Cross-Validation Techniques for Enhancing the Accuracy of Reservoir Water Level Prediction Using Automated Machine Learning TPOT (자동기계학습 TPOT 기반 저수위 예측 정확도 향상을 위한 시계열 교차검증 기법 연구)

  • Bae, Joo-Hyun; Park, Woon-Ji; Lee, Seoro; Park, Tae-Seon; Park, Sang-Bin; Kim, Jonggun; Lim, Kyoung-Jae
    • Journal of The Korean Society of Agricultural Engineers / v.66 no.1 / pp.1-13 / 2024
  • This study assessed how effectively the accuracy of reservoir water level prediction models can be improved by employing automated machine learning and efficient cross-validation methods for time-series data. Considering the inherent complexity and non-linearity of reservoir water level time series, we proposed an optimized approach to model selection and training. Twelve models were evaluated for the Obong Reservoir in Gangneung, Gangwon Province, using TPOT (Tree-based Pipeline Optimization Tool) with four cross-validation methods, and the optimal pipeline model was determined. The pipeline model consisting of Extra Tree, Stacking Ridge Regression, and Simple Ridge Regression showed outstanding predictive performance on both training and test data, with R2 (coefficient of determination) and NSE (Nash-Sutcliffe Efficiency) exceeding 0.93. For predictions of water levels 12 hours ahead, the pipeline model selected through time-series split cross-validation accurately captured the pattern of change in the water level series during the test period, with an NSE exceeding 0.99. The proposed methodology is expected to contribute greatly to efficient reservoir water level prediction in regions with high rainfall variability.
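The abstract reports model quality as NSE. A minimal sketch of the standard Nash-Sutcliffe Efficiency, NSE = 1 - Σ(o - s)² / Σ(o - ō)², follows; the function name is illustrative, and how the authors aggregated NSE across cross-validation folds is not stated in the abstract.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - (sum of squared errors) / (sum of squared
    deviations of the observations from their mean).
    1.0 is a perfect fit; 0.0 means no better than always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


print(nse([1, 2, 3, 4], [1, 2, 3, 5]))  # one error of 1 against a total variation of 5
```

Because the denominator is the spread of the observations, an NSE above 0.99 means the squared prediction errors are under 1% of that spread.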

Solar radiation forecasting using boosting decision tree and recurrent neural networks

  • Hyojeoung, Kim; Sujin, Park; Sahm, Kim
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.709-719 / 2022
  • As the importance of environmental protection has grown, interest in new and renewable energy has been increasing worldwide. In Korea in particular, solar energy accounts for the highest production share among new and renewable energy sources, owing to its unlimited resource, easy installation and maintenance, and eco-friendly characteristics such as low noise and few pollutants during power generation. Because solar power is affected by weather and climate change, climate prediction is essential; however, solar radiation, which is closely related to solar power, is not currently forecast by the Korea Meteorological Administration. Solar radiation prediction can serve as the basis for a reasonable new and renewable energy operation plan, and it is important because it can be used not only for solar power but also in other fields such as power consumption prediction. This study therefore aimed to improve the accuracy of solar radiation prediction. Solar radiation was predicted from three weather variables (temperature, humidity, and cloudiness) together with extraterrestrial solar radiation, and the results of various models were compared. Among the boosting models (XGB, CatBoost) and RNN models (Simple RNN, LSTM, GRU) that were fitted and compared, CatBoost performed best. The results were further improved through time series cross-validation.

Power Consumption Forecasting Scheme for Educational Institutions Based on Analysis of Similar Time Series Data (유사 시계열 데이터 분석에 기반을 둔 교육기관의 전력 사용량 예측 기법)

  • Moon, Jihoon; Park, Jinwoong; Han, Sanghoon; Hwang, Eenjun
    • Journal of KIISE / v.44 no.9 / pp.954-965 / 2017
  • A stable power supply is very important for the maintenance and operation of power infrastructure, so accurate power consumption prediction is needed. A university campus in particular is one of the institutions with the highest power consumption, and its electrical load tends to vary widely with time and environment. A model that can accurately predict power consumption is therefore required for effective operation of the power system. A drawback of existing time series prediction techniques is that prediction performance degrades greatly, because the prediction interval widens as the gap between the training period and the prediction time grows. In this paper, we first classify power data with similar time series patterns by considering the date, day of the week, holidays, and semester. Next, an ARIMA model is constructed for each classified data set, and a daily power consumption forecasting method for the university campus is proposed that applies time series cross-validation at each prediction time. To evaluate prediction accuracy, we applied performance indicators and confirmed the validity of the proposed method.
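The abstract applies time series cross-validation to per-group ARIMA forecasts. One common form of time series cross-validation is rolling-origin evaluation, sketched below under stated assumptions: the pattern-based grouping step is omitted, and a hypothetical seasonal naive forecaster (`seasonal_naive`, which repeats the value from one week earlier) stands in for a fitted ARIMA model. MAPE is used only as an example metric; the abstract does not name its performance indicators.

```python
def rolling_origin_errors(series, forecast_fn, min_train):
    """Time series CV by rolling forecast origin: for each origin t, the forecaster
    sees only series[:t] and its one-step-ahead forecast is compared with series[t]."""
    errors = []
    for t in range(min_train, len(series)):
        history = series[:t]           # no information after the origin is used
        prediction = forecast_fn(history)
        errors.append(series[t] - prediction)
    return errors


def seasonal_naive(history, season=7):
    """Hypothetical stand-in for ARIMA: repeat the value from one season (a week) ago."""
    return history[-season]


def mape(actual, errors):
    """Mean absolute percentage error, given actual values and forecast errors."""
    return 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / len(errors)


daily = [10, 12, 11, 13, 9, 5, 4, 11, 12, 12, 13, 10, 5, 4]  # toy daily loads
errs = rolling_origin_errors(daily, seasonal_naive, min_train=7)
print(mape(daily[7:], errs))
```

With a real ARIMA model, `forecast_fn` would refit on `history` at every origin, which is what makes this procedure more expensive than a single train/test split.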

The Prediction and Analysis of the Power Energy Time Series by Using the Elman Recurrent Neural Network (엘만 순환 신경망을 사용한 전력 에너지 시계열의 예측 및 분석)

  • Lee, Chang-Yong; Kim, Jinho
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.1 / pp.84-93 / 2018
  • In this paper, we propose an Elman recurrent neural network to predict and analyze a time series of power energy consumption. To this end, we consider the volatility of the time series and apply sample variance and detrended fluctuation analyses to the volatilities. We demonstrate that the volatility time series is correlated, which suggests that the power consumption time series contains a non-negligible amount of nonlinear correlation. Based on this finding, we adopt the Elman recurrent neural network as the model for predicting power consumption. As the simplest form of recurrent network, the Elman network is designed to learn sequential or time-varying patterns and can predict learned series of values. It has a layer of "context units" in addition to a standard feedforward network. By adjusting two parameters in the model and performing cross-validation, we demonstrate that the proposed model predicts power consumption with relative errors of 2% to 5% and average errors of 3 kWh to 8 kWh. To further confirm the experimental results, we performed two types of cross-validation designed for time series data. We also support the validity of the model by analyzing multi-step forecasting: although the prediction errors increase with the prediction time step, they tend to saturate. The results of this study can be applied to energy management systems for effectively controlling the cross usage of electric and gas energy.
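The Elman architecture described above (a feedforward network plus a layer of context units that hold a copy of the previous hidden state) can be sketched as a NumPy forward pass. This is an illustrative sketch under assumptions: the layer sizes, tanh activation, weight initialization, and the class name `ElmanNetwork` are hypothetical, and neither training (for example, backpropagation through time) nor the two parameters the authors tuned are shown.

```python
import numpy as np


class ElmanNetwork:
    """Minimal Elman (simple recurrent) network, forward pass only.
    The context units store the previous hidden activations and feed them back
    into the hidden layer together with the current input."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(n_hidden, n_inputs))   # input -> hidden
        self.W_ctx = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(scale=0.1, size=(n_outputs, n_hidden))  # hidden -> output
        self.b_out = np.zeros(n_outputs)

    def forward(self, inputs):
        """inputs: array of shape (time_steps, n_inputs); returns (time_steps, n_outputs)."""
        context = np.zeros(self.b_h.shape)  # context units start empty
        outputs = []
        for x_t in inputs:
            hidden = np.tanh(self.W_in @ x_t + self.W_ctx @ context + self.b_h)
            outputs.append(self.W_out @ hidden + self.b_out)
            context = hidden  # copy hidden activations into the context layer
        return np.array(outputs)


net = ElmanNetwork(n_inputs=1, n_hidden=8, n_outputs=1)
hourly_load = np.array([[0.4], [0.5], [0.7], [0.6]])  # toy, already-scaled inputs
print(net.forward(hourly_load).shape)  # (4, 1): one prediction per time step
```

The context copy is what lets the output at step t depend on earlier inputs, even though each step is an ordinary feedforward computation.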

Spatial-Temporal Modelling of Road Traffic Data in Seoul City

  • Lee, Sang-Yeol; Ahn, Soo-Han; Park, Chang-Yi; Jeon, Jong-Woo
    • Journal of the Korean Data and Information Science Society / v.13 no.2 / pp.261-270 / 2002
  • Demand for Intelligent Transportation Systems (ITS) has recently increased considerably, and real-time, internet-based traffic information services have become very important. When ITS companies provide real-time traffic services, they encounter missing traffic data and conventionally reconstruct the missing values from the average time trend. Because this method proved unsatisfactory, we develop a new method based on spatial and spatial-temporal models. Cross-validation shows that the spatial-temporal model outperforms the others.

Time Series Classification of Cryptocurrency Price Trend Based on a Recurrent LSTM Neural Network

  • Kwon, Do-Hyung; Kim, Ju-Bong; Heo, Ju-Sung; Kim, Chan-Myung; Han, Youn-Hee
    • Journal of Information Processing Systems / v.15 no.3 / pp.694-706 / 2019
  • In this study, we applied the long short-term memory (LSTM) model to classify cryptocurrency price time series. We collected historical cryptocurrency price time series data and preprocessed them for use as training and target data. After preprocessing, the price time series data were systematically encoded into a three-dimensional price tensor representing the past price changes of the cryptocurrencies. We also present our LSTM model structure and show how the price tensor is used as input data for the LSTM model. In particular, a grid search-based k-fold cross-validation technique was applied to find the most suitable LSTM model parameters. Finally, a comparison of f1-scores showed that, for time series classification of cryptocurrency price trends, the LSTM model outperforms the gradient boosting (GB) model, a general machine learning model known for relatively good prediction performance. The LSTM model achieved a performance improvement of about 7% over the GB model.
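The abstract says the price series are encoded into a three-dimensional tensor of past price changes before being fed to the LSTM. A plausible sketch follows. It assumes relative price changes as features, a fixed look-back window, and a hypothetical up/down label for the first coin, and it builds an array shaped (samples, time steps, coins), the layout LSTM layers typically expect. The paper's exact encoding and labels may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def encode_price_tensor(prices, window):
    """prices: (time, n_coins) matrix. Returns X with shape (samples, window, n_coins)
    holding relative price changes, and a hypothetical label y (1 if coin 0 rises
    in the step right after each window, else 0)."""
    prices = np.asarray(prices, dtype=float)
    changes = prices[1:] / prices[:-1] - 1.0  # relative change per time step
    # sliding_window_view appends the window axis last: (samples, n_coins, window)
    windows = sliding_window_view(changes, window, axis=0).transpose(0, 2, 1)
    X = windows[:-1]                           # drop the last window: it has no next step
    y = (changes[window:, 0] > 0).astype(int)  # y[i] looks at the change after window i
    return X, y


prices = [[100, 10], [102, 10.5], [101, 10.2], [105, 10.4], [104, 10.9], [108, 11.0]]
X, y = encode_price_tensor(prices, window=3)
print(X.shape, y)  # (2, 3, 2) and one label per window
```

Keeping the window axis second matches the (batch, timesteps, features) convention, so each sample is one look-back window of changes for all coins.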

Development of Highway Traffic Information Prediction Models Using the Stacking Ensemble Technique Based on Cross-validation (스태킹 앙상블 기법을 활용한 고속도로 교통정보 예측모델 개발 및 교차검증에 따른 성능 비교)

  • Yoseph Lee; Seok Jin Oh; Yejin Kim; Sung-ho Park; Ilsoo Yun
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.22 no.6 / pp.1-16 / 2023
  • Accurate traffic information prediction is considered one of the most important aspects of intelligent transport systems (ITS), as it can be used to guide users of transportation facilities away from congested routes. Various deep learning models have been developed for accurate traffic prediction, and ensemble techniques have recently been used to combine models in ways that exploit their strengths and compensate for their weaknesses, improving prediction accuracy and stability. In this study, we therefore developed traffic information prediction models using various deep learning models and evaluated their performance when combined as a stacking ensemble. The individual models showed error rates within 10% for traffic volume prediction and within 3% for speed prediction. Without cross-validation, the ensemble model was more accurate than the other models; with cross-validation, it showed a uniform error rate in long-term forecasting.
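Stacking trains a meta-learner on the predictions of several base models. The abstract does not specify the meta-learner, so the sketch below assumes a linear one fitted by least squares. Its inputs should be base-model predictions on data those models were not trained on (for example, out-of-fold predictions); otherwise the meta-learner tends to reward whichever base model overfits most. The function names are illustrative.

```python
import numpy as np


def fit_stacking_weights(base_predictions, targets):
    """Fit a linear meta-learner on held-out base-model predictions.
    base_predictions: (n_samples, n_models). Returns [intercept, w_1, ..., w_m]."""
    design = np.column_stack([np.ones(len(targets)), base_predictions])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def stacked_predict(coef, base_predictions):
    """Combine base-model predictions with the fitted meta-learner weights."""
    return coef[0] + base_predictions @ coef[1:]


# Toy held-out predictions from two hypothetical base models and the observed speeds.
preds = np.array([[61.0, 58.0], [55.0, 57.0], [70.0, 66.0], [48.0, 52.0]])
observed = np.array([59.0, 56.5, 67.0, 51.0])
coef = fit_stacking_weights(preds, observed)
print(stacked_predict(coef, preds))
```

A linear meta-learner is the simplest choice; a deep learning ensemble like the one in the abstract could use a neural meta-learner instead, with the same out-of-fold discipline.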

Airline In-flight Meal Demand Forecasting with Neural Networks and Time Series Models

  • Lee, Young-Chan
    • Proceedings of the Korea Association of Information Systems Conference / 2000.11a / pp.36-44 / 2000
  • The purpose of this study is to introduce a more efficient forecasting technique that could help reduce costs by eliminating wasted airline in-flight meals. We use a neural network approach known to many researchers as the “Outstanding Forecasting Technique”, namely a multi-layer perceptron trained with the backpropagation algorithm. We also suggest using other related information to improve the forecasting performance of neural networks. We divided the data into three sets: a training set, a cross-validation set, and a test set. In line with the general view of time series forecasting, time lag variables are still employed in our model. We measured the accuracy of our model by mean square error (MSE). The suggested model performed best for economy-class in-flight meals. Forecasting the exact number of meals needed for each airline could reduce meal waste and therefore costs. Better yet, it could enhance each airline's cost competitiveness, help keep schedules on time, and lead to better service.
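The three-way division into training, cross-validation, and test sets, combined with time lag inputs and MSE scoring, can be sketched in plain Python. The 60/20/20 proportions, the number of lags, the chronological ordering of the split, and the function names are all assumptions for illustration; the abstract gives none of them.

```python
def chronological_split(data, train_pct=60, val_pct=20):
    """Split a time-ordered sequence into training, cross-validation and test sets
    without shuffling, so each later set lies entirely after the earlier ones."""
    n_train = len(data) * train_pct // 100
    n_val = len(data) * val_pct // 100
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])


def make_lagged(series, n_lags):
    """Build (lag features, target) pairs: predict series[t] from the n_lags previous values."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]


def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


meals = [120, 132, 128, 140, 135, 150, 147, 155, 160, 158]  # toy daily meal counts
train, val, test = chronological_split(meals)
pairs = make_lagged(train, n_lags=2)
print(pairs[0])  # ([120, 132], 128)
```

The cross-validation set is used to decide when to stop training or which network configuration to keep, and the test set is scored only once at the end.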

Pan Evaporation Analysis using Nonlinear Disaggregation Model (비선형 분리모형에 의한 증발접시 증발량의 해석)

  • Kim, Seong-Won; Kim, Jeong-Heon; Park, Gi-Beom
    • Proceedings of the Korea Water Resources Association Conference / 2008.05a / pp.1147-1150 / 2008
  • The goal of this research is to apply neural network models to the disaggregation of pan evaporation (PE) data in the Republic of Korea. The neural network models are the support vector machine neural network model (SVM-NNM) and the multilayer perceptron neural network model (MLP-NNM). The use of SVM-NNM in time series modeling is relatively new and more problematic than its use in classification. In this study, disaggregation means dividing yearly PE data into monthly PE data. To evaluate the performance of the neural network models, the data are divided into training, cross-validation, and testing sets. From this research, we evaluate the impact of the SVM-NNM and MLP-NNM on the disaggregation of nonlinear time series data. Furthermore, we can construct credible monthly PE data by disaggregating yearly PE data and suggest a methodology for irrigation and drainage network systems.
