• Title/Summary/Keyword: Time Series Data Prediction

Oil Price Forecasting Based on Machine Learning Techniques (기계학습기법에 기반한 국제 유가 예측 모델)

  • Park, Kang-Hee;Hou, Tianya;Shin, Hyun-Jung
    • Journal of Korean Institute of Industrial Engineers / v.37 no.1 / pp.64-73 / 2011
  • Oil price prediction is an important issue for government regulators and related industries. Applying time series techniques to it is difficult, however, because the behavior of oil price series is dominated by irregular external factors that resist quantitative explanation, e.g., supply- or demand-side shocks, political conflicts tied to events in the Middle East, and direct or indirect influences from other global economic indices. Identifying and quantifying the relationships between oil prices and these external factors may therefore yield more relevant predictions than attempting to uncover the underlying structure of the series itself. Technically, this means that prediction should be based on vector data describing the degrees of these relationships rather than on the series data. This paper proposes a novel time series prediction method using Semi-Supervised Learning, which was originally designed only for vector data. First, several time series of oil prices and other economic indices are transformed into multidimensional vectors by various types of technical indicators and diverse combinations of indicator-specific hyperparameters. Then, to avoid the curse of dimensionality and redundancy among dimensions, the well-known feature extraction techniques PCA and NLPCA are employed. From the extracted features, a timepoint-specific similarity matrix of oil prices and other economic indices is built, and finally Semi-Supervised Learning generates a one-timepoint-ahead prediction. The proposed method was verified on the crude oil price series of West Texas Intermediate (WTI), and the experiments showed promising results, with an average AUC of 0.86.
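
The final step above, semi-supervised prediction from a timepoint-specific similarity matrix, can be illustrated with a minimal graph-based label propagation sketch in Python. It assumes feature vectors have already been extracted (for example, by PCA), builds the similarity matrix with a Gaussian kernel whose bandwidth `sigma` is a hypothetical choice, and encodes labeled timepoints as +1 (price rose) or -1 (price fell). This is a generic harmonic-function style propagation, not necessarily the authors' exact formulation, and the feature values are made up.

```python
import math
from typing import Dict, List, Sequence


def gaussian_similarity(points: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Dense similarity matrix W[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), with W[i][i] = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_labels(w: List[List[float]], labels: Dict[int, float], iters: int = 200) -> List[float]:
    """Harmonic-function style label propagation.

    Labeled timepoints stay clamped to their label (+1 or -1); every unlabeled
    timepoint is repeatedly set to the similarity-weighted mean of all scores.
    """
    n = len(w)
    f = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        new_f = f[:]
        for i in range(n):
            if i in labels:
                continue  # keep known labels fixed
            total = sum(w[i])
            if total > 0:
                new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        f = new_f
    return f


# Hypothetical 2-D features (e.g. two principal components) for five timepoints;
# timepoint 4 is the one whose next move we want to predict.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.05, 0.1]]
known = {0: 1.0, 1: 1.0, 2: -1.0, 3: -1.0}  # +1 = price rose, -1 = price fell
scores = propagate_labels(gaussian_similarity(features), known)
print("up" if scores[4] > 0 else "down")  # prints: up
```

The sign of the propagated score gives the predicted direction, and its magnitude can be thresholded at different levels to trace an ROC curve, which is how an AUC-style evaluation would use it.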

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.1-32 / 2018
  • Corporate defaults affect not only the stakeholders of bankrupt companies, including managers, employees, creditors, and investors, but also the local and national economy through ripple effects. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a default prediction model rather than developing a variety of corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even afterwards, analyses of past corporate defaults focused on specific variables, and when the government carried out restructuring immediately after the global financial crisis, it focused only on certain key variables such as the 'debt ratio'. A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid a sudden total collapse like the 'Lehman Brothers case' of the global financial crisis. The key variables associated with corporate defaults vary over time: building on the analyses of Beaver (1967, 1968) and Altman (1968), Deakin (1972) showed that the major factors affecting corporate failure have changed, and Grice (2001) likewise found shifts in the importance of predictive variables using the models of Zmijewski (1984) and Ohlson (1980). However, past studies have used static models, most of which do not consider changes over time. To construct consistent prediction models, it is therefore necessary to compensate for time-dependent bias with a time series analysis algorithm that reflects dynamic change. Because the global financial crisis had a significant impact on Korea, this study uses 10 years of annual corporate data spanning that crisis, from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively.
In order to construct a bankruptcy model that stays consistent as time changes, we first train a deep learning time series model on data from before the financial crisis (2000~2006). The parameters of the existing models and of the deep learning time series algorithm are then tuned on validation data that include the financial crisis period (2007~2008). The resulting model shows a pattern similar to its results on the training data and excellent predictive power. Each bankruptcy prediction model is then rebuilt on the combined training and validation data (2000~2008), applying the optimal parameters found during validation. Finally, the models trained on these nine years are evaluated and compared on the test data (2009), demonstrating the usefulness of the corporate default prediction model based on the deep learning time series algorithm. In addition, by adding Lasso regression to the existing variable-selection methods (multiple discriminant analysis and the logit model), we show that the deep learning time series model built on the three resulting bundles of variables is useful for robust corporate default prediction. The definition of bankruptcy is the same as that of Lee (2015). Independent variables include financial information, such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups. The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms are then compared. Corporate data suffer from three limitations: 'nonlinear variables', 'multicollinearity' among variables, and 'lack of data'.
The logit model handles nonlinearity, the Lasso regression model solves the multicollinearity problem, and the deep learning time series algorithm, with its variable data generation method, compensates for the lack of data. Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis and, finally, toward intertwined AI applications. Although the study of corporate default prediction with time series algorithms is still in its early stages, the deep learning algorithm builds corporate default prediction models much faster than regression analysis and achieves greater predictive power. Amid the Fourth Industrial Revolution, the current government and governments overseas are working hard to integrate such systems into the everyday life of their nations and societies, yet deep learning time series research for the financial industry remains insufficient. This is an initial study of deep learning time series analysis of corporate defaults, and it is hoped that it will serve as comparative reference material for non-specialists beginning research that combines financial data with deep learning time series algorithms.
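
The year-based protocol described above (train on 2000~2006, tune on 2007~2008, refit on 2000~2008 with the tuned parameters, test on 2009) can be sketched as a chronological split. The record layout and the helper names `chronological_split` and `refit_data` are hypothetical, and model training itself is omitted; the point is only that every split follows calendar order, so no later year leaks into training.

```python
from typing import Dict, List, Tuple

# Hypothetical firm-year record: (year, firm_id, defaulted). Real records would
# also carry the selected financial ratios.
Record = Tuple[int, str, int]


def chronological_split(records: List[Record], train_years: range,
                        valid_years: range, test_years: range) -> Dict[str, List[Record]]:
    """Assign each firm-year to a split by calendar year, so later years never leak into training."""
    split: Dict[str, List[Record]] = {"train": [], "valid": [], "test": []}
    for rec in records:
        year = rec[0]
        if year in train_years:
            split["train"].append(rec)
        elif year in valid_years:
            split["valid"].append(rec)
        elif year in test_years:
            split["test"].append(rec)
    return split


def refit_data(split: Dict[str, List[Record]]) -> List[Record]:
    """After tuning, merge training and validation years to refit with the chosen parameters."""
    return split["train"] + split["valid"]


# One dummy record per year, 2000~2009.
records = [(year, f"firm-{year}", 0) for year in range(2000, 2010)]
split = chronological_split(records, range(2000, 2007), range(2007, 2009), range(2009, 2010))
print({name: len(rows) for name, rows in split.items()})  # {'train': 7, 'valid': 2, 'test': 1}
print(len(refit_data(split)))  # 9
```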

Machine Learning Based Architecture and Urban Data Analysis - Construction of Floating Population Model Using Deep Learning - (머신러닝을 통한 건축 도시 데이터 분석의 기초적 연구 - 딥러닝을 이용한 유동인구 모델 구축 -)

  • Shin, Dong-Youn
    • Journal of KIBIM / v.9 no.1 / pp.22-31 / 2019
  • In this paper, we construct a prototype model for predicting urban data from floating-population time series, using machine learning to analyze complexly structured urban data. A correlation prediction model was built from three of the ten data series (total floating population, male floating population, and Monday floating population), and its predictions were compared with the actual data to evaluate accuracy. The results show that the floating-population model can predict the correlation between the predicted floating population and the current state of commerce. The model is expected to support efficient and objective design in the planning stages of architecture, landscape, and urban projects, such as the design of tree environments and the layout of trails. Dynamic population prediction using multivariate time series data and collected location data is also expected to enable integrated simulation with time series data from various fields.
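
Comparing predicted and actual floating population can be sketched with two common metrics. The abstract does not state which accuracy measures were used, so Pearson correlation and RMSE here are assumptions, and the counts are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error of the predictions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical hourly floating-population counts, not data from the study.
actual = [120, 135, 150, 160, 155]
predicted = [118, 138, 147, 162, 150]
print(round(pearson_r(predicted, actual), 3), round(rmse(predicted, actual), 3))
```

Correlation captures whether the predicted series moves with the actual one, while RMSE measures the size of the errors in the original units; reporting both avoids a model that tracks the shape but is systematically offset.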

Model-Free Interval Prediction in a Class of Time Series with Varying Coefficients

  • Park, Sang-Woo;Cho, Sin-Sup;Lee, Sang-Yeol;Hwang, Sun-Y.
    • Journal of the Korean Data and Information Science Society / v.11 no.2 / pp.173-179 / 2000
  • Interval prediction based on the empirical distribution function is discussed for a class of time series with time-varying coefficients. To this end, the strong mixing property of the model is established and results due to Fotopoulos et al. (1994) are employed. A simulation study is presented to assess the accuracy of the proposed interval predictor.
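
The general idea of a model-free interval, taking quantiles of the empirical distribution of past prediction errors instead of assuming a parametric error law, can be sketched as follows. This is a generic residual-quantile interval under the assumption that residuals from some already fitted model are available; it does not reproduce the paper's varying-coefficient model or the strong mixing conditions that justify it.

```python
import math
from typing import Sequence, Tuple


def empirical_quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Generalized inverse of the empirical distribution function: smallest v with F_n(v) >= p."""
    n = len(sorted_vals)
    rank = min(n, max(1, math.ceil(p * n)))  # 1-based order statistic
    return sorted_vals[rank - 1]


def interval_forecast(point_forecast: float, residuals: Sequence[float],
                      alpha: float = 0.1) -> Tuple[float, float]:
    """(1 - alpha) prediction interval: point forecast shifted by empirical residual quantiles."""
    r = sorted(residuals)
    return (point_forecast + empirical_quantile(r, alpha / 2),
            point_forecast + empirical_quantile(r, 1 - alpha / 2))


# Hypothetical one-step-ahead residuals from an already fitted model.
residuals = [i / 10 for i in range(-10, 11)]  # -1.0, -0.9, ..., 1.0
low, high = interval_forecast(5.0, residuals, alpha=0.1)
print(round(low, 2), round(high, 2))  # prints: 4.1 5.9
```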

Two-dimensional attention-based multi-input LSTM for time series prediction

  • Kim, Eun Been;Park, Jung Hoon;Lee, Yung-Seop;Lim, Changwon
    • Communications for Statistical Applications and Methods / v.28 no.1 / pp.39-57 / 2021
  • Time series prediction is an area of great interest to many people. Algorithms for time series prediction are widely used in many fields, such as stock price, temperature, energy, and weather forecasting, and both classical models and recurrent neural networks (RNNs) have been actively developed. Since the attention mechanism was introduced into neural network models, many new models with improved performance have been developed, and models that apply attention twice have recently been proposed, further improving performance. In this paper, we consider time series prediction by introducing attention twice into an RNN model. The proposed model introduces H-attention and T-attention, over output values and time step information respectively, to select useful information. We conduct experiments on stock price, temperature, and energy data and confirm that the proposed model outperforms existing models.
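
Applying attention along two axes can be sketched with plain dot-product attention: once over time steps and once over feature dimensions. This is an illustrative assumption, not the paper's H-attention/T-attention equations, which the abstract does not give; the fixed `query` and `feature_scores` vectors stand in for parameters a real model would learn.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def time_attention(hidden: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Attend over time steps: weight each hidden vector h_t by softmax(h_t . query) and sum."""
    scores = [sum(h * q for h, q in zip(h_t, query)) for h_t in hidden]
    alphas = softmax(scores)
    dim = len(hidden[0])
    return [sum(a * h_t[j] for a, h_t in zip(alphas, hidden)) for j in range(dim)]


def feature_attention(context: Sequence[float], feature_scores: Sequence[float]) -> List[float]:
    """Attend over feature dimensions: rescale each component by softmax over per-feature scores."""
    betas = softmax(feature_scores)
    return [b * c for b, c in zip(betas, context)]


# Hypothetical RNN hidden states: 3 time steps x 2 features.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = time_attention(hidden, query=[10.0, 0.0])  # favours steps whose first feature is large
weighted = feature_attention(context, feature_scores=[0.0, 0.0])  # equal feature weights
print([round(c, 3) for c in context], [round(w, 3) for w in weighted])
```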

A model of predicting performance of Olympic female weightlifters using time series analysis

  • Won, Jin-hee;Cho, In-ho
    • International Journal of Advanced Culture Technology / v.8 no.3 / pp.216-222 / 2020
  • The purpose of this study was to predict the performance of female weightlifters using time series analysis. To this end, time series analysis was used to derive a performance prediction model for the women's 58 kg class among Korean female weightlifters who competed in the Olympics. After building time series data from 10 years of records and examining the sequence charts of each athlete group, the female athletes' records showed no seasonality and no difference. In addition, after checking the independence of the data by building a time series model, the resulting models were found to satisfy the fit criteria, and the data showed no difference but did show a trend. Accordingly, Holt's linear trend method of exponential smoothing was applied. The prediction models derived through this process indicate that the performance of the women's 58 kg Olympians will continue to improve, within a range of 166.11 kg to 184.1 kg.
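
Holt's linear trend method maintains a smoothed level and a smoothed trend and extrapolates the trend linearly. The sketch below implements the standard recursions; the smoothing constants `alpha` and `beta`, the initialization, and the input series are assumptions, not the study's fitted values or the athletes' records.

```python
from typing import List, Sequence


def holt_forecast(y: Sequence[float], alpha: float, beta: float, horizon: int) -> List[float]:
    """Holt's linear trend method (double exponential smoothing).

    level_t    = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t    = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast_h = level_T + h * trend_T
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]  # simple initialization (an assumption)
    for value in y[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical yearly totals in kg, not the athletes' actual records.
totals = [170.0, 172.0, 171.0, 175.0, 177.0]
print([round(f, 2) for f in holt_forecast(totals, alpha=0.5, beta=0.3, horizon=3)])
```

Because the method has a trend term but no seasonal term, it matches the situation described above, where the records showed a trend but no seasonality.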

Design of HCBKA-Based TSK Fuzzy Prediction System with Error Compensation (HCBKA 기반 오차 보정형 TSK 퍼지 예측시스템 설계)

  • Bang, Young-Keun;Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.6 / pp.1159-1166 / 2010
  • To improve the prediction quality of a nonlinear prediction system, the system must be able to handle the uncertainty of nonlinear data adequately. This paper presents a TSK fuzzy prediction system that can sufficiently account for and deal with the uncertainty of nonlinear data. In designing the proposed system, HCBKA (Hierarchical Correlationship-Based K-means clustering Algorithm) was used to generate an accurate fuzzy rule base that efficiently controls the output according to the input, and the first-order difference method was applied to reflect various characteristics of the nonlinear data. Multiple prediction systems were also designed to analyze the prediction tendencies of each difference series generated by the difference method. In addition, to enhance the prediction quality of the proposed system, an error compensation method was proposed that suitably compensates for the systems' prediction errors. Finally, the prediction performance of the proposed system was verified by simulating two typical time series examples.
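
Two ingredients named above, the first-order difference series and first-order TSK inference, can be sketched in Python. The rule base here is hand-written and hypothetical; HCBKA clustering, which would generate the rules, and the error compensation step are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def first_difference(series: Sequence[float]) -> List[float]:
    """d_t = x_t - x_{t-1}; the difference series is what each predictor models."""
    return [b - a for a, b in zip(series, series[1:])]


def gaussian_mf(x: float, center: float, width: float) -> float:
    """Gaussian membership grade of x in a fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


# A hypothetical first-order TSK rule: (centers, widths, coeffs) meaning
# IF x_1 is A_1 AND ... AND x_n is A_n THEN y = c_0 + c_1 * x_1 + ... + c_n * x_n.
Rule = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def tsk_predict(x: Sequence[float], rules: Sequence[Rule]) -> float:
    """Firing-strength-weighted average of the rules' linear consequents."""
    num = den = 0.0
    for centers, widths, coeffs in rules:
        firing = 1.0
        for xi, c, w in zip(x, centers, widths):
            firing *= gaussian_mf(xi, c, w)  # product t-norm for AND
        consequent = coeffs[0] + sum(ci * xi for ci, xi in zip(coeffs[1:], x))
        num += firing * consequent
        den += firing
    return num / den if den > 0 else 0.0


series = [1.0, 2.0, 4.0, 7.0]
diffs = first_difference(series)  # [1.0, 2.0, 3.0]
rules = [([2.0, 3.0], [1.0, 1.0], [1.0, 0.0, 1.0])]  # hand-written: next diff = 1 + last diff
next_diff = tsk_predict(diffs[-2:], rules)
print(next_diff, series[-1] + next_diff)  # prints: 4.0 11.0
```

Predicting the difference and adding it back to the last observed value is what lets a difference-based predictor produce a forecast on the original scale.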

Power Consumption Forecasting Scheme for Educational Institutions Based on Analysis of Similar Time Series Data (유사 시계열 데이터 분석에 기반을 둔 교육기관의 전력 사용량 예측 기법)

  • Moon, Jihoon;Park, Jinwoong;Han, Sanghoon;Hwang, Eenjun
    • Journal of KIISE / v.44 no.9 / pp.954-965 / 2017
  • A stable power supply is essential for maintaining and operating the power infrastructure, so accurate power consumption prediction is needed. A university campus in particular is one of the largest consumers of power, and its electrical load tends to vary widely with time and environment. A model that can accurately predict power consumption is therefore required for the effective operation of the power system. A drawback of existing time series prediction techniques is that prediction performance degrades greatly, because the prediction interval widens as the gap between the training period and the prediction time grows. In this paper, we first classify power data into groups with similar time series patterns, considering the date, day of the week, holidays, and semester. Next, an ARIMA model is constructed for each classified data set, and we propose a daily power consumption forecasting method for the university campus based on time series cross-validation of the predicted time. We evaluated prediction accuracy with performance indicators, confirming the validity of the proposed method.
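
The grouping-then-cross-validation procedure can be sketched with a hypothetical grouping key built from the factors listed above (weekend, holiday, semester) and a rolling-origin split in which each test point always follows its training window. ARIMA fitting is left out, since it would need a statistics library; in practice each group's series would be fed to such a model inside the loop.

```python
from datetime import date
from typing import Dict, Iterator, List, Set, Tuple

Key = Tuple[bool, bool, bool]


def pattern_key(day: date, holidays: Set[date], semester_days: Set[date]) -> Key:
    """Hypothetical grouping key: (weekend?, holiday?, in semester?)."""
    return (day.weekday() >= 5, day in holidays, day in semester_days)


def group_days(days: List[date], holidays: Set[date], semester_days: Set[date]) -> Dict[Key, List[date]]:
    """Collect days with the same key so each group can get its own model."""
    groups: Dict[Key, List[date]] = {}
    for d in days:
        groups.setdefault(pattern_key(d, holidays, semester_days), []).append(d)
    return groups


def rolling_origin_splits(n: int, initial: int, horizon: int = 1,
                          step: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; the test window always follows the training window."""
    origin = initial
    while origin + horizon <= n:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


for train_idx, test_idx in rolling_origin_splits(n=6, initial=3):
    print(train_idx, "->", test_idx)
```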

EWMA Based Fusion for Time Series Forecasting (시계열 예측을 위한 EWMA 퓨전)

  • Shin, Hyung Won;Sohn, So Young
    • Journal of Korean Institute of Industrial Engineers / v.28 no.2 / pp.171-177 / 2002
  • In this paper, we propose a new data fusion method to improve on the performance of individual prediction models for time series data. The individual models are ARIMA and a neural network, and their results are combined with weights reflecting the inverse of the EWMA of each model's squared prediction error. Monte Carlo simulation is used to identify situations where the proposed approach has an advantage over typical fusion methods that use MSE for weighting. The results indicate the following: EWMA fusion performs better than MSE fusion when the data set is large and has relatively large amplitude, as is often observed in intracranial pressure data. Additionally, EWMA fusion turns out to be the best choice among MSE fusion and the two individual prediction models when the data set is large with relatively small random noise, as often appears in tax revenue data.
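
The EWMA fusion rule can be sketched as follows: each model's squared errors are smoothed with an exponentially weighted moving average, and the forecasts are averaged with weights proportional to the inverse of that average. The smoothing constant `lam`, the initialization with the first squared error, and the error values are assumptions. Unlike MSE weighting, which treats all past errors equally, the EWMA emphasizes recent errors, so the weights adapt when one model's accuracy changes.

```python
from typing import Sequence


def ewma_squared_error(errors: Sequence[float], lam: float) -> float:
    """s_t = lam * e_t^2 + (1 - lam) * s_{t-1}, initialized with the first squared error."""
    s = errors[0] ** 2
    for e in errors[1:]:
        s = lam * e ** 2 + (1 - lam) * s
    return s


def ewma_fusion(forecasts: Sequence[float], past_errors: Sequence[Sequence[float]],
                lam: float = 0.3, eps: float = 1e-12) -> float:
    """Combine model forecasts with weights proportional to 1 / EWMA(squared error)."""
    inverse = [1.0 / max(ewma_squared_error(errs, lam), eps) for errs in past_errors]
    total = sum(inverse)
    return sum(w / total * f for w, f in zip(inverse, forecasts))


# Hypothetical recent errors of two models (e.g. an ARIMA and a neural network).
arima_errors = [1.0, -1.0, 1.0]
nn_errors = [2.0, -2.0, 2.0]
print(ewma_fusion([10.0, 20.0], [arima_errors, nn_errors]))  # close to 12.0
```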

Comparison of Mortality Estimate and Prediction by the Period of Time Series Data Used (시계열 적용기간에 따른 사망력 추정 및 예측결과 비교 - LC모형과 LC 코호트효과 확장모형을 중심으로 -)

  • Jung, Kyunam;Baek, Jeeseon;Kim, Donguk
    • The Korean Journal of Applied Statistics / v.26 no.6 / pp.1019-1032 / 2013
  • Accurate prediction of future mortality is an important issue given recent rapid increases in life expectancy, and accurate estimation and prediction of mortality matter for future welfare policies. Selecting the optimal mortality model is important for estimating and predicting mortality, but the period of time series data used is also an important issue. Korean mortality time series are short, and data before 1982 are incomplete. This paper divides the Korean mortality time series into two sets and compares the parameter estimates of the LC model and the LC model with a cohort effect according to the period of data used. The mortality index and cohort effect index are modeled and predicted, and future life expectancy is evaluated. Finally, some suggestions are proposed for the future prediction of mortality.
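
The basic LC (Lee-Carter) model writes the log central death rate at age x in year t as ln m(x,t) = a_x + b_x k_t, with the usual identifiability constraints sum of b_x = 1 and sum of k_t = 0, and forecasts the mortality index k_t as a random walk with drift. The sketch below fits that basic model; it uses power iteration in place of a full SVD for the rank-1 step, takes log rates as input, and does not include the cohort-effect extension compared in the paper.

```python
import math
from typing import List, Sequence, Tuple


def fit_lee_carter(log_m: Sequence[Sequence[float]], iters: int = 500) -> Tuple[List[float], List[float], List[float]]:
    """Fit log_m[x][t] ~ a_x + b_x * k_t with sum(b) = 1 and sum(k) = 0."""
    n_ages, n_years = len(log_m), len(log_m[0])
    a = [sum(row) / n_years for row in log_m]  # a_x: average log rate per age
    z = [[log_m[x][t] - a[x] for t in range(n_years)] for x in range(n_ages)]
    # Arbitrary start vector; it must not be orthogonal to the leading singular vector.
    v = [float(t + 1) for t in range(n_years)]
    u = [0.0] * n_ages
    for _ in range(iters):  # power iteration for the leading singular pair of z
        u = [sum(z[x][t] * v[t] for t in range(n_years)) for x in range(n_ages)]
        norm_u = math.sqrt(sum(ui * ui for ui in u))
        if norm_u == 0:
            raise ValueError("centred matrix is zero or start vector is degenerate")
        u = [ui / norm_u for ui in u]
        v = [sum(z[x][t] * u[x] for x in range(n_ages)) for t in range(n_years)]
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm_v for vi in v]
    k = [sum(z[x][t] * u[x] for x in range(n_ages)) for t in range(n_years)]
    scale = sum(u)  # rescale so that sum(b) = 1 (this also fixes the sign)
    b = [ui / scale for ui in u]
    k = [kt * scale for kt in k]
    return a, b, k


def forecast_log_rates(a: Sequence[float], b: Sequence[float], k: Sequence[float],
                       horizon: int) -> List[float]:
    """Project k with a random walk with drift and return forecast log rates by age."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    k_future = k[-1] + horizon * drift
    return [ax + bx * k_future for ax, bx in zip(a, b)]


# Synthetic log death rates generated exactly from known parameters (3 ages x 5 years).
a_true, b_true, k_true = [-4.0, -3.0, -2.0], [0.2, 0.3, 0.5], [2.0, 1.0, 0.0, -1.0, -2.0]
log_m = [[ax + bx * kt for kt in k_true] for ax, bx in zip(a_true, b_true)]
a_hat, b_hat, k_hat = fit_lee_carter(log_m)
print([round(val, 3) for val in b_hat], [round(val, 3) for val in k_hat])
print([round(val, 3) for val in forecast_log_rates(a_hat, b_hat, k_hat, horizon=1)])
```

Because the synthetic data are exactly rank one after centering, the fit recovers the generating b_x and k_t; on real data the rank-1 term is only an approximation, and the choice of fitting period directly changes the estimated drift, which is the sensitivity the paper examines.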