• Title/Summary/Keyword: Multivariate time series model

Visual Analytics Approach for Performance Improvement of predicting youth physical growth model (청소년 신체 성장 예측 모델의 성능 향상을 위한 시각적 분석 방법)

  • Yeon, Hanbyul;Pi, Mingyu;Seo, Seongbum;Ha, Seoho;Oh, Byungjun;Jang, Yun
    • Journal of the Korea Computer Graphics Society / v.23 no.4 / pp.21-29 / 2017
  • Previous visual analytics research has focused on reducing the uncertainty of predicted results using a variety of interactive visual data exploration techniques. The main purpose of interactive exploration is to reduce differences in the quality of predicted results that depend on the skill level of the decision maker, by helping the user understand the relationships between variables and choose an appropriate model to predict the unknown variables. However, it is difficult to create a predictive model that forecasts time series data whose overall trend is unknown, such as youth physical growth data. In this paper, we propose a novel predictive analysis technique to forecast physical growth values from small pieces of time series data with uncertain trends. The model estimates the distribution of the data at a particular point in time. We also propose a visual analytics system that minimizes the possible uncertainties in the predictive modeling process.

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.139-153 / 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through successive economic crises, and bankruptcy prediction models have become increasingly important. Corporate bankruptcy has therefore been regarded as one of the major research topics in business management, and many studies are also in progress in industry. Previous studies applied various statistical methodologies, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), to improve bankruptcy prediction accuracy and to resolve the overfitting problem. More recently, researchers have used machine learning methodologies such as the Support Vector Machine (SVM) and the Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and their performance has improved. In general, a company's financial and accounting information changes over time, as does the market situation, so it is difficult to predict bankruptcy using only information from a single point in time. Although traditional research does not take this time effect into account, dynamic models have not been studied much. Ignoring the time effect produces biased results, so a static model may not be suitable for predicting bankruptcy, and a dynamic model may improve bankruptcy prediction. In this paper, we propose using a Recurrent Neural Network (RNN), a deep learning methodology that learns from time series data and is known to perform well. For the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016 to estimate the bankruptcy prediction model and compare forecasting performance.
To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings, and confirmed delistings through KIND, a corporate stock information website. We then selected variables from previous papers. The first set consists of Z-score variables, which have become traditional variables in bankruptcy prediction. The second set is a dynamic variable set. In the end, we selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second. We built a model that reflects dynamic changes in time-series financial data and, by comparing it with existing bankruptcy prediction models, found that it could help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected MDA, GLM (logistic regression), SVM, and ANN models as benchmarks. The experiment showed that the RNN outperformed the comparison models: its accuracy was high for both variable sets, and its Area Under the Curve (AUC) value was also high. In the hit-ratio table, the proportion of poorly performing companies that the RNN predicted to go bankrupt was also higher than for the other models. A limitation of this paper, however, is that overfitting occurs during RNN training.
We expect that this overfitting problem can be addressed by using more training data and selecting appropriate variables. Based on these results, we expect this research to contribute to the development of bankruptcy prediction by proposing a new dynamic model.

On Useful Principal Component Features for EEG Classification (뇌파 분류에 유용한 주성분 특징)

  • Park, Sungcheol;Lee, Hyekyoung;Park, Seungjin
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.178-180 / 2003
  • An EEG-based brain-computer interface (BCI) provides a new communication channel between the human brain and a computer. EEG data is a multivariate time series, so a hidden Markov model (HMM) might be a good choice for classification. However, EEG data is very noisy and contains artifacts, so useful features are expected to improve the performance of the HMM. In this paper we address the usefulness of principal component features with the HMM. We show that some selected principal component features can suppress small noises and artifacts, and hence improve classification performance. An experimental study on the classification of EEG data recorded during imagination of a left, right, up, or down hand movement confirms the validity of our proposed method.

A Kullback-Leibler divergence based comparison of approximate Bayesian estimations of ARMA models

  • Amin, Ayman A
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.471-486 / 2022
  • Autoregressive moving average (ARMA) models involve nonlinearity in the model coefficients because of unobserved lagged errors, which complicates the likelihood function and makes the posterior density analytically intractable. To overcome this problem of posterior analysis, some approximation methods have been proposed in the literature. In this paper we first review the main analytic approximations proposed to make the posterior density of ARMA models analytically tractable, which include the Newbold, Zellner-Reynolds, and Broemeling-Shaarawy approximations. We then use the Kullback-Leibler divergence to study the relation between these three analytic approximations and to measure the distance between their derived approximate posteriors for ARMA models. In addition, we evaluate the impact of the distance between the approximate posteriors on Bayesian estimates of the mean and precision of the model coefficients by generating a large number of Monte Carlo simulations from the approximate posteriors. Simulation results show that the approximate posteriors of Newbold and Zellner-Reynolds are very close to each other, and that their estimates have higher precision than those of the Broemeling-Shaarawy approximation. The same results are obtained from the application to real-world time series datasets.
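For reference, the standard definition of the Kullback-Leibler divergence between two approximate posteriors can be written as below. The symbols $p$, $q$, $\theta$ (model coefficients), and $y$ (observed series) are notation assumed here for illustration; the abstract does not state which direction or symmetrized form the paper uses.

```latex
D_{\mathrm{KL}}(p \,\|\, q)
  = \int p(\theta \mid y)\,
    \log \frac{p(\theta \mid y)}{q(\theta \mid y)} \, d\theta ,
\qquad
D_{\mathrm{KL}}(p \,\|\, q) \ge 0 ,
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = 0 \iff p = q \ \text{almost everywhere}.
```

The divergence is not symmetric in general, so $D_{\mathrm{KL}}(p \,\|\, q)$ and $D_{\mathrm{KL}}(q \,\|\, p)$ can differ when comparing two approximations.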

Inverter-Based Solar Power Prediction Algorithm Using Artificial Neural Network Regression Model (인공 신경망 회귀 모델을 활용한 인버터 기반 태양광 발전량 예측 알고리즘)

  • Gun-Ha Park;Su-Chang Lim;Jong-Chan Kim
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.2 / pp.383-388 / 2024
  • This paper derives predicted power generation values from photovoltaic power generation data measured in Jeollanam-do, South Korea. Multivariate variables such as direct current, alternating current, and environmental data were measured at the inverter to determine the amount of power generation, and pre-processing was performed to ensure the stability and reliability of the measured values. In the correlation analysis, the partial autocorrelation function (PACF) was used to select only the time series data highly correlated with power generation for use in prediction. Deep learning models were used to predict the amount of photovoltaic power generation, and the correlation analysis results for each variable were used to increase prediction accuracy. Training on the refined data was more stable than training on the raw data, and the solar power generation prediction algorithm improved when only the highly correlated variables identified by the correlation analysis were used.

Multivariate Analysis of EEG Signal using Intervention Models (개입모형을 이용한 EEG 신호의 다변량 분석에 관한 연구)

  • Im, Seong-Sik;Kim, Jin-Ho;Kim, Chi-Yong;Hwang, Min-Cheol
    • Journal of the Ergonomics Society of Korea / v.18 no.1 / pp.13-24 / 1999
  • The objective of this study is to discriminate EEG (electroencephalogram) signals associated with emotional changes. Emotion was evoked by a series of auditory stimuli selected from the natural sounds in a sound-effect compact disc collection. Seventeen university students participated and experienced positive or negative emotions through six auditory stimuli, with an intermission between stimuli. Temporal EEG ($T_3$, $T_4$, $T_5$, and $T_6$) was recorded at the same time, and a subjective test was performed on an eleven-point scale after the experiment. The EEG recordings with the maximum and minimum scores among the six stimuli were analyzed for discrimination of emotion. The EEG signals were transformed into feature objects based on scalar intervention model coefficients, with the auditory stimulus treated as the intervention variable. They were classified by discriminant analysis for each channel. The features yielded a best classification accuracy of 91.2% at $T_4$ for auditory stimuli. This study could be extended to establish an algorithm that quantifies and classifies emotions evoked by auditory stimuli using time-series models.

Development of the Autoregressive and Cross-Regressive Model for Groundwater Level Prediction at Muan Coastal Aquifer in Korea (전남 무안 해안 대수층에서의 지하수위 예측을 위한 자기교차회귀모형 구축)

  • Kim, Hyun Jung;Yeo, In Wook
    • Journal of Soil and Groundwater Environment / v.19 no.4 / pp.23-30 / 2014
  • The coastal aquifer in Muan, Jeonnam, has experienced heavy seawater intrusion caused by year-round extraction of substantial amounts of groundwater for agricultural purposes. The groundwater level was observed to drop below sea level due to heavy pumping during the dry season, which could accelerate seawater intrusion. Therefore, the water level needs to be monitored and managed to prevent further seawater intrusion. The purpose of this study is to develop autoregressive-cross-regressive (ARCR) models that can predict the present or future groundwater level using its own previous values and pumping events. The ARCR model using pumping and water level data from the preceding five hours (i.e., a model order of five) predicted the groundwater level better than models of order ten and twenty. This was contrary to the expectation that higher orders increase the coefficient of determination ($R^2$), a measure of the model's goodness of fit. The ARCR model of order five made good predictions of groundwater levels for the 48 hours following the start of pumping, with $R^2$ higher than 0.9.
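The autoregressive-cross-regressive idea described above can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_lagged`, `fit_arcr`, `r_squared`), the synthetic order-two data, and the use of NumPy's `lstsq` are all hypothetical choices. The abstract specifies only that the model regresses the water level on its own previous values and on pumping events for a given order.

```python
import numpy as np

def build_lagged(y, u, order):
    """Build a design matrix of lagged water levels y and pumping inputs u."""
    rows = []
    for t in range(order, len(y)):
        # Most recent lag first: y[t-1..t-order], then u[t-1..t-order].
        rows.append(np.concatenate([y[t - order:t][::-1], u[t - order:t][::-1]]))
    return np.array(rows), y[order:]

def fit_arcr(y, u, order):
    """Estimate ARCR coefficients by ordinary least squares (an assumed estimator)."""
    X, target = build_lagged(y, u, order)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic, noise-free demo data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n = 200
u = rng.integers(0, 2, size=n).astype(float)  # 1 = pumping on, 0 = off
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] - 0.5 * u[t - 1] + 0.1 * u[t - 2]

coef = fit_arcr(y, u, order=2)
X, target = build_lagged(y, u, order=2)
r2 = r_squared(target, X @ coef)
```

Comparing `r2` across several values of `order` on held-out data would be one way to reproduce the kind of order comparison the abstract reports.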

Evaluation of Multivariate Stream Data Reduction Techniques (다변량 스트림 데이터 축소 기법 평가)

  • Jung, Hung-Jo;Seo, Sung-Bo;Cheol, Kyung-Joo;Park, Jeong-Seok;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.889-900 / 2006
  • Even though sensor networks differ in user requests and data characteristics depending on the application area, existing research on the stream data transmission problem focuses on improving the performance of individual methods rather than considering the original characteristics of stream data. In this paper, we introduce a hierarchical or distributed sensor network architecture and data model, and then evaluate multivariate data reduction methods suited to user requirements and data features so that reduction methods can be applied selectively. To assess the relative performance of the multivariate data reduction methods, we used conventional techniques such as Wavelet, HCL (Hierarchical Clustering), Sampling, and SVD (Singular Value Decomposition), together with experimental data sets such as multivariate time series, synthetic data, and robot execution failure data. The experimental results show that SVD and Sampling are superior to Wavelet and HCL with respect to relative error ratio and execution time. In particular, since the relative error ratio of each data reduction method differs according to the data characteristics, selecting the reduction method to suit the experimental data set yields good performance. The findings reported in this paper can serve as a useful guideline for the design and construction of sensor network applications involving multivariate stream data.
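As a rough illustration of SVD-based reduction and a relative error measure, the sketch below truncates a multivariate series to rank k and reports the relative Frobenius-norm error. The function names, the synthetic six-channel data, and the specific error definition are assumptions made for illustration; the abstract does not define its relative error ratio or its experimental setup in this detail.

```python
import numpy as np

def svd_reduce(data, k):
    """Return the rank-k approximation of a (time x variables) matrix."""
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def relative_error(original, approx):
    """Frobenius-norm error relative to the original (an assumed definition)."""
    return np.linalg.norm(original - approx) / np.linalg.norm(original)

# Hypothetical stream: two latent signals mixed into six sensor channels.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
latent = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * 3 * t)])
mixing = rng.normal(size=(2, 6))
stream = latent @ mixing

err_rank1 = relative_error(stream, svd_reduce(stream, 1))
err_rank2 = relative_error(stream, svd_reduce(stream, 2))
```

Because the synthetic stream has only two underlying signals, two singular components reconstruct it up to floating-point error, while one component leaves a visible residual.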

The sparse vector autoregressive model for PM10 in Korea (희박 벡터자기상관회귀 모형을 이용한 한국의 미세먼지 분석)

  • Lee, Wonseok;Baek, Changryong
    • Journal of the Korean Data and Information Science Society / v.25 no.4 / pp.807-817 / 2014
  • This paper considers multivariate time series modelling of PM10 data in Korea collected from 2008 to 2011. We consider both the temporal and spatial dependencies of PM10 by applying the sparse vector autoregressive (sVAR) modelling proposed by Davis et al. (2013). It uses partial spectral coherence to measure cross-correlation between different regions, which in turn provides sparsity in the model while balancing the parsimony of the model against its goodness of fit. It is also shown that sVAR forecasts better than the usual vector autoregressive (VAR) model.

Assessment of Water Quality Characteristics in the Middle and Upper Watershed of the Geumho River Using Multivariate Statistical Analysis and Watershed Environmental Model (다변량통계분석 및 유역환경모델을 이용한 금호강 중·상류 유역의 수질특성평가)

  • Seo, Youngmin;Kwon, Kooho;Choi, Yun Young;Lee, Byung Joon
    • Journal of Korean Society on Water Environment / v.37 no.6 / pp.520-530 / 2021
  • Multivariate statistical analysis and an environmental hydrological model were applied to investigate the causes of water pollution and to provide best management practices for water quality improvement in urban and agricultural watersheds. Principal component analysis (PCA) and cluster analysis (CA) of water quality time series data show that chemical oxygen demand (COD), total organic carbon (TOC), suspended solids (SS), and total phosphorus (T-P) are classified as non-point source pollutants that are highly correlated with river discharge. Total nitrogen (T-N), which has no correlation with river discharge and an inverse relationship with water temperature, behaves like a point source with slow and consistent release. Biochemical oxygen demand (BOD) shows characteristics intermediate between point and non-point source pollutants. The results of the PCA and CA for the spatial water quality data indicate that cluster 1 consists of upstream watersheds with good water quality and a high proportion of forest. Cluster 3, however, contains the most polluted watersheds, with substantial discharge of BOD and nutrients from urban sewage and from agricultural and industrial activities. Cluster 2 shows characteristics intermediate between clusters 1 and 3. The results of the Hydrological Simulation Program-Fortran (HSPF) model simulation indicate that the seasonal patterns of BOD, T-N, and T-P are affected substantially by agricultural and livestock farming activities, untreated wastewater, and environmental flow. Spatial analysis of the model results indicates that the highly populated watersheds are the main contributors to the water quality degradation of the river.