• Title/Summary/Keyword: Time-series data prediction (시계열 데이터 예측)


Deep Learning Based Group Synchronization for Networked Immersive Interactions (네트워크 환경에서의 몰입형 상호작용을 위한 딥러닝 기반 그룹 동기화 기법)

  • Lee, Joong-Jae
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.10
    • /
    • pp.373-380
    • /
    • 2022
  • This paper presents a deep learning based group synchronization method that supports networked immersive interactions between remote users. The goal of group synchronization is to enable all participants to interact with one another synchronously, increasing user presence. Most previous methods focus on NTP-based clock synchronization to enhance time accuracy, and moving average filters are used to control media playout time on the synchronization server. For example, the exponentially weighted moving average (EWMA) can track and estimate the playout time accurately as long as the changes in the input data are not significant. However, it needs more time to stabilize after any given change caused by codec and system loads or by fluctuations in network status. To tackle this problem, this work proposes Deep Group Synchronization (DeepGroupSync), a deep learning based group synchronization model that learns important features from the data. The model consists of two gated recurrent unit (GRU) layers and one fully connected layer, and predicts an optimal playout time from the sequence of past playout delays. Experiments compare an existing method that uses the EWMA with the proposed DeepGroupSync. The results show that the proposed method is more robust against unpredictable or rapid changes in network conditions than the existing method.
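The EWMA playout-time tracking discussed above can be sketched in a few lines; `alpha` and the delay samples are illustrative values, not the paper's settings:

```python
def ewma_playout(delays, alpha=0.25):
    """Track playout time with an exponentially weighted moving average.

    Each new estimate blends the latest observed playout delay with the
    previous estimate; a small alpha smooths jitter but reacts slowly.
    """
    estimate = delays[0]
    history = [estimate]
    for d in delays[1:]:
        estimate = alpha * d + (1 - alpha) * estimate
        history.append(estimate)
    return history

# A sudden network spike: the EWMA needs several samples to converge
# to the new delay level, which is the lag DeepGroupSync targets.
print(ewma_playout([100, 100, 100, 180, 180, 180]))
```

Note how the estimate approaches 180 only gradually after the jump; a sequence model can, in principle, react faster by learning the shape of such transitions.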

Seq2Seq model-based Prognostics and Health Management of Robot Arm (Seq2Seq 모델 기반의 로봇팔 고장예지 기술)

  • Lee, Yeong-Hyeon;Kim, Kyung-Jun;Lee, Seung-Ik;Kim, Dong-Ju
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.3
    • /
    • pp.242-250
    • /
    • 2019
  • In this paper, we propose a method to predict failures of industrial robots using the Seq2Seq (Sequence to Sequence) model, an artificial neural network model for transforming time series data. The proposed method uses joint current and angular values, which the robot can measure by itself, without additional sensors for fault diagnosis. After preprocessing the measured data for training, the Seq2Seq model was trained to convert current values to angles. The abnormality degree for fault diagnosis is the RMSE (root mean squared error) between the predicted and actual angles over a unit time. The performance of the proposed method was evaluated using test data measured under the robot's normal and defective conditions. When the abnormality degree exceeded a threshold, the state was classified as a fault, and the fault diagnosis accuracy in the experiment was 96.67%. The proposed method has the merit of performing fault prediction without additional sensors, and the experiment confirmed that high diagnostic performance and efficiency are attainable without deep expert knowledge of the robot.
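The RMSE-based abnormality scoring described above can be sketched as follows; the threshold and the angle values are illustrative, not the paper's settings:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and measured joint angles."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def diagnose(predicted, actual, threshold=5.0):
    """Flag a fault when the unit-time RMSE exceeds the threshold."""
    score = rmse(predicted, actual)
    return ("fault" if score > threshold else "normal", score)

# Normal operation: model predictions track the measured angles closely,
# so the abnormality degree stays under the threshold.
print(diagnose([10.0, 20.0, 30.0], [10.5, 19.8, 30.2]))
```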

A prediction study on the number of emergency patients with ASTHMA according to the concentration of air pollutants (대기오염물질 농도에 따른 천식 응급환자 수 예측 연구)

  • Han Joo Lee;Min Kyu Jee;Cheong Won Kim
    • Journal of Service Research and Studies
    • /
    • v.13 no.1
    • /
    • pp.63-75
    • /
    • 2023
  • Due to the development of industry, interest in air pollutants has increased. Air pollutants affect various fields such as environmental pollution and global warming, and environmental diseases are among the areas they influence. Because of their small molecular size, air pollutants can affect the human skin or respiratory tract, and various studies on air pollutants and environmental diseases have accordingly been conducted. Asthma, an environmental disease, can be life-threatening if symptoms worsen into asthma attacks, and adult asthma is difficult to cure once it develops. Factors that worsen asthma include particulate matter and air pollution, and the prevalence of asthma is increasing worldwide. In this paper, we study how air pollutants correlate with the number of emergency room admissions of asthma patients and predict the future number of asthma emergency patients using the most highly correlated pollutants. Concentrations of five pollutants were used: sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2), and fine dust (PM10); for the environmental disease, data on the number of asthma patients admitted to the emergency room were used. Five years of data, from January 1, 2013 to December 31, 2017, were used for both the air pollutant concentrations and the number of asthma emergency patients. Predictions were made with two models, Informer and LTSF-Linear, and the performance indicators MAE, MAPE, and RMSE were used to measure model performance. Results were compared for predictions both including and not including the past number of emergency patients. This paper identifies the air pollutants that improve model performance in predicting the number of asthma emergency patients with the Informer and LTSF-Linear models.
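The three performance indicators mentioned above (MAE, MAPE, RMSE) are standard and can be computed as follows; the toy admission counts are illustrative:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no zero actual values."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy daily counts of asthma emergency admissions vs. model predictions.
y_true = [12, 15, 9, 20]
y_pred = [10, 16, 11, 18]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```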

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (convolutional neural network), known as an effective solution for recognizing and classifying images and voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images, so the model proposed in this study adopts a CNN as a binary classifier that predicts the stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements to predict future price movements. Our proposed model, named CNN-FG (Convolutional Neural Network using Fluctuation Graph), consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset; each graph is drawn as a 40×40 pixel image, with each independent variable drawn in a different color. In step 3, the model converts the images into matrices: each image becomes a combination of three matrices expressing the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the graph-image dataset into training and validation sets; we used 80% of the total dataset for training and the remaining 20% for validation. In the final step, the CNN classifier is trained on the training images.
Regarding the parameters of CNN-FG, we adopted two convolution filters (5×5×6 and 5×5×9) in the convolution layer and a 2×2 max pooling filter in the pooling layer. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the output layer had 2 nodes (one for the prediction of an upward trend, the other for a downward trend). The activation function for the convolution and hidden layers was ReLU (rectified linear unit), and that of the output layer was softmax. To validate CNN-FG, we applied it to the prediction of the KOSPI200 over 2,026 days in eight years (2009 to 2016). To match the proportions of the two classes of the dependent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset from 80% of the total (1,560 samples) and the validation dataset from the remaining 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popular in previous studies, including Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), and CCI (commodity channel index). To confirm the superiority of CNN-FG, we compared its prediction accuracy with those of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classifiers on those graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
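Step 3 of CNN-FG (expressing each graph image as R, G, and B matrices) can be illustrated with a toy image of pixel tuples; a real input would be the 40×40 graph image:

```python
def split_rgb(image):
    """Split an image given as rows of (R, G, B) tuples into three matrices,
    one per color channel, suitable as CNN input channels."""
    red   = [[px[0] for px in row] for row in image]
    green = [[px[1] for px in row] for row in image]
    blue  = [[px[2] for px in row] for row in image]
    return red, green, blue

# A toy 2x2 image: a red pixel, a green pixel, a blue pixel, a white pixel.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
r, g, b = split_rgb(image)
print(r)  # the red-channel matrix: [[255, 0], [0, 255]]
```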

A comparison between the real and synthetic cohort of mortality for Korea (가상코호트와 실제코호트 사망력 비교)

  • Oh, Jinho
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.4
    • /
    • pp.427-446
    • /
    • 2018
  • According to the United Nations' definition of an aging society, Korea will become a super-aged society within only 30 years, and the statistics in Korea's Population Projections (2016) indicate that Korea is aging at the fastest speed in the world. Compared with this rapid rate of aging, there is a lack of long-term time-series mortality data relevant to pension and welfare policies. This paper estimates life expectancy over 245 years (from 1955 to 2200) through back-projection and future forecasts, and compares the life expectancy of the synthetic cohort with that of the real cohort. International comparisons were also made to gauge the level of aging in Korea. Estimates for the back-projection period were compared with previous studies and with the LC model to improve accuracy and objectivity, and predictions after 2016 reflect Korea's declining mortality rate through the LC-ER model. The results show an increase in life expectancy of about 30 years over the 60 years from 1955 to 2015, with the life expectancy of the real cohort over the two centuries from 1955 to 2155 higher than that of the synthetic cohort. This comparative advantage of real-cohort life expectancy was confirmed to be a common trend among the countries compared. In addition, Japan and Korea show higher life expectancy, and starting from ages 85 to 90, the growth rate of life expectancy for both synthetic and real cohorts in all compared countries is lower than in earlier years.
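For reference, the LC (Lee-Carter) model mentioned above decomposes the log age-specific mortality rate into a static age profile and a time-varying mortality index:

```latex
\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}
```

where $m_{x,t}$ is the mortality rate at age $x$ in year $t$, $a_x$ is the average log-mortality profile by age, $b_x$ is the age-specific sensitivity to the index, and $k_t$ is the mortality index whose extrapolation drives the forecast.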

Analysis of Global Shipping Market Status and Forecasting the Container Freight Volume of Busan New port using Time-series Model (글로벌 해운시장 현황 분석 및 시계열 모형을 이용한 부산 신항 컨테이너 물동량 예측에 관한 연구)

  • JO, Jun-Ho;Byon, Je-Seop;Kim, Hee-Cheul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.10 no.4
    • /
    • pp.295-303
    • /
    • 2017
  • In this paper, we analyze trends in the international shipping market and the domestic and foreign factors behind the crisis in the Korean shipping market, and examine the characteristics of the recovery of Busan New Port traffic, which had decreased since that crisis. We quantitatively analyzed the future volume of Busan New Port and its predicted recovery trend. Analyzing the Busan New Port container cargo volume with the big data analysis tool R, the seasonal ARIMA(1,0,1)(1,0,1)[12] model was found to be optimal in terms of estimation error, AICc, and BIC. Therefore, using this optimal model, we estimated the Busan New Port volume for 36 months at 13,157,184 TEU, 13,418,123 TEU, 13,539,884 TEU, and 4,526,406 TEU, respectively, indicating increases of about 2%, 2%, and 1%.
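The model above was selected by information criteria (AICc/BIC) among seasonal ARIMA candidates fitted in R. As a minimal stdlib sketch of the underlying idea (not the authors' procedure), AIC for least-squares fits with Gaussian errors can be compared directly; the residual vectors below are made up for illustration:

```python
import math

def aic(residuals, k):
    """AIC for a least-squares fit with Gaussian errors:
    AIC = n * ln(RSS / n) + 2k, where k is the number of fitted parameters."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + 2 * k

# Residuals of two candidate models fitted to the same 6-point series
# (values made up): a mean-only model (1 parameter) vs. a trend model (2).
resid_mean_model  = [-1.5, 0.5, -0.5, 1.5, 0.5, 2.5]
resid_trend_model = [0.2, -0.3, 0.1, 0.4, -0.2, -0.1]
scores = {"mean": aic(resid_mean_model, 1), "trend": aic(resid_trend_model, 2)}
best = min(scores, key=scores.get)
print(best)  # the model with the lowest AIC wins despite its extra parameter
```

The penalty term `2k` is what prevents AIC from always preferring the model with more parameters; AICc adds a small-sample correction on top of this.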

A Study on the Early Warning Model of Crude Oil Shipping Market Using Signal Approach (신호접근법에 의한 유조선 해운시장 위기 예측 연구)

  • Bong Keun Choi;Dong-Keun Ryoo
    • Journal of Navigation and Port Research
    • /
    • v.47 no.3
    • /
    • pp.167-173
    • /
    • 2023
  • The manufacturing industry is the backbone of the Korean economy. Among its sectors, the petrochemical industry is a strategic growth industry that earns profits through re-exports based on South Korea's eminent technology. South Korea imports the whole amount of its crude oil, the raw material for many manufacturing industries, by sea transportation, so it must respond swiftly to the highly volatile tanker freight market. This study aimed to build an early warning model for the crude oil shipping market using a signal approach. A crisis in the crude oil shipping market is defined using the BDTI. The overall leading index is composed of 38 factors drawn from macroeconomic, financial, and shipping market data, and only factors with a leading correlation were included. The overall leading index had its highest correlation coefficient, 0.499, at a two-month lead, and showed a significant correlation at a five-month lead. The QPS value was 0.13, indicating high accuracy for crisis prediction. Furthermore, unlike previous time series forecasting studies, this study quantitatively approached the time lag between an economic crisis and a crisis in the tanker market, providing workers and policy makers in the shipping industry with a framework for strategies that can effectively deal with a crisis.
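The QPS reported above is, in the signal-approach literature, conventionally computed as twice the mean squared gap between signal probabilities and realized crisis indicators (ranging from 0, perfect, to 2); a sketch under that assumption, with made-up signals:

```python
def qps(probs, outcomes):
    """Quadratic probability score:
    QPS = (1/T) * sum(2 * (P_t - R_t)^2), where P_t is the signal
    probability and R_t the realized 0/1 crisis indicator."""
    t = len(probs)
    return sum(2 * (p - r) ** 2 for p, r in zip(probs, outcomes)) / t

# Toy crisis signals vs. realized crisis indicators: low probabilities
# in calm months, high probabilities in crisis months.
signals  = [0.1, 0.2, 0.8, 0.9]
realized = [0,   0,   1,   1]
print(qps(signals, realized))
```

A QPS near 0, like the study's 0.13, means the leading index's signals sat close to the realized outcomes.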

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.239-251
    • /
    • 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction but have not produced superior performance. In recent years, machine learning techniques have been widely applied to stock market prediction, including artificial neural networks, SVMs, and genetic algorithms. In particular, a case-based reasoning method known as k-nearest neighbor (k-NN) is also widely used for stock price prediction. Case-based reasoning retrieves several similar past cases when a new problem occurs and combines their class labels to classify the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space and always selects that same number rather than the best similar neighbors for the target case, so it may have to take more cases into account even when fewer applicable cases exist. Second, it may select neighbors that are far from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-NN, and compares the predictability of k-NN with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted with the learning dataset divided into two types. For the prediction of the next day's closing price, we used four variables: opening price, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning; in the second experiment, data from January 1, 2015 to December 31, 2017 were used. The test data covered January 1, 2018 to August 31, 2018 in both experiments. We compared the performance of k-NN with the random walk model on the two learning datasets. In the first experiment, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN; in the second experiment, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that prediction power is higher when more learning data are used than when less learning data are used, and that k-NN generally produces better predictive power than the random walk model for larger learning datasets but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. To produce better results, it is also recommended that k-NN find its nearest neighbors using a second-step filtering method that considers fundamental economic variables as well as a sufficient amount of learning data.
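A minimal sketch of the k-NN forecasting scheme described above: find the historical windows closest to the most recent window and average the values that followed them. The window length, k, and the toy series are illustrative, not the study's settings:

```python
def knn_forecast(series, window=3, k=2):
    """Predict the next value of a series by k-nearest-neighbor matching:
    compare the most recent `window` values against every historical window
    (Euclidean distance) and average the values that followed the k closest."""
    target = series[-window:]
    candidates = []
    for i in range(len(series) - window):  # every window with a known next value
        w = series[i:i + window]
        dist = sum((a - b) ** 2 for a, b in zip(w, target)) ** 0.5
        candidates.append((dist, series[i + window]))
    candidates.sort(key=lambda t: t[0])
    neighbors = candidates[:k]
    return sum(v for _, v in neighbors) / len(neighbors)

# Toy oscillating "price" series; the forecast blends what followed
# the windows most similar to the latest [2, 3, 4] pattern.
print(knn_forecast([1, 2, 3, 4, 3, 2, 1, 2, 3, 4]))
```

Note the fixed-`k` behavior criticized in the abstract: the function always averages exactly `k` successors, even when only one window is truly similar.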

Plant Species Richness in Korea Utilizing Integrated Biological Survey Data (생물기초조사 통합자료를 활용한 우리나라 식물종 풍부도 분석)

  • Seungbum Hong;Jieun Oh;Jaegyu Cha;Kyungeun Lee
    • Korean Journal of Ecology and Environment
    • /
    • v.56 no.4
    • /
    • pp.363-374
    • /
    • 2023
  • The difficulty in deriving a species richness figure representing all of South Korea lies in the relatively short history of field observations and in observation data scattered across organizations in different fields. In this study, the plant observation data held by agencies under the Ministry of Environment were comprehensively compiled, enabling the construction of a time series dataset spanning over 100 years. The data were integrated using minimal criteria, namely species name, observed location, and time (year), followed by verification and correction. The integrated data show that comprehensive collection of plant species records in South Korea has occurred predominantly since 2000, and the number of plant species found in these surveys appears to be converging recently; the survey data needed to derive national-level biodiversity information have only recently begun to meet the necessary conditions. Applying the Chao 2 method, the species richness of indigenous plants was estimated at 3,182.6 for the 70-year period since 1951, with a minimum cumulative period of 7 years required for the estimation. This estimate can serve as a baseline for studying future changes in species richness in South Korea. Moreover, the integrated data, together with the richness estimation method used here, appear applicable to deriving regional biodiversity indices, such as for local government units.
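The Chao 2 estimator applied above is incidence-based: it corrects observed richness upward using the counts of species seen in exactly one or two sampling units. A sketch with made-up incidence counts:

```python
def chao2(incidence):
    """Classic Chao 2 richness estimator from species incidence counts.

    `incidence` maps each species to the number of sampling units
    (e.g. survey years) in which it was recorded.
    S_est = S_obs + Q1^2 / (2 * Q2), where Q1/Q2 count the species
    found in exactly one/two sampling units.
    """
    s_obs = len(incidence)
    q1 = sum(1 for c in incidence.values() if c == 1)
    q2 = sum(1 for c in incidence.values() if c == 2)
    if q2 == 0:  # bias-corrected fallback when no doubletons exist
        return s_obs + q1 * (q1 - 1) / 2
    return s_obs + q1 * q1 / (2 * q2)

# Toy incidence data: 4 observed species across survey years; the two
# singletons (seen once) push the estimate above the observed count.
counts = {"sp_a": 5, "sp_b": 1, "sp_c": 1, "sp_d": 2}
print(chao2(counts))  # 4 + 2^2 / (2 * 1) = 6.0
```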

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.127-148
    • /
    • 2020
  • A data center is a physical facility for housing computer systems and related components, and is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and prevent failures. If a failure occurs in some element of the facility, it may affect not only the relevant equipment but also other connected equipment, and may cause enormous damage. IT facility failures in particular occur irregularly because of interdependence, and their causes are hard to determine. Previous studies on failure prediction in data centers treated each server as a single, independent state, without assuming that devices interact. In this study, therefore, data center failures were classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), with the focus on analyzing complex failures occurring within servers. Server-external failures include power, cooling, and user errors; since such failures can be prevented in the early stages of data center construction, various solutions are already being developed. On the other hand, the causes of failures occurring inside servers are difficult to determine, and adequate prevention has not yet been achieved. A key reason is that server failures do not occur in isolation: a failure may trigger failures in other servers, or be triggered by failures arriving from them. In other words, whereas existing studies analyzed failures under the assumption of a single server that does not affect other servers, this study assumes that failures propagate between servers.
To define complex failure situations in the data center, failure history data for each piece of equipment in the data center were used. Four major failure types are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures occurring on each device are sorted in chronological order, and when a failure occurs on one piece of equipment, any failure on another piece of equipment within 5 minutes of that time is defined as occurring simultaneously. After constructing sequences of the devices that failed at the same time, the 5 devices that most frequently fail together within those sequences were selected, and the cases in which the selected devices failed simultaneously were confirmed through visualization. Since the server resource information collected for failure analysis is a time series with temporal flow, we used Long Short-Term Memory (LSTM), a deep learning algorithm that can predict the next state from previous states. In addition, unlike the single-server case, the Hierarchical Attention Network model structure was used to account for the fact that each server's involvement in a complex failure differs; this algorithm improves prediction accuracy by giving greater weight to servers with a greater impact on the failure. The study proceeded by defining the failure types and selecting the analysis targets. In the first experiment, the same collected data were analyzed under a single-server assumption and under a multiple-server assumption, and the results were compared. The second experiment improved the prediction accuracy for complex failures by optimizing the threshold for each server.
In the first experiment, under the single-server assumption, three of the five servers were predicted not to have failed even though failures actually occurred; under the multiple-server assumption, all five servers were correctly predicted to have failed. This result supports the hypothesis that servers affect one another and confirms that prediction performance is superior when multiple servers are assumed rather than a single server. In particular, applying the Hierarchical Attention Network algorithm, which assumes that each server's influence differs, improved the analysis, and applying a different threshold for each server further improved prediction accuracy. This study shows that failures whose causes are hard to determine can be predicted from historical data, and presents a model that predicts failures occurring on servers in data centers. The results are expected to help prevent failures in advance.
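The 5-minute simultaneity rule used above to define complex failures can be sketched as a simple chronological clustering of the failure log; the timestamps and equipment names below are made up:

```python
from datetime import datetime, timedelta

def simultaneous_failures(events, window_minutes=5):
    """Group failure events into 'simultaneous' clusters: a failure that
    starts within `window_minutes` of the previous failure in the current
    cluster is treated as part of the same complex failure.

    `events` is a list of (timestamp, equipment_id) tuples.
    """
    events = sorted(events)  # chronological order
    window = timedelta(minutes=window_minutes)
    clusters, current = [], []
    for ts, equip in events:
        if current and ts - current[-1][0] > window:
            clusters.append(current)  # gap too large: close the cluster
            current = []
        current.append((ts, equip))
    if current:
        clusters.append(current)
    return clusters

# Toy failure log: two failures 3 minutes apart form one complex failure;
# a third failure 30 minutes later stands alone.
log = [
    (datetime(2020, 1, 1, 0, 0), "server_1"),
    (datetime(2020, 1, 1, 0, 3), "db_1"),
    (datetime(2020, 1, 1, 0, 33), "network_1"),
]
print([len(c) for c in simultaneous_failures(log)])  # [2, 1]
```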