• Title/Summary/Keyword: Time Series Data Prediction

Development of a Machine Learning Model for Imputing Time Series Data with Massive Missing Values (결측치 비율이 높은 시계열 데이터 분석 및 예측을 위한 머신러닝 모델 구축)

  • Bangwon Ko;Yong Hee Han
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.3 / pp.176-182 / 2024
  • In this study, we compared and analyzed various methods of missing data handling to build a machine learning model that can effectively analyze and predict time series data with a high percentage of missing values. For this purpose, Predictive State Model Filtering (PSMF), MissForest, and Imputation By Feature Importance (IBFI) methods were applied, and their prediction performance was evaluated using LightGBM, XGBoost, and Explainable Boosting Machines (EBM) machine learning models. The results of the study showed that MissForest and IBFI performed the best among the methods for handling missing values, reflecting the nonlinear data patterns, and that XGBoost and EBM models performed better than LightGBM. This study emphasizes the importance of combining nonlinear imputation methods and machine learning models in the analysis and prediction of time series data with a high percentage of missing values, and provides a practical methodology.

A Machine Learning Model for Predicting Silica Concentrations through Time Series Analysis of Mining Data (광업 데이터의 시계열 분석을 통해 실리카 농도를 예측하기 위한 머신러닝 모델)

  • Lee, Seung Hoon;Yoon, Yeon Ah;Jung, Jin Hyeong;Sim, Hyun su;Chang, Tai-Woo;Kim, Yong Soo
    • Journal of Korean Society for Quality Management / v.48 no.3 / pp.511-520 / 2020
  • Purpose: The purpose of this study was to devise an accurate machine learning model for predicting silica concentrations following the addition of impurities, through time series analysis of mining data. Methods: The mining data were preprocessed and subjected to time series analysis using the machine learning model. Through correlation analysis, valid variables were selected and meaningless variables were excluded. To reflect changes over time, dependent variables at baseline were treated as independent variables at later time points. The relationship between the independent variables and the dependent variable n time points later was examined by Pearson correlation analysis. Results: The correlation (R²) was strongest at a lag of 3 hours, so the value 3 hours ahead was adopted as the dependent variable. According to root mean square error (RMSE) data, the proposed method was superior to the other machine learning methods. The XGBoost algorithm showed the best predictive performance. Conclusion: This study is important given the current lack of machine learning studies pertaining to the domestic mining industry. In addition, applying time series analysis to mining data is expected to yield further improvement. Before establishing a predictive model for the proposed method, predictions should be made using data with time series characteristics. After doing this work, it should also improve prediction accuracy in other domains.
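
The lag-selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy series, and the rule of picking the lag with the largest squared Pearson correlation are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(feature, target, lag):
    """Correlate feature[t] with target[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(feature, target)
    return pearson(feature[:-lag], target[lag:])


def best_lag(feature, target, max_lag):
    """Return the lag in 1..max_lag with the largest squared correlation."""
    return max(
        range(1, max_lag + 1),
        key=lambda lag: lagged_correlation(feature, target, lag) ** 2,
    )


# Hypothetical toy data: the target repeats the feature three steps later.
feature = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0, 12.0, 11.0]
target = [0.0, 0.0, 0.0] + feature[:-3]
print(best_lag(feature, target, max_lag=5))
```

On real data, each candidate input variable would be scored this way against the future silica concentration, and the lag with the strongest relationship would define the prediction target.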

Deep Prediction of Stock Prices with K-Means Clustered Data Augmentation (K-평균 군집화 데이터 증강을 통한 주가 심층 예측)

  • Kyounghoon Han;Huigyu Yang;Hyunseung Choo
    • Journal of Internet Computing and Services / v.24 no.2 / pp.67-74 / 2023
  • Stock price prediction research in the financial sector aims to ensure trading stability and achieve profit realization. Conventional statistical prediction techniques are not reliable for actual trading decisions due to low prediction accuracy compared to randomly predicted results. Artificial intelligence models improve accuracy by learning data characteristics and fluctuation patterns to make predictions. However, predicting stock prices using long-term time series data remains a challenging problem. This paper proposes a stable and reliable stock price prediction method using K-means clustering-based data augmentation and normalization techniques and LSTM models specialized in time series learning. This enables obtaining more accurate and reliable prediction results and pursuing high profits, as well as contributing to market stability.

Crime hotspot prediction based on dynamic spatial analysis

  • Hajela, Gaurav;Chawla, Meenu;Rasool, Akhtar
    • ETRI Journal / v.43 no.6 / pp.1058-1080 / 2021
  • Crime is not a completely random event but rather shows a pattern in space and time. Capturing the dynamic nature of crime patterns is a challenging task. Crime prediction models that rely only on neighborhood influence and demographic features might not be able to capture the dynamics of crime patterns, as demographic data collection does not occur frequently and is static. This work proposes a novel approach for crime count and hotspot prediction to capture the dynamic nature of crime patterns using taxi data along with historical crime and demographic data. The proposed approach predicts crime events in spatial units and classifies each of them into a hotspot category based on the number of crime events. Four models are proposed, which consider different covariates to select a set of independent variables. The experimental results show that the proposed combined subset model (CSM), in which static and dynamic aspects of crime are combined by employing the taxi dataset, is more accurate than the other models presented in this study.

A Study on the Prediction of Power Consumption in the Air-Conditioning System by Using the Gaussian Process (정규 확률과정을 사용한 공조 시스템의 전력 소모량 예측에 관한 연구)

  • Lee, Chang-Yong;Song, Gensoo;Kim, Jinho
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.1 / pp.64-72 / 2016
  • In this paper, we utilize a Gaussian process to predict the power consumption of an air-conditioning system. Because this power consumption takes the form of a time series and its prediction is very important for efficient energy management, it is worth investigating a time-series model for predicting it. To this end, we apply a Gaussian process, which assigns a prior probability to every possible function and gives higher probabilities to functions that are more consistent with the empirical data. We also discuss how to estimate the hyper-parameters, which are the parameters of the covariance function of the Gaussian process model. We estimated the hyper-parameters with two different methods (marginal likelihood and leave-one-out cross validation) and obtained a model that describes the data well; the results are more or less independent of the hyper-parameter estimation method. We validated the prediction results by error analysis using the mean relative error and the mean absolute error. The mean relative error analysis showed that about 3.4% of the predicted value came from the error, and the mean absolute error analysis confirmed that the error is within the standard deviation of the predicted value. We also adopted the non-parametric Wilcoxon signed-rank test to assess the fitness of the proposed model and found that the null hypothesis of uniformity was not rejected at the 5% significance level. These results can be applied to more elaborate control of power consumption in air-conditioning systems.
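
The two error measures named in this abstract can be written down as a short Python sketch. The definitions below are the common textbook ones (mean of |actual - predicted|, and the same quantity divided by |actual|); the paper's exact definitions may differ, and the sample values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / |actual|; assumes no actual value is zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical power readings and model predictions (arbitrary units).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae = mean_absolute_error(actual, predicted)
mre = mean_relative_error(actual, predicted)
print(f"MAE = {mae:.3f}, MRE = {mre:.1%}")
```

Under these definitions, a mean relative error of 3.4% would mean that predictions deviate from the measured values by 3.4% on average.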

Building Energy Time Series Data Mining for Behavior Analytics and Forecasting Energy Consumption

  • Balachander, K;Paulraj, D
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.6 / pp.1957-1980 / 2021
  • The aim of this research is to evaluate mechanisms for efficient, context-aware use of energy by in-home devices, thereby improving the information that smart metering systems provide about the usage of selected homes and the time of use. Advances in information processing are commonly used to analyze very large volumes of building operation data in order to boost the operational efficiency of building energy systems. Here, several data mining models are proposed to measure and predict energy time series in order to expose different temporal patterns of energy use. These patterns describe appliance use in relation to time, such as hour of day, time of day, week, month, and year, within a household, which are key components in capturing and isolating the effect of consumer behavior on energy use and in predicting energy use patterns. It is necessary to determine the multiple relations among the usage of different appliances from simultaneous information flows. In addition, relations among interval-based events, in which the use of multiple appliances persists for a certain duration, are difficult to determine. To resolve these difficulties, an unsupervised clustering of energy time-series data, a frequent pattern mining study, and a deep learning technique for estimating energy use are presented. An extensive test was conducted using real data sets that are rich in smart meter data. The results for the appliance patterns recognized by the proposed model were obtained with Deep Convolutional Neural Networks (CNN) and Recurrent Neural Networks (LSTM and GRU) at each stage, with consolidated accuracy of 94.79%, 97.99%, and 99.61% for 25%, 50%, and 75%, respectively.

Development of Traffic Speed Prediction Model Reflecting Spatio-temporal Impact based on Deep Neural Network (시공간적 영향력을 반영한 딥러닝 기반의 통행속도 예측 모형 개발)

  • Kim, Youngchan;Kim, Junwon;Han, Yohee;Kim, Jongjun;Hwang, Jewoong
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.1 / pp.1-16 / 2020
  • With the advent of the fourth industrial revolution era, there has been a growing interest in deep learning using big data, and studies using deep learning have been actively conducted in various fields. In the transportation sector, deep learning offers many advantages for research because large volumes of traffic data are available. In this study, a short-term travel speed prediction model was constructed using LSTM, a deep learning technique. LSTM was selected because it is suitable for time series prediction and the travel speed data used for prediction are time series data. In order to predict the travel speed more precisely, we constructed a model that reflects both temporal and spatial effects. The model is a short-term prediction model that predicts the travel speed one hour ahead. For the analysis data, 5-minute travel speeds collected from the Seoul Transportation Information Center were used, and a congested part of Gangnam was selected as the analysis section.
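
One way to picture the spatio-temporal input described here is the Python sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the link names, the history length of six steps, and the choice of neighboring links as the spatial input are hypothetical. The only figure taken from the abstract is that 5-minute data and a one-hour horizon give 12 steps ahead.

```python
STEPS_PER_HOUR = 60 // 5  # 5-minute samples, so one hour ahead is 12 steps


def build_samples(speeds, target, neighbors, history=6, horizon=STEPS_PER_HOUR):
    """Pair recent speeds on several links with the target link's future speed.

    speeds maps a link name to its speed series; all series share one time axis.
    Each sample is (window, label): window has `history` rows (time steps) and
    one column per link, with the target link first; label is the target link's
    speed `horizon` steps after the last row of the window.
    """
    links = [target] + list(neighbors)
    n = len(speeds[target])
    samples = []
    for t in range(history, n - horizon + 1):
        window = [[speeds[link][s] for link in links] for s in range(t - history, t)]
        label = speeds[target][t - 1 + horizon]
        samples.append((window, label))
    return samples


# Hypothetical toy series for a target link "A" and one adjacent link "B".
speeds = {
    "A": [float(i) for i in range(30)],
    "B": [float(i) + 100.0 for i in range(30)],
}
samples = build_samples(speeds, target="A", neighbors=["B"])
print(len(samples), samples[0][1])
```

Each window could then be fed to a sequence model such as an LSTM, with the columns carrying the spatial information and the rows carrying the temporal information.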

TIME SERIES ANALYSIS USING GRIDDED WIND-STRESS PRODUCT DERIVED FROM SATELLITE SCATTEROMETER DATA

  • KUTSUWADA KUNIO;MORIMOTO NAOKI
    • Proceedings of the KSRS Conference / 2005.10a / pp.52-53 / 2005
  • Time series of gridded surface wind and wind-stress vectors over the world ocean have been constructed from satellite scatterometer data. The products are derived from ERS-1,2, covering 9 years during 1992-2000, and from SeaWinds on board QuikSCAT (Qscat), which has been operating since June 1999 up to the present, so they allow us to analyze variability on various time scales. In this study, we focus on interannual variability of the wind stress in the mid- and high-latitude region of the North Pacific. These are compared with numerical weather prediction (NWP) products (NCEP Reanalysis). We also examine variability in the wind-stress curl field, which is an important factor for ocean dynamics, and focus on its temporal and spatial characteristics in the northwestern Pacific around Japan. It is found that the vorticity field in the lower atmosphere tends to increase gradually with time, suggesting the enhancement of the North Pacific subtropical gyre.

Invariant causal prediction for time series data: Application to won dollar exchange rate data (시계열 자료에서 불변하는 인과성 탐색: 원-달러 환율 데이터에 적용)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.837-848 / 2021
  • Evaluating or predicting the effectiveness of economic policies is an important issue, but it is difficult to find an economic variable that causes a significant result because there are numerous variables that cannot be taken into account. A randomized controlled experiment is the best way to investigate causality, but it is not realistically possible to control time series data such as macroeconomic data through randomization and intervention. Although some analysis methods have been proposed to find causality, methods such as the Granger causality method and the Chow test are insufficient to explain causality. Recently, Pfister et al. (2019) proposed invariant causal prediction methods that are applicable to time series data. In this paper, we introduce the method of Pfister et al. (2019) and use it to find macroeconomic variables that invariantly affect the won-dollar exchange rate.

Real-time PM10 Concentration Prediction LSTM Model based on IoT Streaming Sensor data (IoT 스트리밍 센서 데이터에 기반한 실시간 PM10 농도 예측 LSTM 모델)

  • Kim, Sam-Keun;Oh, Tack-Il
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.310-318 / 2018
  • Recently, the importance of big data analysis has been increasing as large amounts of data are generated by various devices connected to the Internet with the advent of the Internet of Things (IoT). In particular, it is necessary to analyze various large-scale IoT streaming sensor data generated in real time and to provide various services through new, meaningful predictions. This paper proposes a real-time indoor PM10 concentration prediction LSTM model based on streaming data generated from an IoT sensor using AWS. We also construct a real-time indoor PM10 concentration prediction service based on the proposed model. The data used in the paper are streaming data collected from the PM10 IoT sensor for 24 hours. This time series data is converted into sequence data consisting of 30 consecutive values for use as input data of the LSTM. The LSTM model is trained through a sliding window process that moves to the immediately adjacent dataset. In order to improve the performance of the model, an incremental learning method is applied to the streaming data collected every 24 hours. Linear regression and recurrent neural network (RNN) models are compared to evaluate the performance of the LSTM model. Experimental results show that the proposed LSTM prediction model improves performance by 700% over linear regression and by 140% over the RNN model.
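
The conversion of the sensor stream into 30-value sequences with a sliding window can be sketched in Python as follows. The window length of 30 and the one-step slide come from the abstract; pairing each window with the next reading as the prediction target is an assumption, and the function name and toy readings are hypothetical.

```python
def make_windows(series, window=30):
    """Slide a fixed-length window over the series one step at a time.

    Returns (inputs, next_value) pairs, where inputs holds `window`
    consecutive readings and next_value is the reading that follows them.
    """
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        next_value = series[start + window]
        pairs.append((inputs, next_value))
    return pairs


# Hypothetical PM10 readings; a real stream would come from the IoT sensor.
readings = [float(i) for i in range(35)]
pairs = make_windows(readings, window=30)
print(len(pairs))   # 5 windows from 35 readings
print(pairs[0][1])  # 30.0, the reading right after the first window
```

For incremental learning, the same function could be applied to each newly collected 24-hour batch before continuing training on the resulting pairs.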