• Title/Summary/Keyword: sMAPE

Search results: 75

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction, but they have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVMs, and genetic algorithms, have been widely used for stock market prediction. In particular, k-nearest neighbor (k-NN), a case-based reasoning method, is also widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to classify the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space, always selecting the same number of neighbors rather than the most similar neighbors for the target case. As a result, depending on the target, it may have to consider more cases even when fewer cases are applicable. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can degrade when the selected neighbors deviate from the desired similar ones. This paper examines how the size of the learning data affects stock price predictability with k-NN, and compares the predictability of k-NN with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two different learning datasets. To predict the next day's closing price, we used four variables: the opening price, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning; in the second experiment, data from January 1, 2015 to December 31, 2017 were used. In both experiments, the test data span January 1, 2018 to August 31, 2018. We compared the performance of k-NN with the random walk model on the two learning datasets. In the first experiment, when the learning data was small, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. In the second experiment, when the learning data was large, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that predictive power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally predicts better than the random walk model on larger learning datasets, but not when the learning dataset is relatively small. Future studies should consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. To produce better results, it is also recommended that k-NN find its nearest neighbors with a second-stage filtering method that considers fundamental economic variables, and that a sufficient amount of learning data be used.
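The comparison in this abstract can be illustrated with a minimal sketch, assuming Euclidean distance on the four daily variables (open, high, low, close), each neighbor's label being its next-day close, and the k-NN forecast being the unweighted mean of those labels; the abstract does not state the paper's distance measure, weighting, or normalization. The function names and toy prices are hypothetical. The random walk forecast simply carries today's close forward, and both forecasts are scored with MAPE.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def knn_forecast(train_x, train_y, query, k):
    """Average of the next-day closes of the k training days nearest to `query`.

    train_x: (open, high, low, close) tuples; train_y: each day's next-day close.
    """
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in ranked[:k]) / k

def random_walk_forecast(query):
    """Random walk model: tomorrow's close is forecast as today's close."""
    return query[3]

# Hypothetical toy history of (open, high, low, close); not Samsung Electronics data.
days = [
    (100, 102, 99, 101), (101, 103, 100, 102), (102, 104, 101, 103),
    (103, 104, 100, 101), (101, 102, 98, 99), (99, 101, 97, 100),
    (100, 103, 99, 102), (102, 105, 101, 104),
]
closes = [d[3] for d in days]

# Learning data: days 0-4, each labeled with the following day's close.
train_x, train_y = days[:5], closes[1:6]
# Test data: forecast the closes of days 6 and 7 from days 5 and 6.
test_x, test_y = days[5:7], closes[6:8]

knn_pred = [knn_forecast(train_x, train_y, q, k=2) for q in test_x]
rw_pred = [random_walk_forecast(q) for q in test_x]
print(knn_pred, rw_pred)  # [101.0, 102.5] [100, 102]
print(mape(test_y, knn_pred), mape(test_y, rw_pred))
```

Varying `k` and the slice used for `train_x` reproduces, in miniature, the two dimensions the study varies: the number of neighbors and the size of the learning data.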

A New Metric for Evaluation of Forecasting Methods : Weighted Absolute and Cumulative Forecast Error (수요 예측 평가를 위한 가중절대누적오차지표의 개발)

  • Choi, Dea-Il; Ok, Chang-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.3 / pp.159-168 / 2015
  • Aggregate production planning determines production, workforce, and inventory levels that maximize a company's profit and meet customer demand, based on demand forecasts. Because the performance of aggregate production planning depends heavily on the accuracy of the given demand forecasts, choosing an accurate forecasting method is a prerequisite for good aggregate production planning. Typically, forecast error metrics such as MSE (Mean Squared Error), MAD (Mean Absolute Deviation), MAPE (Mean Absolute Percentage Error), and CFE (Cumulative Forecast Error) are used to choose a forecasting method for aggregate production planning. However, these metrics measure only the difference between actual and forecast demand; they cannot account for consequences of forecast error such as increased cost or decreased profit. Consequently, the traditional metrics do not adequately support the selection of a good forecasting method for aggregate production planning. To overcome this limitation, this study proposes a new metric, WACFE (Weighted Absolute and Cumulative Forecast Error), for evaluating forecasting methods. The WACFE is designed to consider not only forecast errors but also the costs those errors may cause in aggregate production planning: it is the sum of products of the cumulative forecast error and weight factors for backorder and inventory costs. We demonstrate the effectiveness of the proposed metric through extensive experiments with demand data sets from the M3-competition. Finally, we show that the WACFE has a higher correlation with total cost than the other metrics and is therefore better suited to selecting forecasting methods for aggregate production planning.
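The standard metrics named above, and one possible reading of WACFE, can be sketched as follows. The abstract states only that WACFE is a sum of products of cumulative forecast error and backorder/inventory cost weights, so the `wacfe` function below is a hypothetical interpretation, not the paper's formula: a positive running cumulative error (demand above forecast) is weighted as a backorder, a negative one as surplus inventory. The weights and demand figures are illustrative and not taken from M3-competition data.

```python
def forecast_errors(actual, forecast):
    # e_t = actual demand minus forecast demand
    return [a - f for a, f in zip(actual, forecast)]

def mse(actual, forecast):
    e = forecast_errors(actual, forecast)
    return sum(x * x for x in e) / len(e)

def mad(actual, forecast):
    e = forecast_errors(actual, forecast)
    return sum(abs(x) for x in e) / len(e)

def mape(actual, forecast):
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def cfe(actual, forecast):
    # Cumulative forecast error: positive means demand was under-forecast overall.
    return sum(forecast_errors(actual, forecast))

def wacfe(actual, forecast, backorder_weight, inventory_weight):
    """Hypothetical WACFE sketch (not the paper's exact definition).

    Each period's |running cumulative error| is multiplied by the backorder weight
    when the running total is positive, otherwise by the inventory weight.
    """
    total, running = 0.0, 0.0
    for e in forecast_errors(actual, forecast):
        running += e
        weight = backorder_weight if running > 0 else inventory_weight
        total += weight * abs(running)
    return total

actual = [100, 120, 90, 110]
forecast_a = [110, 110, 100, 100]  # over-forecast first
forecast_b = [90, 130, 80, 120]    # same error sizes, under-forecast first
for f in (forecast_a, forecast_b):
    print(mse(actual, f), mad(actual, f), cfe(actual, f),
          wacfe(actual, f, backorder_weight=3.0, inventory_weight=1.0))
# 100.0 10.0 0 20.0
# 100.0 10.0 0 60.0
```

In this toy case the two forecasts tie on MSE, MAD, MAPE, and CFE, but the cost-weighted sketch separates them because backorders are assumed to be three times as costly as held inventory.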

EDNN based prediction of strength and durability properties of HPC using fibres & copper slag

  • Gupta, Mohit; Raj, Ritu; Sahu, Anil Kumar
    • Advances in concrete construction / v.14 no.3 / pp.185-194 / 2022
  • In cement and concrete production, the construction field has been encouraged to use industrial solid waste or secondary materials because doing so reduces the consumption of natural resources. At the same time, to ensure quality, the strength and durability properties of such cement and concrete must be analyzed. Existing research has focused on predicting the strength and other properties of High-Performance Concrete (HPC) with optimization and machine learning algorithms; however, these methods suffer from error and accuracy issues. This research therefore uses an Enhanced Deep Neural Network (EDNN) to predict the strength and durability of HPC. First, the data are collected. The data are then pre-processed by removing missing data and normalizing, and features are extracted from the pre-processed data. These features are input to the EDNN, which predicts the strength and durability properties of the specified mix designs. The EDNN weights are initialized with the Switched Multi-Objective Jellyfish Optimization (SMOJO) algorithm, and a Gaussian radial function is used as the activation function. In the experimental analysis, the proposed EDNN is compared with the existing DNN, CNN, ANN, and SVM methods on the RMSE, MAE, MAPE, and R2 metrics, and it performs better on these metrics. Its effectiveness is further examined with the accuracy, precision, recall, and F-measure metrics. The fitness of the proposed SMOJO algorithm is also compared with that of existing algorithms, namely JO, GWO, PSO, and GA, and SMOJO achieves a higher fitness value than these algorithms.
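The pre-processing and activation steps described above might look like this in a minimal sketch: rows with missing values are dropped, each column is min-max scaled to [0, 1], and a dense layer applies a Gaussian radial activation. The column meanings, the width parameter `sigma`, and the fixed toy weights are assumptions; the SMOJO weight initialization and the feature-extraction step are not reproduced.

```python
import math

def drop_missing(rows):
    """Remove every sample that contains a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r)]

def min_max_normalize(rows):
    """Scale each column to [0, 1]; a constant column is mapped to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
        for r in rows
    ]

def gaussian(x, sigma=1.0):
    """Gaussian radial activation exp(-x^2 / (2 sigma^2)); sigma = 1 is an assumption."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def dense_gaussian_layer(inputs, weights, biases):
    """One fully connected layer (one weight row per unit) with Gaussian activation."""
    return [
        gaussian(sum(w * x for w, x in zip(w_row, inputs)) + b)
        for w_row, b in zip(weights, biases)
    ]

# Hypothetical mix rows: (cement kg/m3, fibre %, copper slag %); None marks a missing value.
mixes = [[400, 0.5, 20], [450, None, 25], [500, 1.0, 30], [350, 0.0, 10]]
clean = drop_missing(mixes)
features = min_max_normalize(clean)
print(features)
# Toy weights stand in for the SMOJO-initialized weights.
print(dense_gaussian_layer(features[1], weights=[[1.0, -1.0, 0.0]], biases=[0.0]))  # [1.0]
```

Unlike ReLU or sigmoid, the Gaussian activation peaks when a unit's weighted input is zero and decays symmetrically on either side.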

An Energy Consumption Prediction Model for Smart Factory Using Data Mining Algorithms (데이터 마이닝 기반 스마트 공장 에너지 소모 예측 모델)

  • Sathishkumar, VE; Lee, Myeongbae; Lim, Jonghyun; Kim, Yubin; Shin, Changsun; Park, Jangwoo; Cho, Yongyun
    • KIPS Transactions on Software and Data Engineering / v.9 no.5 / pp.153-160 / 2020
  • Energy consumption prediction plays a prominent role in industrial energy management and control systems, because energy demand and supply undergo dynamic and seasonal changes. This paper introduces and explores predictive models of energy consumption in the steel industry. The data include lagging and leading reactive power, lagging and leading current variables, carbon dioxide emissions (tCO2), and load type. Four models are trained and evaluated on a test set: (a) Linear Regression (LR), (b) Support Vector Machine with a radial basis function kernel (SVM RBF), (c) Gradient Boosting Machine (GBM), and (d) Random Forest (RF). Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are used to measure the predictive performance of the regression models. Using all predictors, the best model, RF, achieves an RMSE of 7.33 on the test set.
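The three error measures used here follow their standard definitions. A sketch in plain Python is below, with sMAPE, the metric this listing is indexed under, included for comparison. The data values are hypothetical, and this sMAPE uses the common 0-200% form, one of several variants in use.

```python
import math

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    # Undefined when an actual value is 0; such periods must be excluded or handled separately.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def smape(actual, pred):
    # Symmetric MAPE (0-200% form); a term with a == p == 0 is counted as zero error.
    total = 0.0
    for a, p in zip(actual, pred):
        denom = abs(a) + abs(p)
        total += 0.0 if denom == 0 else 2.0 * abs(p - a) / denom
    return 100.0 * total / len(actual)

actual = [50, 40, 30, 20]  # hypothetical energy usage values
pred = [45, 42, 33, 20]
print(rmse(actual, pred), mae(actual, pred), mape(actual, pred), smape(actual, pred))
```

RMSE and MAE are in the units of the target (here, whatever unit the usage is measured in), while MAPE and sMAPE are scale-free percentages. That is why the RMSE of 7.33 reported above cannot be compared directly with a MAPE value.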

Missing Imputation Methodologies for Daily Traffic Counts by Transforming Time Data into Spatial Data (시간자료의 공간화를 통한 일교통량 결측대체 방법론 연구)

  • Heo, Tae-Young; Oh, Ju-Sam
    • International Journal of Highway Engineering / v.9 no.3 / pp.21-28 / 2007
  • We propose a new spatial linear interpolation method to replace the linear interpolation method widely used in transportation engineering for imputing missing daily traffic volumes. To account for spatial correlation, we lay out daily traffic volume, which is time-series data, over a virtual lattice space. Considering the cyclical nature of daily traffic volume, we used Moran's I to evaluate spatial correlation both among daily traffic volumes within the same week and among volumes on the same day of the week across weeks. For a real application, we used daily traffic volumes for November 2004 provided by the Korea Institute of Construction Technology (KICT) and transformed them into a 4 × 7 virtual lattice to reflect the spatial correlation. Finally, we show that the spatial linear interpolation method performs well for missing data imputation based on the MAPE, RMSE, and Theil's U criteria.
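A minimal sketch of the lattice idea follows. A daily series is laid out with one row per week and one column per weekday. Global Moran's I is then computed as I = (N / W) · Σ_i Σ_j w_ij (x_i - x̄)(x_j - x̄) / Σ_i (x_i - x̄)², using binary rook-adjacency weights w_ij (W is their sum). A missing day is filled with the mean of its observed lattice neighbors. The neighbor definition (no wrap-around between weeks) and the equal-weight neighbor mean are assumptions standing in for the paper's spatial linear interpolation, whose exact form the abstract does not give.

```python
def rook_neighbors(r, c, n_rows, n_cols):
    """Edge-sharing cells: adjacent days in the same week, same weekday in adjacent weeks."""
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < n_rows and 0 <= j < n_cols]

def to_lattice(daily, n_cols=7):
    """Lay a daily series out row by row: one row per week, one column per weekday."""
    return [daily[i:i + n_cols] for i in range(0, len(daily), n_cols)]

def morans_i(grid):
    """Global Moran's I with binary rook weights over a complete grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    num, w_sum = 0.0, 0
    for r in range(n_rows):
        for c in range(n_cols):
            for i, j in rook_neighbors(r, c, n_rows, n_cols):
                num += (grid[r][c] - mean) * (grid[i][j] - mean)
                w_sum += 1
    den = sum((v - mean) ** 2 for v in values)
    return (n / w_sum) * (num / den)

def impute_cell(grid, r, c):
    """Fill a missing cell (None) with the mean of its observed rook neighbors."""
    vals = [grid[i][j] for i, j in rook_neighbors(r, c, len(grid), len(grid[0]))
            if grid[i][j] is not None]
    return sum(vals) / len(vals)

weekly = [10, 20, 30, 40, 50, 60, 70]  # hypothetical weekday pattern
grid = to_lattice(weekly * 4)          # 28 days -> 4 x 7 lattice
print(morans_i(grid) > 0)              # True: neighboring cells are similar
grid[1][2] = None                      # a missing day (week 2, weekday 3)
print(impute_cell(grid, 1, 2))         # 30.0
```

Because the column neighbors are the same weekday in adjacent weeks, the estimate draws on both the within-week trend and the weekly cycle, which plain linear interpolation along time does not.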

A Comparison of Predictive Power among Forecasting Models of Monthly Frozen Mackerel Consumer Price Models (냉동 고등어 소비자가격 모형 간 예측력 비교)

  • Jeong, Min-Gyeong; Nam, Jong-Oh
    • The Journal of Fisheries Business Administration / v.52 no.4 / pp.13-28 / 2021
  • The purpose of this study is to compare the short-term price predictive power of ARMA, ARMAX, and VAR forecasting models based on the MDM test, using monthly consumer price data for frozen mackerel. The study also aims to help policymakers and economic actors make reasonable market decisions regarding the monthly consumer price of frozen mackerel. Frozen wholesale prices and fresh consumer prices were used as variables, with monthly price time series from December 2013 to July 2021. Unit root tests confirmed that the time series variables employed in the models were stationary, so the variables were analyzed in levels. Information criteria and Granger causality tests showed that the previous month's wholesale prices and fresh consumer prices affected frozen consumer prices. Candidate models were then evaluated with the RMSE, RMSPE, MAE, MAPE, and Theil's inequality coefficient criteria, and their predictive power was compared with the MDM test to determine which model is superior. The analysis selected ARMAX(1,1) with frozen wholesale prices, ARMAX(1,1) with fresh consumer prices, and a VAR model. Based on the five criteria and the MDM tests, the VAR model was selected as the superior model for predicting the monthly consumer price of frozen mackerel.
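The MDM test referred to above is the modified Diebold-Mariano test of Harvey, Leybourne and Newbold. With loss differential d_t = e1_t² - e2_t², the Diebold-Mariano statistic is DM = d̄ / sqrt(V̂ / n), where V̂ = γ̂_0 + 2 Σ_{k=1}^{h-1} γ̂_k, and the modification rescales it as MDM = sqrt((n + 1 - 2h + h(h - 1)/n) / n) · DM. The sketch below assumes squared-error loss and comparison with Student's t distribution with n - 1 degrees of freedom. The error series are hypothetical, not the mackerel models' residuals.

```python
import math

def mdm_statistic(e1, e2, h=1):
    """Modified Diebold-Mariano statistic (Harvey, Leybourne & Newbold), squared-error loss.

    e1, e2: forecast errors of two competing models over the same n periods.
    h: forecast horizon. Compare the result with Student's t with n - 1 degrees of freedom.
    """
    d = [a * a - b * b for a, b in zip(e1, e2)]  # loss differential per period
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance from autocovariances up to lag h - 1.
    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive variance estimate; statistic undefined")
    dm = d_bar / math.sqrt(lrv / n)  # original Diebold-Mariano statistic
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dm

e_model1 = [1.0, -2.0, 1.5, -1.0, 2.0]  # hypothetical errors, e.g. an ARMAX model
e_model2 = [0.5, -1.0, 1.0, -0.5, 1.0]  # hypothetical errors, e.g. a VAR model
stat = mdm_statistic(e_model1, e_model2, h=1)
print(round(stat, 3))  # 3.376
```

A positive statistic means the second model has the smaller squared errors. Equal accuracy is rejected when |MDM| exceeds the t critical value for the chosen significance level.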

Deep learning-based AI constitutive modeling for sandstone and mudstone under cyclic loading conditions

  • Luyuan Wu; Meng Li; Jianwei Zhang; Zifa Wang; Xiaohui Yang; Hanliang Bian
    • Geomechanics and Engineering / v.37 no.1 / pp.49-64 / 2024
  • When rocks undergo repeated loading and unloading over an extended period, for example due to earthquakes, human excavation, and blasting, stress and deformation may gradually accumulate within the rock mass until it eventually reaches an unstable state. In this study, a CNN-CCM is proposed to model this mechanical behavior. The structure and hyperparameters of the CNN-CCM are: 5 Conv2D layers, 4 MaxPooling2D layers, 4 Dense layers, learning rate = 0.001, epochs = 50, batch size = 64, and dropout = 0.5. The training and validation data for deep learning comprise 71 rock samples and 122,152 data points. The AI Rock Constitutive Model learned by the CNN-CCM predicts strain (ε1) from mass (M), axial stress (σ1), density (ρ), cyclic number (N), confining pressure (σ3), and Young's modulus (E). Five evaluation indicators, R2, MAPE, RMSE, MSE, and MAE, yield values of 0.929, 16.44%, 0.954, 0.913, and 0.542, respectively, indicating good predictive performance and generalization ability. Finally, interpreting the AI Rock Constitutive Model with the SHAP explanation method shows that feature importance follows the order N > M > σ1 > E > ρ > σ3. For N, M, σ1, and σ3, positive SHAP values indicate a positive effect on the predicted strain ε1 and negative SHAP values a negative effect. For E, a positive value has a negative effect on the predicted strain ε1, consistent with the influence patterns of conventional physical rock constitutive equations. This study offers a novel approach to investigating the mechanical constitutive model of rocks under cyclic loading and unloading conditions.
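Importance orderings such as N > M > σ1 > E > ρ > σ3 are commonly obtained by averaging the absolute SHAP value of each feature over all samples. The direction of an effect can be read from how a feature's values co-vary with its SHAP values. The plain-Python sketch below shows both computations on a hypothetical SHAP matrix. The paper's values are not given, and computing the SHAP values themselves (for example with the `shap` package) is outside its scope.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP value| over samples (a common global importance measure)."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores, key=scores.get, reverse=True)

def effect_direction(feature_values, shap_values):
    """Sign of the covariance between a feature and its SHAP values:
    +1 if larger feature values push the prediction up, -1 if they push it down."""
    n = len(feature_values)
    mx = sum(feature_values) / n
    ms = sum(shap_values) / n
    cov = sum((x - mx) * (s - ms) for x, s in zip(feature_values, shap_values))
    return 1 if cov > 0 else -1 if cov < 0 else 0

feature_names = ["N", "E"]
# Hypothetical SHAP values (rows = samples, columns = features); not the paper's results.
shap_rows = [[0.5, -0.1], [-0.4, 0.2], [0.3, -0.05]]
print(mean_abs_shap(shap_rows, feature_names))  # ['N', 'E']
n_values, e_values = [30, 2, 20], [10, 5, 12]
print(effect_direction(n_values, [r[0] for r in shap_rows]),
      effect_direction(e_values, [r[1] for r in shap_rows]))  # 1 -1
```

In this toy case, larger N values raise the predicted strain while larger E values lower it, mirroring the kind of pattern the abstract reports for Young's modulus.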

Water demand forecasting at the DMA level considering sociodemographic and waterworks characteristics (사회인구통계 및 상수도시설 특성을 고려한 소블록 단위 물 수요예측 연구)

  • Saemmul Jin; Dooyong Choi; Kyoungpil Kim; Jayong Koo
    • Journal of Korean Society of Water and Wastewater / v.37 no.6 / pp.363-373 / 2023
  • Numerous studies have established a correlation between sociodemographic characteristics and water usage, identifying population as a primary independent variable in mid- to long-term demand forecasting. Recent dramatic sociodemographic changes, including urban concentration and rural depopulation, low birth rates and population aging, and the rise in single-person households, are expected to affect water demand and supply patterns, underscoring the need for operational and managerial changes in existing water supply systems. Although sociodemographic characteristics are surveyed regularly, the surveys use aggregate units that do not align with the actual supply system. Consequently, many water demand forecasts have been made at the administrative district level without adequately considering the water supply system. This study presents a bottom-up water demand forecasting model that accurately reflects actual water facilities and consumers. The model comprises three key steps. First, Statistics Korea's SGIS (Statistical Geographic Information Service) data were reorganized at the DMA (district metered area) level. Second, DMAs were classified with the SOM (Self-Organizing Map) algorithm to account for differences in water facilities and consumer characteristics. Last, water demand was forecast with PCR (Principal Component Regression) to address multicollinearity and overfitting. Because the number of DMAs was insufficient, the model's performance was evaluated for DMAs classified as rural areas. The results show correlation coefficients above 0.9 and a MAPE within approximately 10% on the test dataset. The method is expected to be useful for reorganization plans such as the expansion or contraction of existing facilities.
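Principal component regression can be sketched in a few lines of NumPy. The predictors are standardized, the leading principal components are kept from an SVD, and ordinary least squares is fitted on the component scores. Because those scores are uncorrelated, the fit avoids the multicollinearity problem. The three DMA indicators and the demand values below are synthetic and deliberately collinear. The number of retained components is an assumption, and the SOM classification step is not shown.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: OLS on the leading PCs of standardized X."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                             # standardize each predictor
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    V = vt[:n_components].T                       # loadings of the retained components
    T = Z @ V                                     # component scores (uncorrelated)
    A = np.column_stack([np.ones(len(y)), T])     # intercept + scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mu, sd, V, coef

def predict_pcr(model, X):
    mu, sd, V, coef = model
    return coef[0] + ((X - mu) / sd) @ V @ coef[1:]

def mape(actual, pred):
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

rng = np.random.default_rng(0)
pop = np.linspace(1000, 5000, 20)                 # synthetic DMA population
households = pop / 2.5 + rng.normal(0, 10, 20)    # strongly collinear with population
single = 0.3 * households + rng.normal(0, 5, 20)  # synthetic single-person households
X = np.column_stack([pop, households, single])
y = 0.2 * pop + rng.normal(0, 5, 20)              # synthetic demand
train, test = slice(0, 20, 2), slice(1, 20, 2)

model = fit_pcr(X[train], y[train], n_components=1)
print(round(mape(y[test], predict_pcr(model, X[test])), 2))
```

With three near-duplicate predictors, plain OLS coefficients would be unstable. A single component captures their shared variation and keeps the regression well conditioned.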

Development of Prediction Model for the Na Content of Leaves of Spring Potatoes Using Hyperspectral Imagery (초분광 영상을 이용한 봄감자의 잎 Na 함량 예측 모델 개발)

  • Park, Jun-Woo; Kang, Ye-Seong; Ryu, Chan-Seok; Jang, Si-Hyeong; Kang, Kyung-Suk; Kim, Tae-Yang; Park, Min-Jun; Baek, Hyeon-Chan; Song, Hye-Young; Jun, Sae-Rom; Lee, Su-Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.23 no.4 / pp.316-328 / 2021
  • In this study, a prediction model for the leaf Na content of spring potatoes was built with a 400-1000 nm hyperspectral sensor, with the aim of developing a multispectral sensor for salinity monitoring in reclaimed land. The irrigation conditions were standard, drought, and salinity (2, 4, and 8 dS/m), and the irrigation amount was calculated from the amount of evaporation. Leaf Na content was measured in the 1st and 2nd weeks after the start of irrigation in each of the vegetative, tuber formation, and tuber growth periods. Leaf reflectance was converted from 5 nm to 10 nm, 25 nm, and 50 nm FWHM (full width at half maximum) at 10 nm wavelength intervals. Using the variable importance in projection from partial least squares regression (PLSR-VIP), ten band ratios were selected as variables for predicting salinity damage levels from the Na content of spring potato leaves. MLR (multiple linear regression) models were then estimated by removing the band ratios one at a time, starting with the lowest-weighted of the ten. The models were compared not only by R2 and MAPE but also by the number of band ratios and the optimal FWHM, with a view to developing a compact multispectral sensor. A 25 nm FWHM was advantageous for predicting the leaf Na content of spring potatoes in the 1st and 2nd weeks of the vegetative and tuber formation periods and in the 2nd week of the tuber growth period. The 15 selected bandpass filters lie mainly in the red and red-edge regions: 430/440, 490/500, 500/510, 550/560, 570/580, 590/600, 640/650, 650/660, 670/680, 680/690, 690/700, 700/710, 710/720, 720/730, and 730/740 nm.
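The band-ratio elimination procedure can be sketched as follows. The sketch assumes that each predictor is a reflectance ratio R_a/R_b and that the per-ratio weights are given (for example, VIP scores). At each step it drops the lowest-weighted remaining ratio and refits an ordinary least-squares MLR. The reflectances, weights, and Na values are synthetic. The paper's FWHM resampling and PLSR-VIP computation are not reproduced.

```python
import numpy as np

def ols_r2_mape(X, y):
    """Fit y = b0 + X @ b by least squares; return (R^2, MAPE in %) on the fitting data."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    mape = 100.0 * np.mean(np.abs((y - pred) / y))
    return r2, mape

def backward_elimination(X, y, names, weights):
    """Refit MLR while dropping predictors one at a time, lowest weight first."""
    order = list(np.argsort(weights)[::-1])       # column indices, highest weight first
    history = []
    while order:
        r2, mape = ols_r2_mape(X[:, order], y)
        history.append(([names[i] for i in order], r2, mape))
        order = order[:-1]                        # drop the lowest-weighted remaining ratio
    return history

rng = np.random.default_rng(1)
n = 30
# Synthetic leaf reflectance at a few band centers (nm); not measured data.
refl = {b: rng.uniform(0.05, 0.5, n) for b in (550, 560, 700, 710, 720, 730)}
pairs = [(550, 560), (700, 710), (720, 730)]
names = [f"{a}/{b}" for a, b in pairs]
X = np.column_stack([refl[a] / refl[b] for a, b in pairs])  # band-ratio predictors
na = 2.0 + 0.5 * X[:, 1] + rng.normal(0, 0.05, n)            # synthetic Na content
weights = np.array([0.8, 1.6, 1.1])                          # hypothetical VIP-style weights

history = backward_elimination(X, na, names, weights)
for kept, r2, mape in history:
    print(kept, round(r2, 3), round(mape, 2))
```

Because the models are nested and scored on the fitting data, R² can only stay the same or fall as ratios are removed. The trade-off is that loss in fit against the need for fewer bandpass filters.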

A prediction study on the number of emergency patients with ASTHMA according to the concentration of air pollutants (대기오염물질 농도에 따른 천식 응급환자 수 예측 연구)

  • Han Joo Lee; Min Kyu Jee; Cheong Won Kim
    • Journal of Service Research and Studies / v.13 no.1 / pp.63-75 / 2023
  • With the development of industry, interest in air pollutants has increased. Air pollutants affect various fields such as environmental pollution and global warming, and environmental disease is one of the fields they affect. Because of their small size, air pollutants can affect the skin or respiratory tract, and various studies on air pollutants and environmental diseases have therefore been conducted. Asthma, one such environmental disease, can be life-threatening if symptoms worsen and cause asthma attacks, and adult asthma is difficult to cure once it develops. Factors that worsen asthma include particulate matter and air pollution, and the prevalence of asthma is increasing worldwide. In this paper, we study how air pollutants correlate with the number of emergency room admissions of asthma patients, and we predict the future number of asthma emergency patients using the highly correlated air pollutants. For air pollutants, we used the concentrations of five pollutants: sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2), and fine dust (PM10); for environmental disease, we used the number of emergency room admissions of asthma patients. Air pollutant and asthma emergency patient data covering five years, from January 1, 2013 to December 31, 2017, were used. Predictions were made with two models, Informer and LTSF-Linear, and MAE, MAPE, and RMSE were used as performance indicators. Results were compared for predictions made with and without the number of emergency patients as an input. This paper identifies the air pollutants that improve the performance of the Informer and LTSF-Linear models in predicting the number of asthma emergency patients.
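LTSF-Linear models forecast with a single linear layer applied along the time axis of a lookback window. The NLinear variant first subtracts the window's last value and then adds it back to the output. Below is a NumPy sketch of that NLinear idea for one univariate series. Here the layer is fitted by least squares rather than by the gradient-based training such models typically use. The lookback of 14, the one-step horizon, and the synthetic series are assumptions, and pollutant covariates and the Informer model are not shown.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (lookback window, next `horizon` values) pairs."""
    X, Y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        Y.append(series[t + lookback:t + lookback + horizon])
    return np.array(X, dtype=float), np.array(Y, dtype=float)

def fit_nlinear(series, lookback, horizon):
    """NLinear-style linear map over the time axis, fitted by least squares (an assumption)."""
    X, Y = make_windows(series, lookback, horizon)
    last = X[:, -1:]                                   # last observed value of each window
    A = np.column_stack([X - last, np.ones(len(X))])   # normalized window + bias term
    W, *_ = np.linalg.lstsq(A, Y - last, rcond=None)
    return W

def predict_nlinear(W, window):
    window = np.asarray(window, dtype=float)
    last = window[-1]
    return np.append(window - last, 1.0) @ W + last

t = np.arange(61)
daily = 50 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 7)  # hypothetical daily counts
W = fit_nlinear(daily[:60], lookback=14, horizon=1)
forecast = predict_nlinear(W, daily[46:60])            # window ends at day index 59
print(forecast, daily[60])
```

On this noise-free trend-plus-weekly-cycle series, the one-step forecast for index 60 matches the true value up to floating-point error. Real admission counts would leave residual error.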