The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction

A Study on the Predictive Power of k-NN by Data Size: The Case of Samsung Electronics Stock Prices

  • Chun, Se-Hak (Department of Business Administration, Seoul National University of Science and Technology)
  • 천세학 (서울과학기술대학교 경영학과)
  • Received : 2019.02.15
  • Accepted : 2019.08.17
  • Published : 2019.09.30

Abstract

Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction, but they have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, support vector machines (SVM), and genetic algorithms, have been widely used for the same task. In particular, k-nearest neighbor (k-NN), a case-based reasoning method, is widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to create a classification for the new problem. However, case-based reasoning has two problems. First, it searches for a fixed number of neighbors in the observation space, so it always selects the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may take more cases into account even when fewer are actually applicable to the target. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for every target case, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-NN, and compares the predictability of k-NN with that of the random walk model according to the size of the learning data and the number of neighbors. Samsung Electronics stock prices were predicted using two learning datasets of different sizes. To predict the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning. In the second experiment, data from January 1, 2015 to December 31, 2017 were used. The test data ran from January 1, 2018 to August 31, 2018 in both experiments. We compared the performance of k-NN with the random walk model on the two learning datasets. When the learning data was small, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. When the learning data was large, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that predictive power is higher when more learning data are used. They also show that k-NN generally outperforms the random walk model with the larger learning dataset but not with the relatively small one. Future studies should consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. To produce better results, we also recommend that k-NN find nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
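
The comparison between k-NN and the random walk model can be sketched as follows. This is a minimal illustration, not the paper's implementation. The synthetic price series, the unweighted Euclidean k-NN average on raw price levels, the fixed training window, the function names, and the values of k and the window sizes are all assumptions made for the example.

```python
import math
import random


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def knn_forecast(train_x, train_y, query, k):
    """Average the next-day closes of the k training days closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def make_synthetic_ohlc(n_days, seed=0):
    """Hypothetical (open, high, low, close) rows; a stand-in for real price data."""
    rng = random.Random(seed)
    rows, close = [], 100.0
    for _ in range(n_days):
        open_ = close * (1 + rng.gauss(0, 0.003))
        close = open_ * (1 + rng.gauss(0, 0.01))
        high = max(open_, close) * (1 + abs(rng.gauss(0, 0.003)))
        low = min(open_, close) * (1 - abs(rng.gauss(0, 0.003)))
        rows.append((open_, high, low, close))
    return rows


def evaluate(rows, n_test, k):
    """Return (random walk MAPE, k-NN MAPE) measured on the last n_test days."""
    features = rows[:-1]                    # day t: open, high, low, close
    targets = [r[3] for r in rows[1:]]      # day t+1: close
    split = len(features) - n_test
    train_x, train_y = features[:split], targets[:split]
    test_x, test_y = features[split:], targets[split:]
    rw_pred = [x[3] for x in test_x]        # random walk: tomorrow's close = today's close
    knn_pred = [knn_forecast(train_x, train_y, x, k) for x in test_x]
    return mape(test_y, rw_pred), mape(test_y, knn_pred)


if __name__ == "__main__":
    rows = make_synthetic_ohlc(1500)
    n_test = 160
    for n_train in (300, 1300):             # small vs. large learning window
        subset = rows[-(n_train + n_test + 1):]
        rw, knn = evaluate(subset, n_test=n_test, k=5)
        print(f"train={n_train}: random walk MAPE={rw:.4f}, k-NN MAPE={knn:.4f}")
```

Because both windows end on the same test period, the random walk MAPE is identical across the two runs and only the k-NN figure changes with the amount of learning data. That mirrors the structure of the comparison in the abstract, though results on synthetic data say nothing about the reported numbers.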

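This paper examines how the size of the learning data affects the stock price predictability of case-based reasoning. Using Samsung Electronics stock prices, we compared the case in which learning data from 2000 to 2017 were used with the case in which learning data from 2015 to 2017 were used. The test data ran from January 1, 2018 to August 31, 2018 in both cases. The study was conducted from two perspectives: how useful past data are for time series data, and how important the number of similar cases is. The experiments showed that predictability was higher with more learning data than with less. In terms of MAPE, with the smaller learning dataset k-NN did not outperform the random walk model regardless of the number of similar cases. With the larger learning dataset, however, k-NN generally outperformed the random walk model. To improve the stock price predictability of k-NN and other data mining methods, we suggest, in addition to increasing the size of the learning data, finding and applying similar cases from periods identified by considering macroeconomic variables.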
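
For reference, MAPE is conventionally defined as below, where y_t is the actual closing price on test day t, \hat{y}_t is the forecast, and n is the number of test days. The abstract does not state the exact formula used, so this standard percentage form is an assumption. Under the random walk model the forecast for a day is simply the previous day's close.

```latex
\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,
\qquad
\hat{y}_{t}^{\mathrm{RW}} = y_{t-1}
```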
