Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun (Sungkyunkwan University)
  • Lee, Kun Chang (SKK Business School/SAIHST (Samsung Advanced Institute of Health Sciences & Technology), Sungkyunkwan University)
  • Submitted: 2017.03.19
  • Reviewed: 2017.08.09
  • Published: 2017.11.30

Abstract

This paper proposes a data mining approach to predicting stock price direction. The stock market fluctuates due to many factors, so predicting stock price direction has become an important issue in stock market analysis. However, few studies in the literature apply data mining approaches to predicting stock price direction. To contribute to the literature, this paper compares single classifiers with ensemble classifiers. The single classifiers are logistic regression, decision tree, neural network, and support vector machine. The ensemble classifiers are AdaBoost, random forest, bagging, stacking, and vote. For the experiments, we collected a dataset from the Korea Stock Exchange (KRX) covering 2008 to 2015. Data mining experiments using WEKA showed that random forest, one of the ensemble classifiers, achieved the best results on metrics such as AUC (area under the ROC curve) and accuracy.
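To make the two evaluation metrics and the "vote" ensemble concrete, the following is a minimal sketch in plain Python. It is not the paper's WEKA setup: the labels, scores, and per-classifier predictions are made-up toy values, and the function names (`accuracy`, `auc`, `majority_vote`) are hypothetical. AUC is computed here as the fraction of (up-day, down-day) pairs in which the up-day receives the higher score, which is equivalent to the area under the ROC curve.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted directions that match the true directions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions from several classifiers by simple majority.

    Each inner sequence holds one classifier's predictions; an exact tie
    is resolved to 0 (an arbitrary choice for this sketch).
    """
    n_models = len(predictions)
    return [1 if 2 * sum(day) > n_models else 0 for day in zip(*predictions)]


# Toy data (assumed, not from the paper): 1 = price went up, 0 = price went down.
y_true = [1, 0, 1, 1, 0, 0]
scores_a = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]         # classifier A's "up" scores
preds_a = [1 if s >= 0.5 else 0 for s in scores_a]
preds_b = [1, 1, 1, 1, 0, 0]                      # classifier B's predictions
preds_c = [0, 0, 1, 0, 0, 0]                      # classifier C's predictions

vote = majority_vote([preds_a, preds_b, preds_c])

print("accuracy A:", accuracy(y_true, preds_a))
print("AUC A:", auc(y_true, scores_a))
print("accuracy vote:", accuracy(y_true, vote))
```

Accuracy depends on the chosen decision threshold (0.5 above), whereas AUC only depends on how the scores rank up-days against down-days, which is one reason studies like this one report both.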
