A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection

  • Lee, Jong-sik (Hanwha Asset Management Co., Ltd.);
  • Ahn, Hyunchul (Graduate School of Business IT, Kookmin University)
  • Received : 2017.10.31
  • Accepted : 2017.12.02
  • Published : 2017.12.31

Abstract

Accurate stock market forecasting has long been studied in academia, and forecasting models built on a wide range of techniques continue to be proposed. Recently, many attempts have been made to predict stock indices using machine learning methods, including deep learning. Traditional stock investment analysis relies on both fundamental analysis and technical analysis, but technical analysis is better suited to short-term trading prediction and to the application of statistical and mathematical techniques. Most studies based on technical indicators have treated prediction as a binary classification problem: whether the market will rise or fall in a future period (usually the next trading day). However, binary classification is poorly suited to predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we therefore extend the conventional binary classification scheme to a multiclass scheme of stock index trends (upward trend, sideways or "boxed", downward trend) and predict the trend class. To solve this multiclass problem, rather than conventional techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), or Artificial Neural Networks (ANN), we use Multiclass Support Vector Machines (MSVM), which have been shown to offer superior prediction performance, and we propose an optimization model that uses a Genetic Algorithm (GA) as a wrapper to improve the performance of MSVM. In particular, the proposed model, named GA-MSVM, is designed to maximize performance by simultaneously optimizing the kernel function parameters of MSVM, the selection of input variables (feature selection), and the selection of training instances (instance selection).
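The simultaneous optimization described above can be pictured as one GA chromosome with three segments: kernel parameters, a feature mask, and an instance mask. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a binary encoding and an RBF kernel with parameters C and gamma, and the bit width, parameter ranges, and names (`BITS_PER_PARAM`, `decode_chromosome`, `apply_masks`) are hypothetical. The abstract does not specify the actual encoding.

```python
BITS_PER_PARAM = 8  # assumed resolution for each kernel parameter


def bits_to_value(bits, low, high):
    """Map a list of 0/1 genes to a float in [low, high]."""
    as_int = int("".join(str(b) for b in bits), 2)
    max_int = 2 ** len(bits) - 1
    return low + (high - low) * as_int / max_int


def decode_chromosome(chrom, n_features, n_instances):
    """Split one chromosome into (C, gamma, feature_mask, instance_mask).

    Assumed layout: [C bits | gamma bits | one bit per feature |
    one bit per training instance].
    """
    expected = 2 * BITS_PER_PARAM + n_features + n_instances
    if len(chrom) != expected:
        raise ValueError(f"expected {expected} genes, got {len(chrom)}")
    p = BITS_PER_PARAM
    c = bits_to_value(chrom[:p], 1.0, 100.0)           # assumed range for C
    gamma = bits_to_value(chrom[p:2 * p], 0.01, 10.0)  # assumed range for gamma
    feature_mask = chrom[2 * p:2 * p + n_features]
    instance_mask = chrom[2 * p + n_features:]
    return c, gamma, feature_mask, instance_mask


def apply_masks(X, y, feature_mask, instance_mask):
    """Keep only the training rows and feature columns whose gene is 1."""
    rows = [i for i, keep in enumerate(instance_mask) if keep]
    cols = [j for j, keep in enumerate(feature_mask) if keep]
    X_sel = [[X[i][j] for j in cols] for i in rows]
    y_sel = [y[i] for i in rows]
    return X_sel, y_sel


# Toy example: 3 candidate features, 4 training instances.
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
y = [1, 0, -1, 1]
chrom = [0] * 8 + [1] * 8 + [1, 0, 1] + [1, 1, 0, 1]
c, gamma, f_mask, i_mask = decode_chromosome(chrom, n_features=3, n_instances=4)
X_sel, y_sel = apply_masks(X, y, f_mask, i_mask)
```

In a wrapper setup of this kind, the fitness of each chromosome would typically be the classification accuracy of an MSVM trained on `X_sel`, `y_sel` with the decoded kernel parameters, so the GA searches all three segments at once.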
To verify the performance of the proposed model, we applied it to real data. The results confirm that GA-MSVM outperforms not only existing artificial intelligence and data mining techniques such as MDA, MLOGIT, CBR, and ANN, but also the conventional multiclass SVM, which had been known to show the best prediction performance so far. In particular, instance selection was found to play a very important role in predicting the stock index trend, contributing more to the improvement in model performance than the other factors. Specifically, GA-MSVM was applied to trend forecasting of Korea's KOSPI200 stock index. Our research is primarily aimed at predicting trend segments in order to capture trading signals or short-term trend transition points. The experimental data set includes technical indicators such as the price and volatility index of the KOSPI200 stock index (2004 to 2017), as well as macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, the trend class, takes one of three states: 1 (upward trend), 0 (sideways, or "boxed"), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for verification. To verify the performance of the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate of the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
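Two mechanics mentioned above, the per-class 70/30 split and One-Against-One (OAO) voting, can be sketched as follows. This is an illustration under assumptions rather than the paper's procedure: the abstract does not say whether the split was random or chronological (a seeded random split is assumed here), and the pairwise classifiers are hypothetical threshold rules on a single score standing in for trained binary SVMs. The names `stratified_split` and `oao_predict` are likewise made up for this sketch.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Return (train_idx, test_idx), taking train_ratio of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)  # assumption: random selection within each class
        cut = round(len(idxs) * train_ratio)
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)


def oao_predict(pairwise_classifiers, x):
    """OAO voting: every binary classifier votes for one of its two
    classes, and the class with the most votes is predicted."""
    votes = Counter(clf(x) for clf in pairwise_classifiers.values())
    best = max(votes.values())
    # Ties go to the smallest label; this tie-break is an arbitrary choice.
    return min(label for label, n in votes.items() if n == best)


CLASSES = (-1, 0, 1)  # downward trend, sideways, upward trend

# Hypothetical stand-ins for the three trained pairwise SVMs.
pairwise = {
    (-1, 0): lambda x: -1 if x < -0.5 else 0,
    (-1, 1): lambda x: -1 if x < 0.0 else 1,
    (0, 1): lambda x: 0 if x < 0.5 else 1,
}
# With k classes, OAO needs k * (k - 1) / 2 binary classifiers (3 here).
assert set(pairwise) == set(combinations(CLASSES, 2))

labels = [1] * 10 + [0] * 20 + [-1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(oao_predict(pairwise, 0.9))
```

Splitting within each class keeps the class proportions the same in the training and verification sets, which matters when the sideways class is much larger or smaller than the two trend classes.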
