http://dx.doi.org/10.13088/jiis.2017.23.4.147

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection  

Lee, Jong-sik (Hanwha Asset Management)
Ahn, Hyunchul (Graduate School of Business IT Kookmin University)
Publication Information
Journal of Intelligence and Information Systems / v.23, no.4, 2017, pp. 147-168
Abstract
Accurate stock market forecasting has long been studied in academia, and many forecasting models based on a variety of techniques now exist. Recently, many attempts have been made to predict stock indices using machine learning methods, including deep learning. Both fundamental analysis and technical analysis are used in traditional stock investment, but technical analysis is better suited to short-term trading prediction and to the application of statistical and mathematical techniques. Most studies using technical indicators have built models that predict stock prices through binary classification (rising or falling) of future market movement, usually for the next trading day. However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we extend the existing binary scheme to a multi-class scheme of stock index trends (upward trend, boxed, downward trend) and predict the index trend accordingly. Techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), and Artificial Neural Networks (ANN) can be applied to this multi-class problem. We instead build on Multi-class Support Vector Machines (MSVM), which have proved superior in prediction performance, and propose an optimization model that uses a Genetic Algorithm (GA) as a wrapper to improve MSVM performance. In particular, the proposed model, named GA-MSVM, is designed to maximize performance by simultaneously optimizing the kernel function parameters of MSVM, the selection of input variables (feature selection), and instance selection.
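As a rough illustration of what simultaneous optimization could look like, the sketch below decodes one binary GA chromosome into kernel parameters, a feature mask, and an instance mask. The abstract does not describe the actual encoding, so everything here is an assumption: the names `decode`, `bits_to_float`, and `random_chromosome`, the 8-bit parameter resolution, the use of `C` and `gamma` (which presumes an RBF kernel), and the search ranges are all hypothetical. Only the count of 15 candidate features comes from the abstract.

```python
import random

N_FEATURES = 15   # number of candidate indicators reported in the abstract
PARAM_BITS = 8    # assumed bits per kernel parameter (hypothetical)


def bits_to_float(bits, low, high):
    """Map a list of 0/1 genes linearly onto the interval [low, high]."""
    value = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)


def decode(chromosome, n_instances):
    """Split one chromosome into kernel parameters and two selection masks."""
    expected = 2 * PARAM_BITS + N_FEATURES + n_instances
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} genes, got {len(chromosome)}")
    c_bits = chromosome[:PARAM_BITS]
    g_bits = chromosome[PARAM_BITS:2 * PARAM_BITS]
    rest = chromosome[2 * PARAM_BITS:]
    feature_mask = rest[:N_FEATURES]
    instance_mask = rest[N_FEATURES:]
    return {
        "C": bits_to_float(c_bits, 1.0, 100.0),      # assumed search range
        "gamma": bits_to_float(g_bits, 0.01, 10.0),  # assumed search range
        # Indices of the input variables kept by this chromosome.
        "features": [i for i, b in enumerate(feature_mask) if b],
        # Indices of the training instances kept by this chromosome.
        "instances": [i for i, b in enumerate(instance_mask) if b],
    }


def random_chromosome(n_instances, rng):
    """Draw a random chromosome of the length that decode() expects."""
    length = 2 * PARAM_BITS + N_FEATURES + n_instances
    return [rng.randint(0, 1) for _ in range(length)]
```

In a wrapper setting, a fitness function would train an MSVM on the selected instances and features with the decoded parameters and return its classification accuracy on held-out data. That step is omitted here because the abstract does not specify it.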
To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method outperforms the conventional multi-class SVM (MSVM), which has been known to show the best prediction performance so far, as well as existing artificial intelligence and data mining techniques such as MDA, MLOGIT, and Case-Based Reasoning (CBR). In particular, instance selection was found to play a very important role in predicting the stock index trend, contributing more to the improvement of the model than the other factors. To verify the usefulness of GA-MSVM, we applied it to forecasting the trend of Korea's KOSPI200 stock index. Our research primarily aims to predict trend segments in order to capture trading signals or short-term trend transition points. The experimental data set includes technical indicators of the KOSPI200 stock index, such as price and the volatility index (2004 ~ 2017), and macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, takes one of three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for verification. To verify the performance of the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate among the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
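The per-class 70/30 split and the One-Against-One scheme mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names `stratified_split` and `oao_predict` are hypothetical, random assignment within each class is assumed (the abstract does not say how the 70% was chosen), and the pairwise classifiers are passed in as plain callables standing in for trained binary SVMs.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Return (train_idx, test_idx), taking train_ratio of each class for training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)  # assumption: random assignment within each class
        cut = round(len(idxs) * train_ratio)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


def oao_predict(pairwise, x, classes=(-1, 0, 1)):
    """Majority vote over one binary classifier per pair of classes.

    pairwise maps each class pair (a, b) to a callable that returns a or b.
    Ties are broken in favour of the class listed first in `classes`.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])
```

With three trend classes, One-Against-One needs 3 × 2 / 2 = 3 binary classifiers, one for each of the pairs (-1, 0), (-1, 1), and (0, 1).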
Keywords
Multiclass SVM; Genetic Algorithm; Feature Selection; Instance Selection; Stock Market Index Trend Prediction