DOI: http://dx.doi.org/10.13088/jiis.2011.17.4.241

Optimization of Support Vector Machines for Financial Forecasting  

Kim, Kyoung-Jae (Department of Management Information Systems, Dongguk University-Seoul)
Ahn, Hyun-Chul (School of Management Information Systems, Kookmin University)
Publication Information
Journal of Intelligence and Information Systems, v.17, no.4, 2011, pp. 241-254
Abstract
Financial time-series forecasting is an important problem because it is essential to the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have been widely applied in this area because they do not require large training samples and are less prone to overfitting. To use an SVM, however, a user must determine several design factors heuristically, chief among them the choice of kernel function and its parameters and the selection of an appropriate feature subset. Beyond these factors, proper selection of an instance subset may also improve the forecasting performance of an SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection seeks a proper subset of the original training data; it can be regarded as a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs that uses a genetic algorithm (GA) to optimize the instance selection process and the kernel parameters simultaneously. We call this model ISVM (SVM with instance selection). In our experiments on stock market data, the GA searches for optimal or near-optimal kernel parameter values and relevant instances for the SVM, so the GA chromosome encodes two sets of genes: codes for the kernel parameters and codes for instance selection. For the controlling parameters of the GA search, the population size is set to 50 organisms, the crossover rate to 0.7, and the mutation rate to 0.1; as the stopping condition, 50 generations are permitted. The application data consist of technical indicators and the direction of daily change in the Korea Composite Stock Price Index (KOSPI), for a total of 2,218 trading days. We separate the data into training, test, and hold-out sets of 1,056, 581, and 581 observations, respectively. This study compares ISVM with several benchmark models: logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), a conventional SVM (SVM), and an SVM whose kernel parameters are optimized by the GA (PSVM). The experimental results show that on the hold-out data ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82%, and ISVM produces this result using only 556 of the 1,056 original training instances. In addition, a two-sample test for proportions is used to examine whether ISVM significantly outperforms the benchmark models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level, and Logit, SVM, and PSVM at the 5% level.
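
For illustration, the sketch below shows one way the ISVM idea could be set up in Python with scikit-learn. It is not the authors' implementation: the kernel-parameter search ranges, the fitness function (validation hit ratio), and the GA operators are assumptions; only the control parameters (population 50, crossover 0.7, mutation 0.1, 50 generations) follow the abstract. The chromosome concatenates two real-valued genes (the RBF kernel's C and gamma) with one selection bit per training instance.

    import random
    import numpy as np
    from sklearn.svm import SVC

    def evaluate(chrom, X_train, y_train, X_val, y_val):
        """Fitness: hit ratio on a validation set (an assumption)."""
        C, gamma = chrom[0], chrom[1]
        mask = np.array(chrom[2:], dtype=bool)     # instance-selection bits
        if mask.sum() < 2 or len(np.unique(y_train[mask])) < 2:
            return 0.0                              # degenerate selection
        model = SVC(C=C, kernel="rbf", gamma=gamma)
        model.fit(X_train[mask], y_train[mask])     # train on selected instances only
        return model.score(X_val, y_val)

    def ga_isvm(X_train, y_train, X_val, y_val, pop_size=50,
                crossover_rate=0.7, mutation_rate=0.1, generations=50):
        n = len(X_train)

        def random_chrom():
            # Gene ranges for C and gamma are assumptions, not from the paper.
            return ([random.uniform(0.1, 100.0), random.uniform(0.01, 10.0)]
                    + [random.randint(0, 1) for _ in range(n)])

        pop = [random_chrom() for _ in range(pop_size)]
        for _ in range(generations):
            fits = [evaluate(c, X_train, y_train, X_val, y_val) for c in pop]
            new_pop = [list(pop[int(np.argmax(fits))])]      # elitism: keep the best
            weights = [f + 1e-6 for f in fits]               # avoid all-zero weights
            while len(new_pop) < pop_size:
                p1, p2 = random.choices(pop, weights=weights, k=2)
                child = list(p1)
                if random.random() < crossover_rate:         # one-point crossover
                    cut = random.randrange(2, len(child))
                    child = p1[:cut] + p2[cut:]
                for i in range(len(child)):                  # per-gene mutation
                    if random.random() < mutation_rate:
                        if i == 0:
                            child[i] = random.uniform(0.1, 100.0)   # resample C
                        elif i == 1:
                            child[i] = random.uniform(0.01, 10.0)   # resample gamma
                        else:
                            child[i] = 1 - child[i]                 # flip selection bit
                new_pop.append(child)
            pop = new_pop
        fits = [evaluate(c, X_train, y_train, X_val, y_val) for c in pop]
        return pop[int(np.argmax(fits))]                     # best chromosome found

Under this reading, the test set drives the GA's fitness evaluation, and the resulting model, trained on the surviving instances (556 of 1,056 in the paper's case) with the evolved kernel parameters, is then assessed once on the untouched hold-out set.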
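The significance comparison can be sketched the same way. Below is a standard pooled two-sample z-test for proportions over the 581 hold-out cases; the hit counts in the example are hypothetical, chosen only to show the mechanics, not the paper's actual counts.

    from math import sqrt
    from statistics import NormalDist

    def two_sample_proportion_z(hits_a, hits_b, n=581):
        """Pooled two-sample z-test for equality of two hit ratios on n cases each."""
        p_a, p_b = hits_a / n, hits_b / n
        p = (hits_a + hits_b) / (2 * n)          # pooled proportion
        se = sqrt(p * (1 - p) * (2 / n))         # pooled standard error
        z = (p_a - p_b) / se
        p_value = 1 - NormalDist().cdf(z)        # one-sided: model A better than B
        return z, p_value

    # Hypothetical hit counts, roughly mimicking the reported accuracy gaps.
    z, p = two_sample_proportion_z(hits_a=360, hits_b=330)
    print(f"z = {z:.2f}, one-sided p = {p:.4f}")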
Keywords
Instance Selection; Support Vector Machines; Hybrid Model; Financial Forecasting; Data Mining
Citations & Related Records
Times Cited By KSCI : 1  (Citation Analysis)
연도 인용수 순위
1 Harnett, D. L., A. K. Soni, Statistical methods for business and economics, Addison-Wesley, MA, 1991
2 Wilson, D. L., "Asymptotic properties of nearest neighbor rules using edited data", IEEE Transactions on Systems, Man, and Cybernetics, Vol.2, No.3(1972), 408-421.
3 Wilson, D. R. and T. R. Martinez, "Reduction techniques for instance-based learning algorithms", Machine Learning, Vol.38(2000), 257-286.   DOI   ScienceOn
4 Hart, P. E., "The condensed nearest neighbor rule", IEEE Transactions on Information Theory, Vol.14(1968), 515-516.   DOI
5 Liu, H. and H. Motoda, "Feature transformation and subset selection", IEEE Intelligent Systems, Vol.13, No.2(1998), 26-28.
6 Kim, K., "Financial time series forecasting using support vector machines", Neurocomputing, Vol.55(2003), 307-319.   DOI   ScienceOn
7 Kim, K., "Artificial neural networks with evolutionary instance selection for financial forecasting", Expert Systems with Applications, Vol.30, No.3(2006), 519-526.   DOI   ScienceOn
8 Kuncheva, L. I., "'Change-glasses' approach in pattern recognition", Pattern Recognition Letters, Vol.14(1993), 619-623.   DOI   ScienceOn
9 McSherry, D., "Automating case selection in the construction of a case library", Knowledge Based Systems, Vol.13, No.2/3(2000), 133- 140.   DOI
10 Reeves, C. R. and D. R. Bush, Using genetic algorithms for training data selection in RBF networks, In Liu, H. and H. Motoda, Instance selection and construction for data mining, Kluwer Academic Publishers, Massachusetts, (2001), 339-356.
11 Reeves, C. R. and S. J. Taylor, Selection of training sets for neural networks by a genetic algorithm, In Eiden, A. E., T. Back, M. Schoenauer and H.-P. Schwefel, Parallel problem-solving from nature-PPSN V, Springer-Verlag, Berlin, 1998.
12 Ritter, G. L., H. B. Woodruff, S. R. Lowry, and T. L. Isenhour, "An algorithm for a selective nearest neighbor decision rule", IEEE Transactions on Information Theory, Vol.21, No.6(1975), 665-669.   DOI
13 Smyth, B., "Case-base maintenance", Proceedings of the 11th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, (1998), 507-516.
14 Tay, F. E. H. and L. Cao, "Application of support vector machines in financial time series forecasting", Omega, Vol.29(2001), 309-317.   DOI   ScienceOn
15 Tetko, I. V. and A. E. P. Villa, "Efficient partition of learning data sets for neural network training", Neural Networks, Vol.10, No.8 (1997), 1361-1374.   DOI   ScienceOn
16 Vapnik, V. N., The Nature of Statistical Learning Theory, Springer-Verlag, 1995
17 Vapnik, V. N., Statistical Learning Theory, Wiley, New York, 1998
18 Ahn, H. and K. Kim, "Using genetic algorithms to optimize k-nearest neighbors for data mining", Annals of Operations Research, Vol.163, No.1(2008), 5-18.   DOI   ScienceOn
19 안현철, 김경재, "다양한 다분류 SVM을 적용한 기업채권평가", Asia Pacific Journal of Information Systems, 19권 2호(2009), 157-178
20 안현철, 김경재, 한인구, "다분류 Support Vector Machine을 이용한 한국 기업의 지능형 기업 채권평가모형", 경영학연구, 35권 5호(2006), 1479-1496.
21 Chang, C.-C. and C.-J Lin, LIBSVM:a library for support vector machines, Software available at http://www.csie.ntu.edu.tw/~cjlin/ libsvm, 2001
22 Gates, G. W., "The reduced nearest neighbor rule", IEEE Transactions on Information Theory, Vol.18, No.3(1972), 431-433.   DOI