http://dx.doi.org/10.5351/KJAS.2016.29.5.961

Hierarchically penalized support vector machine for the classification of imbalanced data with grouped variables

Kim, Eunkyung (Research Center, Korea Credit Bureau)
Jhun, Myoungshic (Department of Statistics, Korea University)
Bang, Sungwan (Department of Mathematics, Korea Military Academy)
Publication Information
The Korean Journal of Applied Statistics / v.29, no.5, 2016, pp. 961-975
Abstract
The hierarchically penalized support vector machine (H-SVM) was developed to perform classification and input variable selection simultaneously when input variables are naturally grouped or generated by factors. However, the H-SVM may suffer from estimation inefficiency because it applies the same amount of shrinkage to each variable without assessing its relative importance. In addition, when analyzing imbalanced data with uneven class sizes, the classification accuracy of the H-SVM may drop significantly in predicting the minority class because its classifiers are undesirably biased toward the majority class. To remedy these problems, we propose the weighted adaptive H-SVM (WAH-SVM) method, which uses adaptive tuning parameters to improve the performance of variable selection and weights to differentiate the misclassification of data points between classes. Numerical results are presented to demonstrate the competitive performance of the proposed WAH-SVM over existing SVM methods.
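The idea of weighting misclassification differently between classes can be illustrated with a class-weighted hinge loss. The sketch below is only an illustration under stated assumptions, not the WAH-SVM formulation itself: the abstract does not specify the weights, so the function names `class_weights` and `weighted_hinge_loss`, the choice of weighting each point by the proportion of the opposite class, and the toy data are all hypothetical. The hierarchical penalty and the adaptive tuning parameters are not reproduced here.

```python
import numpy as np

def class_weights(y):
    """Assumed weighting scheme (not taken from the paper): each point is
    weighted by the proportion of the opposite class, so both classes
    carry the same total weight."""
    y = np.asarray(y)
    n = y.size
    n_pos = np.sum(y == 1)
    n_neg = n - n_pos
    return np.where(y == 1, n_neg / n, n_pos / n)

def weighted_hinge_loss(beta0, beta, X, y, w):
    """Sum of w_i * max(0, 1 - y_i * (beta0 + x_i' beta)) over all points."""
    margins = y * (beta0 + X @ beta)
    return float(np.sum(w * np.maximum(0.0, 1.0 - margins)))

# Toy imbalanced data: three majority points (+1) and one minority point (-1).
X = np.array([[2.0, 0.0], [1.5, 0.5], [1.0, -0.5], [-1.0, 0.0]])
y = np.array([1, 1, 1, -1])
w = class_weights(y)

# A classifier that always predicts the majority class (beta0 = 1, beta = 0)
# versus one that separates the classes on the first variable.
loss_majority = weighted_hinge_loss(1.0, np.zeros(2), X, y, w)
loss_separating = weighted_hinge_loss(0.0, np.array([1.0, 0.0]), X, y, w)
```

With these assumed weights the single minority point counts as much as the three majority points together, so a classifier that ignores the minority class is penalized more heavily relative to the majority-class errors than under an unweighted hinge loss.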
Keywords
adaptive tuning parameter; hierarchical penalization; imbalanced data; support vector machine; variable selection