[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5351/KJAS.2016.29.5.961

Hierarchically penalized support vector machine for the classification of imbalanced data with grouped variables

Kim, Eunkyung (Research Center, Korea Credit Bureau)
Jhun, Myoungshic (Department of Statistics, Korea University)
Bang, Sungwan (Department of Mathematics, Korea Military Academy)

Publication Information

The Korean Journal of Applied Statistics / v.29, no.5, 2016, pp. 961-975

Abstract

The hierarchically penalized support vector machine (H-SVM) was developed to perform classification and input variable selection simultaneously when the input variables are naturally grouped or generated by factors. However, the H-SVM may suffer from estimation inefficiency because it applies the same amount of shrinkage to every variable without assessing its relative importance. In addition, when the data are imbalanced, with uneven class sizes, the classification accuracy of the H-SVM may drop significantly for the minority class because its classifier is undesirably biased toward the majority class. To remedy these problems, we propose the weighted adaptive H-SVM (WAH-SVM), which uses adaptive tuning parameters to improve variable selection and class-specific weights to penalize misclassification differently in each class. Numerical results are presented to demonstrate the competitive performance of the proposed WAH-SVM relative to existing SVM methods.
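The two ingredients named in the abstract, class-specific misclassification weights and adaptive (per-variable) shrinkage, can be illustrated with a minimal sketch. This is not the WAH-SVM formulation from the paper, which the abstract does not spell out: the inverse-frequency weights, the adaptive L1 penalty built from an initial estimate `beta_init`, and all function names below are assumptions chosen for illustration, and the hierarchical (group-level) part of the penalty is not modeled.

```python
from typing import Dict, List, Sequence


def class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight each class inversely to its frequency (an assumed, common choice).

    Labels are expected to be +1 / -1 and both classes must be present.
    """
    n = len(y)
    n_pos = sum(1 for label in y if label == 1)
    n_neg = n - n_pos
    return {1: n / (2 * n_pos), -1: n / (2 * n_neg)}


def weighted_hinge_loss(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    beta0: float,
    beta: Sequence[float],
    weights: Dict[int, float],
) -> float:
    """Average hinge loss where each point is scaled by the weight of its class."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        score = beta0 + sum(b * x for b, x in zip(beta, x_i))
        total += weights[y_i] * max(0.0, 1.0 - y_i * score)
    return total / len(y)


def adaptive_l1_penalty(
    beta: Sequence[float],
    beta_init: Sequence[float],
    lam: float,
    eps: float = 1e-8,
) -> float:
    """L1 penalty whose per-variable shrinkage is inversely proportional to
    the size of an initial estimate, so weak variables are shrunk harder."""
    return lam * sum(abs(b) / (abs(b0) + eps) for b, b0 in zip(beta, beta_init))


# Toy imbalanced sample: three majority (+1) points, one minority (-1) point.
X_demo: List[List[float]] = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.5], [-1.0, 0.2]]
y_demo: List[int] = [1, 1, 1, -1]

w_demo = class_weights(y_demo)
loss_demo = weighted_hinge_loss(X_demo, y_demo, 0.0, [0.0, 0.0], w_demo)
penalty_demo = adaptive_l1_penalty([0.5, 0.0], [1.0, 0.01], lam=0.1)
print(w_demo, loss_demo, penalty_demo)
```

With these assumed weights, an error on the single minority point costs three times as much as an error on a majority point, which counteracts the bias toward the majority class that the abstract describes.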

Keywords

adaptive tuning parameter; hierarchical penalization; imbalanced data; support vector machine; variable selection


