http://dx.doi.org/10.5351/KJAS.2015.28.1.009

Weighted L1-Norm Support Vector Machine for the Classification of Highly Imbalanced Data  

Kim, Eunkyung (Department of Statistics, Korea University)
Jhun, Myoungshic (Department of Statistics, Korea University)
Bang, Sungwan (Department of Mathematics, Korea Military Academy)
Publication Information
The Korean Journal of Applied Statistics, v.28, no.1, 2015, pp. 9-21
Abstract
The support vector machine (SVM) has been successfully applied to various classification areas due to its flexibility and high classification accuracy. However, when analyzing imbalanced data with uneven class sizes, the classification accuracy of the SVM may drop significantly in predicting the minority class because SVM classifiers are undesirably biased toward the majority class. The weighted $L_2$-norm SVM was developed for the analysis of imbalanced data; however, it cannot identify irrelevant input variables due to the characteristics of the ridge penalty. Therefore, we propose the weighted $L_1$-norm SVM, which uses the lasso penalty to select important input variables and weights to differentiate the misclassification of data points between classes. We demonstrate the satisfactory performance of the proposed method through simulation studies and a real data analysis.
Keywords
Imbalanced data; lasso; linear programming; ridge; support vector machine
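The abstract and keywords indicate that the weighted $L_1$-norm SVM combines a weighted misclassification loss with a lasso penalty and can be handled by linear programming. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes the common formulation: minimize sum_i w_i * xi_i + lam * ||beta||_1 subject to y_i * (b + x_i' beta) >= 1 - xi_i and xi_i >= 0. The function name `fit_weighted_l1_svm`, the tuning value `lam`, the inverse-class-size weights, the synthetic data, and the use of SciPy's `linprog` are all illustrative assumptions that do not come from the source.

```python
import numpy as np
from scipy.optimize import linprog


def fit_weighted_l1_svm(X, y, weights, lam):
    """Hypothetical helper: weighted hinge loss plus L1 penalty, solved as an LP.

    minimize   sum_i weights[i] * xi_i + lam * ||beta||_1
    subject to y_i * (b + x_i @ beta) >= 1 - xi_i,  xi_i >= 0
    """
    n, p = X.shape
    # Variable layout: [b, beta_plus (p), beta_minus (p), xi (n)],
    # with beta = beta_plus - beta_minus so that |beta| becomes linear.
    c = np.concatenate([[0.0], lam * np.ones(2 * p), weights])
    yc = y.reshape(-1, 1)
    # Margin constraints rewritten in the form A_ub @ z <= b_ub.
    A_ub = np.hstack([-yc, -yc * X, yc * X, -np.eye(n)])
    b_ub = -np.ones(n)
    # The intercept is free; all other variables are nonnegative.
    bounds = [(None, None)] + [(0, None)] * (2 * p + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    b = res.x[0]
    beta = res.x[1:1 + p] - res.x[1 + p:1 + 2 * p]
    return b, beta


# Synthetic imbalanced data (illustrative only): 200 majority, 20 minority.
rng = np.random.default_rng(0)
n_major, n_minor, p = 200, 20, 5
X = rng.normal(size=(n_major + n_minor, p))
y = np.concatenate([-np.ones(n_major), np.ones(n_minor)])
X[:, 0] += 2.0 * y  # only the first variable separates the classes

# Assumed weighting scheme: inversely proportional to class size.
n = len(y)
weights = np.where(y > 0, n / (2.0 * n_minor), n / (2.0 * n_major))

b, beta = fit_weighted_l1_svm(X, y, weights, lam=5.0)
print("intercept:", round(float(b), 3))
print("coefficients:", np.round(beta, 3))
```

Splitting beta into nonnegative parts is the standard device for turning the lasso penalty into a linear objective. Larger values of `lam` shrink more coefficients toward zero, and the weights raise the cost of errors on the minority class.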