[KSCI] Korea Science Citation Index Service

SUPPORT VECTOR MACHINE USING K-MEANS CLUSTERING

Lee, S.J. (Department of Statistics, Seoul National University)
Park, C. (Institute of Statistics, Korea University)
Jhun, M. (Department of Statistics, Korea University)
Koo, J.Y. (Department of Statistics, Korea University)

Publication Information

Journal of the Korean Statistical Society / v.36, no.1, 2007, pp. 175-182

Abstract

The support vector machine has been successful in many applications because of its flexibility and high accuracy. However, when a training data set is large or imbalanced, the support vector machine may suffer from significant computational problems or a loss of accuracy in predicting minority classes. We propose a modified version of the support vector machine using K-means clustering that exploits the information in class labels during the clustering process. For large data sets, our method can save computation time by reducing the number of data points without significant loss of accuracy. Moreover, our method can deal with imbalanced data sets effectively by alleviating the influence of the dominant class.
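
The abstract does not spell out how class labels enter the clustering step, so the following is only a minimal sketch under one simple assumption: each class is clustered separately with plain K-means, and the resulting centroids (each carrying its class label) replace the original points as the training set. The function names `kmeans` and `reduce_by_class`, the parameter `k_per_class`, and the toy data are hypothetical and are not taken from the paper; the authors' actual procedure may differ.

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids

def reduce_by_class(X, y, k_per_class):
    """Assumed label-aware reduction: cluster each class on its own and
    return the centroids together with their class labels."""
    X_red, y_red = [], []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        k = min(k_per_class, len(members))  # never ask for more clusters than points
        for centroid in kmeans(members, k):
            X_red.append(centroid)
            y_red.append(label)
    return X_red, y_red

# Hypothetical imbalanced toy data: 200 majority points, 10 minority points.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(10)]
y = [0] * 200 + [1] * 10

X_red, y_red = reduce_by_class(X, y, k_per_class=5)
```

Under this assumption, the reduced set `X_red`, `y_red` would be passed to any standard support vector machine solver in place of the full data. Using the same number of clusters per class shrinks the training set and also evens out the class sizes, which is one way to read the abstract's two claims about computation time and class imbalance.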

Keywords

Class imbalance; K-means clustering; support vector machine

Citations & Related Records

Times cited in Web of Science: 1

References

1	SHIN, H. J. AND CHO, S. (2003). 'Fast pattern selection for support vector classifiers', Proceedings of 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Seoul, Korea, 376-387
2	CRISTIANINI, N. AND SHAWE-TAYLOR, J. (2000). An Introduction to Support Vector Machines, Cambridge University Press
3	VAPNIK, V. N. (1998). Statistical Learning Theory, Wiley-Interscience, New York
4	MACQUEEN, J. B. (1967). 'Some methods for classification and analysis of multivariate observations', Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 281-297
5	JAPKOWICZ, N. (2000). 'Learning from imbalanced data sets: a comparison of various strategies', AAAI Workshop on Learning from Imbalanced Data Sets, Menlo Park, CA, AAAI Press
6	AKBANI, R., KWEK, S. AND JAPKOWICZ, N. (2004). 'Applying support vector machines to imbalanced datasets', Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, 39-50
7	WANG, J., WU, X. AND ZHANG, C. (2005). 'Support vector machines based on K-means clustering for real-time business intelligence systems', International Journal of Business Intelligence and Data Mining, 1, 54-64
8	CORTES, C. AND VAPNIK, V. (1995). 'Support-vector networks', Machine Learning, 20, 273-297