SUPPORT VECTOR MACHINE USING K-MEANS CLUSTERING

  • Lee, S.J. (Department of Statistics, Seoul National University) ;
  • Park, C. (Institute of Statistics, Korea University) ;
  • Jhun, M. (Department of Statistics, Korea University) ;
  • Koo, J.Y. (Department of Statistics, Korea University)
  • Published : 2007.03.31

Abstract

The support vector machine has been successful in many applications because of its flexibility and high accuracy. However, when a training data set is large or imbalanced, the support vector machine may suffer from significant computational problems or a loss of accuracy in predicting minority classes. We propose a modified version of the support vector machine that uses K-means clustering and exploits the information in class labels during the clustering process. For large data sets, our method can save computation time by reducing the number of data points without significant loss of accuracy. Moreover, our method can deal with imbalanced data sets effectively by alleviating the influence of the dominant class.
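The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way class labels might be used during clustering: K-means is run separately within each class, and the resulting centroids (each carrying a single label) replace the original points as training data. The function names `kmeans` and `reduce_by_class`, the parameter `k_per_class`, and the toy data are all hypothetical and are not taken from the paper.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if empty.
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def reduce_by_class(X, y, k_per_class):
    """Assumed scheme: cluster each class on its own so every centroid has one label."""
    X_red, y_red = [], []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        k = min(k_per_class, len(members))
        for c in kmeans(members, k):
            X_red.append(c)
            y_red.append(label)
    return X_red, y_red

# Hypothetical imbalanced toy data: 200 majority points and 20 minority points.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + \
    [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(20)]
y = [0] * 200 + [1] * 20

X_red, y_red = reduce_by_class(X, y, k_per_class=10)
print(len(X_red), y_red.count(0), y_red.count(1))  # 20 10 10
```

In this sketch the 220 original points shrink to 20 centroids, and the 10:1 class ratio becomes 1:1, which illustrates both effects claimed in the abstract: fewer training points and a weaker influence of the dominant class. The reduced set `(X_red, y_red)` could then be passed to any support vector machine trainer; how the paper itself chooses the number of clusters or treats the centroids is not stated here.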
