SUPPORT VECTOR MACHINE USING K-MEANS CLUSTERING

Lee, S.J.; Park, C.; Jhun, M.; Koo, J.Y.

Journal of the Korean Statistical Society

Volume 36, Issue 1, Pages 175-182, 2007, pISSN 1226-3192, eISSN 2005-2863

The Korean Statistical Society (한국통계학회)

Lee, S.J. (Department of Statistics, Seoul National University)
Park, C. (Institute of Statistics, Korea University)
Jhun, M. (Department of Statistics, Korea University)
Koo, J.Y. (Department of Statistics, Korea University)

Published: 2007.03.31

Abstract

The support vector machine has been successful in many applications because of its flexibility and high accuracy. However, when the training data set is large or imbalanced, the support vector machine may incur a heavy computational cost or lose accuracy in predicting minority classes. We propose a modified version of the support vector machine based on K-means clustering that exploits the information in class labels during the clustering process. For large data sets, our method saves computation time by reducing the number of data points, without significant loss of accuracy. Moreover, our method deals with imbalanced data sets effectively by alleviating the influence of the dominant class.
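
The abstract does not spell out how class labels enter the clustering step, so the following is only an illustrative sketch under one assumed reading: K-means is run separately within each class, every centroid inherits the label of its class, and the labelled centroids replace the raw points as training data for any SVM solver. The function names (`sq_dist`, `kmeans`, `reduce_by_class`), the parameter `k_per_class`, and the toy data are hypothetical and not taken from the paper; the SVM fit itself is left out.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k of the data points
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def reduce_by_class(X, y, k_per_class, seed=0):
    """Assumed scheme: cluster each class on its own, label centroids by class."""
    X_red, y_red = [], []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        k = min(k_per_class, len(members))  # never ask for more clusters than points
        for centroid in kmeans(members, k, seed=seed):
            X_red.append(centroid)
            y_red.append(label)
    return X_red, y_red


# Toy imbalanced data (hypothetical): 200 majority points, 10 minority points.
majority = [(0.1 * (i % 20), 0.1 * (i // 20)) for i in range(200)]
minority = [(5.0 + 0.1 * i, 5.0 + 0.1 * i) for i in range(10)]
X = majority + minority
y = [-1] * 200 + [1] * 10

X_red, y_red = reduce_by_class(X, y, k_per_class=5)
print(len(X_red), y_red)  # 10 centroids, 5 per class
```

In this sketch the 210 original points shrink to 10 labelled centroids, and using the same number of clusters per class turns a 20:1 class ratio into 1:1. Whether the paper balances classes this way, or weights centroids by cluster size, is not stated in the abstract.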
