
A Clustering-based Semi-Supervised Learning through Initial Prediction of Unlabeled Data  

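Kim, Eung-Ku (Korea Productivity Center, Consulting Division, CS Management Center)
Jun, Chi-Hyuck (Pohang University of Science and Technology, Department of Industrial and Management Engineering)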
Abstract
Semi-supervised learning uses a small amount of labeled data both to predict the labels of unlabeled data and to improve clustering performance, whereas unsupervised learning analyzes only unlabeled data for clustering purposes. We propose a new clustering-based semi-supervised learning method that incorporates the initially predicted labels of the unlabeled data into the objective function. The initial prediction is expressed as a discrete probability distribution over labels, obtained from a classification method trained on the labeled data. Clusters are then formed, and the labels of the unlabeled data are predicted according to the information of the labeled data in the same cluster. We evaluate the proposed method and compare its performance in terms of classification error through numerical experiments on labeled data whose labels are blinded.
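The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the objective function, so the penalty term, the weight `lam`, the nearest-class-mean soft classifier, and both function names below are assumptions chosen only to make the idea concrete. The sketch (1) produces a discrete probability distribution over labels for each unlabeled point, (2) clusters all points with a k-means-style objective that also penalizes disagreement between a point's label distribution and its cluster's average distribution, and (3) labels each unlabeled point by the majority label of the labeled points in its cluster.

```python
import numpy as np

def initial_probs(X_lab, y_lab, X_unl, n_classes):
    """Soft initial prediction for unlabeled points.

    Assumption: a softmax over negative squared distances to the class
    means stands in for the unspecified 'classification method'.
    """
    means = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X_unl[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def cluster_with_initial_prediction(X_lab, y_lab, X_unl, n_classes, lam=1.0, n_iter=50):
    """Hypothetical objective: sum over points of
    ||x - center_j||^2 + lam * ||p - profile_j||^2,
    where p is the point's label distribution (one-hot if labeled)."""
    k = n_classes
    P = np.vstack([np.eye(k)[y_lab], initial_probs(X_lab, y_lab, X_unl, k)])
    X = np.vstack([X_lab, X_unl])

    # Initialize clusters from the labeled class means and one-hot profiles.
    centers = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(k)])
    profiles = np.eye(k)

    for _ in range(n_iter):
        # Assignment step: geometric cost plus label-distribution mismatch.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        mismatch = ((P[:, None, :] - profiles[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(d2 + lam * mismatch, axis=1)

        # Update step: means of the members in both spaces.
        new_centers = centers.copy()
        for j in range(k):
            mask = assign == j
            if mask.any():
                new_centers[j] = X[mask].mean(axis=0)
                profiles[j] = P[mask].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Label each cluster by the majority label of its labeled members;
    # fall back to the cluster's average distribution if it has none.
    lab_assign = assign[: len(X_lab)]
    cluster_label = np.empty(k, dtype=int)
    for j in range(k):
        members = y_lab[lab_assign == j]
        if members.size:
            cluster_label[j] = np.bincount(members, minlength=k).argmax()
        else:
            cluster_label[j] = profiles[j].argmax()
    return cluster_label[assign[len(X_lab):]]
```

With `lam = 0` the sketch reduces to ordinary k-means seeded by the labeled class means; larger values of `lam` make the initial predictions weigh more heavily in the cluster assignment. Class labels are assumed to be integers `0 .. n_classes - 1`, with at least one labeled point per class.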
Keywords
Clustering; Labeled Data; Semi-supervised Learning; Unlabeled Data