A Clustering-based Semi-Supervised Learning through Initial Prediction of Unlabeled Data

Kim, Eung-Ku; Jun, Chi-Hyuck

Journal of the Korean Operations Research and Management Science Society

Vol. 33, No. 3 / pp. 93-105 / 2008 / 1225-1119 (pISSN) / 2733-4759 (eISSN)

The Korean Operations Research and Management Science Society


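Eung-Ku Kim (CS Management Center, Consulting Division, Korea Productivity Center);
Chi-Hyuck Jun (Department of Industrial and Management Engineering, Pohang University of Science and Technology)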

Published: 2008.09.30


Abstract

Semi-supervised learning uses a small amount of labeled data both to predict the labels of unlabeled data and to improve clustering performance, whereas unsupervised learning analyzes only unlabeled data for clustering purposes. We propose a new clustering-based semi-supervised learning method that reflects the initial predicted labels of the unlabeled data in the objective function. The initial prediction should be expressed as a discrete probability distribution over the classes, obtained from a classification method trained on the labeled data. Clusters are then formed, and the labels of the unlabeled data are predicted from the information of the labeled data in the same cluster. We evaluate and compare the performance of the proposed method in terms of classification error through numerical experiments with blinded labeled data.
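The abstract does not state the objective function or the classifier, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (1) a soft nearest-centroid classifier for the initial prediction, (2) a k-means-style objective that adds a weighted squared distance between each point's label distribution and its cluster's mean label distribution, (3) one cluster per class, seeded from the labeled class means, and (4) at least one labeled point per class. The function names and the weight `lam` are hypothetical.

```python
import numpy as np

def initial_prediction(X_lab, y_lab, X_unl, n_classes):
    """Assumed initial predictor: class probabilities for unlabeled points
    from a softmax over negative squared distances to labeled class means."""
    centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X_unl[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 + d2.min(axis=1, keepdims=True)  # shift so the largest logit is 0
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def cluster_and_label(X_lab, y_lab, X_unl, n_classes, lam=1.0, n_iter=50):
    """Cluster labeled and unlabeled points together, then label each
    unlabeled point from the labeled points that share its cluster."""
    n_lab = len(X_lab)
    # Labeled points get one-hot distributions; unlabeled get the soft prediction.
    P = np.vstack([np.eye(n_classes)[y_lab],
                   initial_prediction(X_lab, y_lab, X_unl, n_classes)])
    X = np.vstack([X_lab, X_unl])
    # Assumed objective: sum_i ||x_i - mu_k||^2 + lam * ||p_i - q_k||^2,
    # which is ordinary k-means on the concatenated features below.
    Z = np.hstack([X, np.sqrt(lam) * P])
    # Seed one cluster per class from the labeled class means.
    centers = np.stack([Z[:n_lab][y_lab == c].mean(axis=0) for c in range(n_classes)])
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(n_classes):
            if np.any(assign == k):
                new_centers[k] = Z[assign == k].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    cluster_label = np.zeros(n_classes, dtype=int)
    for k in range(n_classes):
        in_k = assign == k
        lab_in_k = in_k[:n_lab]
        if lab_in_k.any():
            # Majority vote among labeled points in the cluster.
            cluster_label[k] = np.bincount(y_lab[lab_in_k], minlength=n_classes).argmax()
        elif in_k.any():
            # No labeled member: fall back to the mean predicted distribution.
            cluster_label[k] = P[in_k].mean(axis=0).argmax()
    return cluster_label[assign[n_lab:]]
```

To mimic an evaluation with blinded labels, one could hide the labels of part of a fully labeled data set, pass the hidden part as `X_unl`, and report `np.mean(y_pred != y_true)` as the classification error.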


