http://dx.doi.org/10.3745/KTSDE.2013.2.10.705

Active Learning based on Hierarchical Clustering  

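Woo, Hoyoung (Department of Computer Engineering, Chungnam National University)
Park, Cheong Hee (Department of Computer Engineering, Chungnam National University)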
Publication Information
KIPS Transactions on Software and Data Engineering, v.2, no.10, 2013, pp. 705-712
Abstract
Active learning aims to improve the performance of a classification model by repeatedly selecting the most helpful unlabeled samples, having an expert label them, and adding them to the training set. In this paper, we propose an active learning method based on hierarchical agglomerative clustering with Ward's linkage. The proposed method actively constructs a training set so that it includes at least one sample from each cluster and, by expanding the existing training set, reflects the overall data distribution. While most existing active learning methods assume that an initial training set is given, the proposed method is applicable whether or not initial training data are available. Experimental results demonstrate the superiority of the proposed method.
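The idea summarized above can be illustrated with a minimal sketch. The abstract does not specify how a sample is chosen within a cluster or how an existing training set is expanded, so the choices below are assumptions made only for illustration: clusters are merged by the standard Ward criterion (the increase in within-cluster sum of squares), the sample nearest to each cluster centroid is queried, and clusters that already contain a labeled sample are skipped. The function names `ward_clusters` and `pick_queries` and the toy data are hypothetical and do not come from the paper.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(data, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca = centroid([data[i] for i in a])
    cb = centroid([data[i] for i in b])
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist(ca, cb)


def ward_clusters(data, k):
    """Naive agglomerative clustering: merge the cheapest pair until k remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(data, clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations guarantees i < j
    return clusters


def pick_queries(data, clusters, labeled=()):
    """Assumed rule: query the point nearest the centroid of every cluster
    that does not yet contain a labeled sample."""
    labeled = set(labeled)
    queries = []
    for members in clusters:
        if labeled.intersection(members):
            continue  # this cluster is already represented in the training set
        c = centroid([data[i] for i in members])
        queries.append(min(members, key=lambda i: sq_dist(data[i], c)))
    return queries


# Toy data (hypothetical): three well-separated groups in the plane.
data = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
clusters = ward_clusters(data, k=3)
queries = pick_queries(data, clusters)  # indices to send to the expert
print(clusters, queries)
```

Calling `pick_queries(data, clusters, labeled=[4])` covers the case where an initial training set is given: the cluster containing sample 4 is skipped and only the remaining clusters produce queries. The naive pairwise search recomputes centroids at every merge and is meant for readability on small data, not for efficiency.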
Keywords
Active Learning; Clustering; Ward's Method