http://dx.doi.org/10.3745/KIPSTB.2006.13B.4.457

Improving the Classification Accuracy Using Unlabeled Data: A Naive Bayesian Case  

Lee Chang-Hwan (Department of Information and Communications, Dongguk University)
Abstract
In many applications, an enormous amount of unlabeled data is available at little cost. It is therefore natural to ask whether these unlabeled data can be exploited in classification learning. In this paper, we analyze the role of unlabeled data in the context of naive Bayesian learning. Experimental results show that including unlabeled data as part of the training data can significantly improve classification accuracy. The benefit of using unlabeled data is especially significant when labeled data are sparse.
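The abstract does not say exactly how the unlabeled data enter the naive Bayesian model. One well-known way, described by Nigam, McCallum, Thrun, and Mitchell for text classification, is Expectation-Maximization (EM): train naive Bayes on the labeled examples, use it to assign probabilistic class labels to the unlabeled examples (E-step), re-estimate the parameters from all examples weighted by those probabilities (M-step), and repeat. The sketch below assumes that EM approach, categorical features encoded as integers `0..n_values[j]-1`, and Laplace smoothing. The function names `fit_nb`, `posterior`, `semi_supervised_nb`, and `predict` are hypothetical, and this is not necessarily the procedure used in the paper.

```python
import math

def fit_nb(X, weights, classes, n_values, alpha=1.0):
    """M-step: estimate class priors and per-feature conditionals from
    (possibly fractional) class weights, where weights[i][c] ~ P(c | x_i).
    alpha is an assumed Laplace smoothing constant."""
    total = sum(sum(w.values()) for w in weights)
    prior, cond = {}, {}
    for c in classes:
        wc = sum(w[c] for w in weights)
        prior[c] = (wc + alpha) / (total + alpha * len(classes))
        for j, nv in enumerate(n_values):
            counts = [0.0] * nv
            for x, w in zip(X, weights):
                counts[x[j]] += w[c]  # weighted count of value x[j] in class c
            denom = wc + alpha * nv
            cond[c, j] = [(k + alpha) / denom for k in counts]
    return prior, cond

def posterior(x, prior, cond, classes):
    """E-step for one example: P(c | x) under the naive independence assumption,
    computed in log space and normalized for numerical stability."""
    logp = {c: math.log(prior[c]) + sum(math.log(cond[c, j][v]) for j, v in enumerate(x))
            for c in classes}
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(logp[c] - m) / z for c in classes}

def semi_supervised_nb(X_lab, y_lab, X_unl, classes, n_values, n_iter=10):
    """Hypothetical EM loop: labeled examples keep hard (0/1) weights,
    unlabeled examples receive soft weights from the current model."""
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for y in y_lab]
    prior, cond = fit_nb(X_lab, hard, classes, n_values)  # labeled-only start
    for _ in range(n_iter):
        soft = [posterior(x, prior, cond, classes) for x in X_unl]            # E-step
        prior, cond = fit_nb(X_lab + X_unl, hard + soft, classes, n_values)  # M-step
    return prior, cond

def predict(x, prior, cond, classes):
    p = posterior(x, prior, cond, classes)
    return max(p, key=p.get)

# Toy usage: two binary features, only two labeled examples, six unlabeled ones.
classes = ["a", "b"]
n_values = [2, 2]
X_lab, y_lab = [(0, 0), (1, 1)], ["a", "b"]
X_unl = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0), (1, 1)]
prior, cond = semi_supervised_nb(X_lab, y_lab, X_unl, classes, n_values)
print(predict((0, 0), prior, cond, classes), predict((1, 1), prior, cond, classes))
```

In this sketch, unlabeled examples count as much as labeled ones in the M-step. Nigam et al. also discuss down-weighting the unlabeled examples, which can matter when the naive Bayes assumptions fit the data poorly; that refinement is omitted here.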
Keywords
Machine Learning; Semi-supervised Learning; Artificial Intelligence