Improving the Classification Accuracy Using Unlabeled Data: A Naive Bayesian Case

Performance Improvement Using Unlabeled Data in a Naive Bayesian Setting

  • Published : 2006.08.01

Abstract

In many applications, an enormous amount of unlabeled data is available at little cost. It is therefore natural to ask whether these unlabeled data can be exploited in classification learning. In this paper, we analyze the role of unlabeled data in the context of naive Bayesian learning. Experimental results show that including unlabeled data as part of the training data can significantly improve classification accuracy. The benefit of using unlabeled data is especially pronounced when labeled data are sparse.
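The abstract does not give the authors' exact formulation. One standard way to use unlabeled data with naive Bayes, assumed here for illustration, treats the class of each unlabeled example as a missing variable and maximizes the joint log-likelihood of the labeled set $\mathcal{L}$ and the unlabeled set $\mathcal{U}$ with the EM algorithm of Dempster, Laird, and Rubin (1977). Under that assumption, with $x_{ik}$ the $k$-th feature of example $i$ and $y_i$ the label of labeled example $i$:

```latex
% Joint log-likelihood: labeled examples contribute their known class,
% unlabeled examples are marginalized over all classes c.
\ell(\theta) = \sum_{i \in \mathcal{L}} \log\Big( P(y_i \mid \theta) \prod_{k} P(x_{ik} \mid y_i, \theta) \Big)
             + \sum_{j \in \mathcal{U}} \log \sum_{c} P(c \mid \theta) \prod_{k} P(x_{jk} \mid c, \theta)

% E-step: posterior class weight of each unlabeled example j
\gamma_{jc} = \frac{P(c \mid \theta) \prod_{k} P(x_{jk} \mid c, \theta)}
                   {\sum_{c'} P(c' \mid \theta) \prod_{k} P(x_{jk} \mid c', \theta)}

% M-step (class prior shown): labeled examples count with weight 1 for y_i,
% unlabeled examples count with weight \gamma_{jc} for every class c
P(c \mid \theta) = \frac{ |\{ i \in \mathcal{L} : y_i = c \}| + \sum_{j \in \mathcal{U}} \gamma_{jc} }
                        { |\mathcal{L}| + |\mathcal{U}| }
```

The feature probabilities $P(x_k \mid c, \theta)$ are re-estimated the same way, from weighted feature counts. When $\mathcal{U}$ is empty, the objective reduces to ordinary supervised naive Bayes estimation.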

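In many cases, producing labeled data depends on human time and effort and therefore demands considerable cost and time. Unlabeled data, by contrast, can be obtained easily and in virtually unlimited quantities at almost no cost. Consequently, semi-supervised learning methods, which use such unlabeled data to improve the performance of classification learning, have recently attracted attention in machine learning. In this paper, to analyze the effect of unlabeled data on classification performance, we present a learning method that uses unlabeled data in a naive Bayesian setting and use it to investigate the usefulness of unlabeled data experimentally. We found that, in the naive Bayesian setting, unlabeled data are especially effective when the number of labeled examples is small.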
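The abstract does not spell out the authors' learning procedure. As a hedged illustration only, the sketch below uses the EM approach applied to text classification by Nigam et al. (2000): train a multinomial naive Bayes model on the labeled documents, then alternate between soft-labeling the unlabeled documents (E-step) and re-estimating the parameters from labeled plus soft-labeled documents (M-step). The function names (`train_nb`, `posterior`, `semi_supervised_nb`, `classify`), the word-count document representation, Laplace smoothing with `alpha`, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict

# Hypothetical sketch (not the paper's code): multinomial naive Bayes trained
# with EM on labeled and unlabeled documents. A document is a dict word -> count.


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: estimate log priors and log word probabilities from class weights.

    weights[i] maps class -> weight of document i (1.0 for its label if labeled,
    posterior probabilities if unlabeled). alpha is the Laplace smoothing constant.
    """
    class_mass = {c: 0.0 for c in classes}
    word_mass = {c: defaultdict(float) for c in classes}
    total_words = {c: 0.0 for c in classes}
    for doc, w in zip(docs, weights):
        for c, p in w.items():
            class_mass[c] += p
            for word, n in doc.items():
                word_mass[c][word] += p * n
                total_words[c] += p * n
    n_docs = sum(class_mass.values())
    log_prior = {
        c: math.log((class_mass[c] + alpha) / (n_docs + alpha * len(classes)))
        for c in classes
    }
    log_cond = {
        c: {
            v: math.log((word_mass[c][v] + alpha) / (total_words[c] + alpha * len(vocab)))
            for v in vocab
        }
        for c in classes
    }
    return log_prior, log_cond


def posterior(doc, log_prior, log_cond):
    """E-step for one document: P(class | doc), skipping out-of-vocabulary words."""
    scores = {
        c: log_prior[c] + sum(n * log_cond[c][w] for w, n in doc.items() if w in log_cond[c])
        for c in log_prior
    }
    top = max(scores.values())  # subtract the max log score for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def semi_supervised_nb(labeled, unlabeled, n_iter=10, alpha=1.0):
    """labeled: list of (doc, label); unlabeled: list of docs.

    n_iter=0 returns the labeled-only baseline (same vocabulary, no EM steps).
    """
    classes = sorted({y for _, y in labeled})
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    docs_l = [d for d, _ in labeled]
    weights_l = [{y: 1.0} for _, y in labeled]  # labeled examples keep their labels
    model = train_nb(docs_l, weights_l, classes, vocab, alpha)  # init: labeled data only
    for _ in range(n_iter):
        weights_u = [posterior(d, *model) for d in unlabeled]  # E-step: soft labels
        model = train_nb(docs_l + unlabeled, weights_l + weights_u,
                         classes, vocab, alpha)  # M-step: refit on all documents
    return model


def classify(doc, model):
    """Return the class with the highest posterior probability."""
    post = posterior(doc, *model)
    return max(post, key=post.get)


if __name__ == "__main__":
    labeled = [({"ball": 2, "goal": 1}, "sport"), ({"vote": 2, "law": 1}, "politics")]
    unlabeled = [{"goal": 1, "team": 2}, {"ball": 1, "team": 1},
                 {"law": 1, "senate": 2}, {"vote": 1, "senate": 1}]
    model = semi_supervised_nb(labeled, unlabeled, n_iter=5)
    print(classify({"team": 1}, model))  # expected: sport
```

In this toy data the word `team` never occurs in a labeled document, so the labeled-only baseline (`n_iter=0`) assigns it equal probability under both classes, while EM picks up its association with `sport` from the unlabeled documents. This is only a small illustration of the setting the abstract describes (very few labeled examples), not a reproduction of the paper's experiments.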
