
A Co-training Method based on Classification Using Unlabeled Data  

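윤혜성 (Department of Computer Science, Ewha Womans University)
이상호 (Department of Computer Science, Ewha Womans University)
박승수 (Department of Computer Science, Ewha Womans University)
용환승 (Department of Computer Science, Ewha Womans University)
김주한 (Biomedical Informatics, Seoul National University College of Medicine)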
Abstract
In many practical learning problems, including those in bioinformatics, only a small amount of labeled data is available alongside a large pool of unlabeled data. Labeled examples are expensive to obtain because they require human effort, whereas unlabeled examples can be gathered cheaply without an expert. Co-training is a common method for exploiting unlabeled data in classification and analysis. It uses a small set of labeled examples to learn a classifier in each of two views. Each classifier is then applied to all unlabeled examples, and co-training selects the examples on which each classifier makes its most confident predictions. After several iterations, new classifiers are learned from the enlarged training data, and the number of labeled examples grows. In this paper, we propose a new co-training strategy that uses unlabeled data, and we evaluate it with two classifiers on two experimental data sets: WebKB and BIND XML data. Our experiments show that the proposed co-training technique effectively improves classification accuracy when the number of labeled examples is very small.
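The generic co-training loop summarized in the abstract can be sketched roughly as follows. This is only an illustration of standard two-view co-training, not the new strategy proposed in the paper, whose details the abstract does not give. The names `CentroidClassifier`, `train_views`, and `co_train`, the margin-based confidence score, and the toy data are all assumptions made for the example.

```python
import math

class CentroidClassifier:
    """Hypothetical nearest-centroid classifier over a single view."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_with_confidence(self, x):
        # Confidence (an assumption of this sketch) is the distance margin
        # between the nearest and the second-nearest class centroid.
        dists = sorted((math.dist(x, c), label) for label, c in self.centroids.items())
        best_dist, best_label = dists[0]
        margin = dists[1][0] - best_dist if len(dists) > 1 else 0.0
        return best_label, margin

def train_views(labeled):
    """Train one classifier per view from ((view1, view2), label) pairs."""
    y = [label for _, label in labeled]
    return [CentroidClassifier().fit([views[v] for views, _ in labeled], y)
            for v in (0, 1)]

def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Grow the labeled set with each view's most confident predictions."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}  # pool index -> predicted label
        for v, clf in enumerate(train_views(labeled)):
            scored = [(clf.predict_with_confidence(views[v]), i)
                      for i, views in enumerate(pool)]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            for (label, _margin), i in scored[:per_round]:
                picked.setdefault(i, label)
        labeled.extend((pool[i], label) for i, label in picked.items())
        pool = [views for i, views in enumerate(pool) if i not in picked]
    return train_views(labeled), labeled

# Toy data: each example is a pair of one-dimensional views.
seed = [(((0.0,), (0.0,)), "a"), (((10.0,), (10.0,)), "b")]
pool = [((1.0,), (0.5,)), ((9.0,), (9.5,)), ((2.0,), (1.0,)), ((8.0,), (9.0,))]
classifiers, grown = co_train(seed, pool)
```

In this sketch each round retrains both view classifiers on the current labeled set and moves at most `per_round` examples per view from the pool into it. A real system would substitute stronger classifiers and typically balance the number of examples added per class.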
Keywords
co-training; classification algorithm; semi-supervised learning; semi-structured data