
Generation and Selection of Nominal Virtual Examples for Improving the Classifier Performance  

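Lee, Yu-Jung (Department of Computer Engineering, Pusan National University)
Kang, Byoung-Ho (Department of Computer Engineering, Pusan National University)
Kang, Jae-Ho (Search R&D Center, Yahoo! Korea)
Ryu, Kwang-Ryel (Department of Computer Engineering, Pusan National University)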
Abstract
This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples has focused on data with numeric attributes and has relied on domain-specific knowledge to generate useful virtual examples for a particular target learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naive Bayesian network constructed from the given training set. A sampled example is considered useful if adding it to the training set increases the network's conditional likelihood. A set of useful virtual examples is collected by repeating this process of sampling followed by evaluation. Experiments show that virtual examples collected this way can help various learning algorithms derive classifiers of improved accuracy.
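The sample-then-evaluate loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `NaiveBayes`, the function names, the Laplace smoothing, the greedy accept rule, and the choice to measure conditional log-likelihood on the original training set are all assumptions made for the sketch. Attribute values and class labels are assumed to be encoded as small integers.

```python
import math
import random


class NaiveBayes:
    """Naive Bayes over nominal attributes with Laplace smoothing (assumed)."""

    def __init__(self, n_classes, n_values):
        self.n_classes = n_classes
        self.n_values = n_values  # number of distinct values per attribute
        self.class_counts = [0] * n_classes
        # attr_counts[c][j][v]: times attribute j took value v within class c
        self.attr_counts = [[[0] * k for k in n_values] for _ in range(n_classes)]

    def add(self, x, y, weight=1):
        """Add (or, with weight=-1, remove) one example from the counts."""
        self.class_counts[y] += weight
        for j, v in enumerate(x):
            self.attr_counts[y][j][v] += weight

    def p_class(self, c):
        return (self.class_counts[c] + 1) / (sum(self.class_counts) + self.n_classes)

    def p_value(self, c, j, v):
        return (self.attr_counts[c][j][v] + 1) / (self.class_counts[c] + self.n_values[j])

    def log_posterior(self, x, y):
        """log P(y | x) under the naive Bayes factorization."""
        joint = [
            math.log(self.p_class(c))
            + sum(math.log(self.p_value(c, j, v)) for j, v in enumerate(x))
            for c in range(self.n_classes)
        ]
        m = max(joint)  # log-sum-exp for numerical stability
        log_evidence = m + math.log(sum(math.exp(s - m) for s in joint))
        return joint[y] - log_evidence

    def sample(self, rng):
        """Draw a class, then each attribute value given that class."""
        classes = range(self.n_classes)
        y = rng.choices(classes, weights=[self.p_class(c) for c in classes])[0]
        x = tuple(
            rng.choices(range(k), weights=[self.p_value(y, j, v) for v in range(k)])[0]
            for j, k in enumerate(self.n_values)
        )
        return x, y


def conditional_log_likelihood(model, data):
    """Sum of log P(y | x) over the labelled examples in data."""
    return sum(model.log_posterior(x, y) for x, y in data)


def collect_virtual_examples(train, n_classes, n_values, n_trials=200, seed=0):
    """Keep sampled examples that raise the conditional log-likelihood."""
    rng = random.Random(seed)
    generator = NaiveBayes(n_classes, n_values)  # fixed: built from training set only
    current = NaiveBayes(n_classes, n_values)    # training set plus accepted examples
    for x, y in train:
        generator.add(x, y)
        current.add(x, y)
    best = conditional_log_likelihood(current, train)
    kept = []
    for _ in range(n_trials):
        x, y = generator.sample(rng)
        current.add(x, y)  # tentatively add the candidate
        score = conditional_log_likelihood(current, train)
        if score > best:
            best = score
            kept.append((x, y))
        else:
            current.add(x, y, weight=-1)  # reject: undo the tentative update
    return kept, best


# Toy data: two binary attributes, two classes (made up for illustration).
train = [((0, 1), 0), ((0, 0), 0), ((1, 1), 1), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
virtual, final_cll = collect_virtual_examples(train, 2, [2, 2], n_trials=50, seed=1)
print(len(virtual), "virtual examples kept; conditional log-likelihood:", final_cll)
```

The accepted examples in `virtual` would then be appended to the training set before running whichever learning algorithm is of interest. Because a candidate is kept only when the score strictly increases, the final conditional log-likelihood can never fall below that of the network built from the original training set alone.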
Keywords
machine learning; classification; Bayesian network; naive Bayes; conditional likelihood; virtual example