http://dx.doi.org/10.3745/KIPSTB.2009.16B.6.479

Construction Scheme of Training Data using Automated Exploring of Boundary Categories  

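Choi, Yun-Jeong (Department of Information and Communication, Seoil College)
Jee, Jeong-Gyu (Head of the Research Infrastructure Division, National Research Foundation of Korea)
Park, Seung-Soo (Department of Computer Engineering, Ewha Womans University)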
Abstract
This paper presents a reinforced scheme for constructing training data that improves text classification by automatically searching for boundary categories. Documents lying in a boundary area are often misclassified because they contain multiple topics and features, and this is the main factor we focus on. Building on our previous research, we propose an automated method for exploring the optimal boundary category. We treat the boundary area among target categories as a new category that requires training, which is then semantically added to the target category. In the experiments, we applied our method to complex documents while intentionally introducing errors into the training process. The experimental results show that our system achieves high accuracy and reliability in a noisy environment.
Keywords
Machine Learning; Learning/Training Algorithms; Active Learning; Hierarchical Classification; Clustering;