http://dx.doi.org/10.3745/KIPSTB.2005.12B.7.811

Reinforcement Method for Automated Text Classification using Post-processing and Training with Definition Criteria  

Choi, Yun-Jeong (Department of Computer Science, Ewha Womans University)
Park, Seung-Soo (Department of Computer Science, Ewha Womans University)
Abstract
Automated text categorization classifies free-text documents into predefined categories automatically; its main goal is to reduce the considerable manual effort the task requires. Recent research on improving text categorization performance (efficiency) has focused on enhancing existing classification models and the algorithms themselves, but its scope has been limited by feature-based statistical methodology. In this paper, we propose the RTPost system, which differs in style from any traditional method by taking a fault-tolerant system approach and a data mining strategy. The two important parts of the RTPost system are reinforcement training and post-processing. First, the training method addresses the problem of defining the categories to be classified before training sample documents are selected. Second, the post-processing method addresses the problem of assigning categories, rather than the performance of the classification algorithms. In our experiments, we applied the system to documents that received low classification accuracy because they lay near a decision boundary. Through these experiments, we show that our system achieves high accuracy and stability under actual conditions. It did not depend at all on variables that strongly influence classification power, such as the number of training documents, the selection problem, and the performance of the classification algorithms. In addition, we can expect a self-learning effect that decreases the training cost and increases the training power by exploiting the advantages of active learning.
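The abstract does not say how documents near a decision boundary are identified. As a rough illustration only, one common heuristic is to flag a document when the gap between its two highest category scores is small. The sketch below is not the RTPost procedure; the function names, the score dictionaries, and the threshold of 0.1 are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_two_margin(scores: Dict[str, float]) -> float:
    """Return the gap between the best and second-best category scores.

    A document with fewer than two category scores cannot be ambiguous
    between categories, so its margin is treated as infinite.
    """
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return float("inf")
    return ranked[0] - ranked[1]


def split_by_margin(
    doc_scores: Dict[str, Dict[str, float]],
    threshold: float = 0.1,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split document ids into (confident, borderline) lists.

    A document is 'borderline' when its top-two margin is below the
    threshold, i.e. it lies close to a boundary between two categories.
    """
    confident: List[str] = []
    borderline: List[str] = []
    for doc_id, scores in doc_scores.items():
        if top_two_margin(scores) < threshold:
            borderline.append(doc_id)
        else:
            confident.append(doc_id)
    return confident, borderline


# Hypothetical classifier scores for two documents.
example = {
    "doc_a": {"sports": 0.90, "politics": 0.10},
    "doc_b": {"sports": 0.52, "politics": 0.48},
}
confident, borderline = split_by_margin(example, threshold=0.1)
```

Under these assumptions, only the borderline documents would be handed to a separate post-processing step, while the confident ones keep the category the classifier assigned.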
Keywords
Automated Text Categorization (Classification); Active Learning; Self Learning; Hierarchical Classification; Text Mining; Data Mining; Fault Detection