http://dx.doi.org/10.3745/KIPSTB.2012.19B.4.237

A Document Classification System Using Modified ECCD and Category Weight for each Document  

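Han, Chung-Seok (Department of Computer Science, Soongsil University)
Park, Sang-Yong (Department of Computer Science, Soongsil University)
Lee, Soo-Won (School of Computer Science, Soongsil University)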
Abstract
Web information services need a document classification system for efficient management and convenient search. Existing document classification systems suffer from low classification accuracy when only a small number of feature words is selected from the documents or when the number of documents belonging to a specific category is excessively large. To solve this problem, we propose a document classification system that uses the 'Modified ECCD' feature selection method and a 'Category Weight for each Document' feature value. Experimental results show that the 'Modified ECCD' feature selection method achieves higher classification accuracy than the $\chi^2$ and ECCD methods. Moreover, combining the 'Category Weight for each Document' feature value with the 'Modified ECCD' feature selection method yields better classification accuracy.
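For readers unfamiliar with the $\chi^2$ baseline named above, the following is a minimal sketch of standard $\chi^2$ feature scoring on document frequencies. It does not reproduce the paper's 'Modified ECCD' method or its 'Category Weight for each Document', neither of which is specified in the abstract. The function names `chi_square` and `select_features` and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square score for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate contingency table: no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, category, k):
    """Return the k terms with the highest chi-square score for `category`.

    docs is a list of token lists; labels holds one category per document.
    Ties are broken alphabetically so the result is deterministic.
    """
    n_in = sum(1 for y in labels if y == category)
    n_out = len(labels) - n_in
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_in if y == category else df_out
        target.update(set(tokens))  # document frequency, not term frequency
    vocab = set(df_in) | set(df_out)
    scores = {
        t: chi_square(df_in[t], df_out[t], n_in - df_in[t], n_out - df_out[t])
        for t in vocab
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, for illustration only.
toy_docs = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "team"]]
toy_labels = ["sports", "sports", "politics", "politics"]
print(select_features(toy_docs, toy_labels, "sports", 2))  # ['goal', 'vote']
```

In this toy example "vote" scores as high for the "sports" category as "goal" does, because $\chi^2$ measures the strength of association without distinguishing positive from negative association.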
Keywords
Document Classification; Feature Selection; ECCD