http://dx.doi.org/10.3745/KIPSTB.2008.15-B.5.457

Improving Naïve Bayes Text Classifiers with Incremental Feature Weighting  

Kim, Han-Joon (School of Electrical and Computer Engineering, University of Seoul)
Chang, Jae-Young (Department of Computer Engineering, Hansung University)
Abstract
In real-world operational environments, most text classification systems suffer from insufficient training documents and from a lack of prior knowledge of the feature space. In this regard, Naïve Bayes is known to be an appropriate algorithm for operational text classification, since its classification model can be evolved easily by incrementally updating the pre-learned model and the feature space. This paper proposes a technique for improving the Naïve Bayes classifier through a feature weighting strategy. The basic idea is that parameter estimation in Naïve Bayes considers the degree of feature importance as well as the feature distribution. A more accurate classification model can be developed by incorporating feature weights into the Naïve Bayes learning algorithm, rather than by learning with a reduced feature set. In addition, we extend a conventional feature update algorithm to support incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we perform experiments on various document collections and show that the traditional Naïve Bayes classifier can be significantly improved by the proposed technique.
Keywords
Text classification; Naïve Bayes classifier; feature weighting; feature selection; $\chi^2$-statistics
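The abstract does not state the weighted estimator itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a multinomial Naïve Bayes model in which each term count is multiplied by a weight derived from the term's maximum $\chi^2$ score over classes, with Laplace smoothing. The class name `WeightedNaiveBayes`, the weight formula `1 + chi2 / N`, and the toy documents are all hypothetical. Because only counts are stored, a new document (and any new vocabulary it brings) can be absorbed without retraining, and the weights are recomputed from the updated counts.

```python
import math
from collections import defaultdict


class WeightedNaiveBayes:
    """Illustrative multinomial Naive Bayes with chi-square feature weights."""

    def __init__(self):
        self.doc_count = defaultdict(int)  # class -> number of documents
        self.term_count = defaultdict(lambda: defaultdict(int))  # class -> term -> occurrences
        self.doc_freq = defaultdict(lambda: defaultdict(int))  # class -> term -> docs containing term
        self.vocab = set()

    def update(self, tokens, label):
        """Absorb one labeled document; only sufficient statistics change."""
        self.doc_count[label] += 1
        for t in tokens:
            self.term_count[label][t] += 1
        for t in set(tokens):
            self.doc_freq[label][t] += 1
        self.vocab.update(tokens)

    def chi2(self, term, label):
        """Chi-square score of `term` for `label` from a 2x2 contingency table."""
        n = sum(self.doc_count.values())
        a = self.doc_freq[label].get(term, 0)  # in class, contains term
        b = sum(self.doc_freq[other].get(term, 0)
                for other in self.doc_count if other != label)  # other classes, contains term
        c = self.doc_count[label] - a  # in class, lacks term
        d = n - a - b - c  # other classes, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        return n * (a * d - c * b) ** 2 / denom if denom else 0.0

    def weight(self, term):
        """Assumed weighting: 1 + (max chi-square over classes) / N, in [1, 2]."""
        n = sum(self.doc_count.values())
        return 1.0 + max(self.chi2(term, label) for label in self.doc_count) / n

    def predict(self, tokens):
        n = sum(self.doc_count.values())
        weights = {t: self.weight(t) for t in self.vocab}  # recomputed from current counts
        best_label, best_score = None, -math.inf
        for label, docs in self.doc_count.items():
            counts = self.term_count[label]
            total = sum(weights[t] * k for t, k in counts.items())
            score = math.log(docs / n)  # log prior
            for t in tokens:
                if t in self.vocab:  # terms never seen in training are ignored
                    p = (1.0 + weights[t] * counts.get(t, 0)) / (len(self.vocab) + total)
                    score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = WeightedNaiveBayes()
clf.update("goal match team win".split(), "sports")
clf.update("team goal score".split(), "sports")
clf.update("code bug compile".split(), "tech")
clf.update("code compile run bug".split(), "tech")
first = clf.predict("team goal".split())

# Incremental step: a later document introduces the new feature "gpu".
clf.update("gpu code".split(), "tech")
second = clf.predict(["gpu"])
```

Recomputing every weight at prediction time keeps the sketch short; a practical system would more likely cache the weights and refresh only those affected by an update.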