[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTB.2008.15-B.5.457

Improving Naïve Bayes Text Classifiers with Incremental Feature Weighting

Kim, Han-Joon (School of Electrical and Computer Engineering, University of Seoul)
Chang, Jae-Young (Department of Computer Engineering, Hansung University)

Publication Information

The KIPS Transactions: Part B, v.15B, no.5, 2008, pp. 457-464

Abstract

In real-world operational environments, most text classification systems suffer from insufficient training documents and a lack of prior knowledge about the feature space. In this regard, Naïve Bayes is known to be an appropriate algorithm for operational text classification, since the classifier can evolve easily by incrementally updating its pre-learned classification model and feature space. This paper proposes a technique for improving the Naïve Bayes classifier through a feature weighting strategy. The basic idea is that parameter estimation in Naïve Bayes considers the degree of feature importance as well as the feature distribution. Incorporating feature weights into the Naïve Bayes learning algorithm, rather than learning from a reduced feature set, yields a more accurate classification model. In addition, we extend a conventional feature update algorithm to support incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we perform experiments on various document collections and show that the traditional Naïve Bayes classifier can be significantly improved by the proposed technique.

Keywords

Text classification; Naïve Bayes classifier; feature weighting; feature selection; $\chi^2$-statistics
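
The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the class name `WeightedNaiveBayes`, the weight definition (1 plus the largest per-class $\chi^2$ score divided by the number of documents), and the way weights scale term counts inside a Laplace-smoothed multinomial estimate. The sketch keeps the full feature set, stores only counts so that each new document is an incremental update, and recomputes the weights from those counts at prediction time.

```python
import math
from collections import defaultdict


class WeightedNaiveBayes:
    """Multinomial Naive Bayes with chi-square feature weights (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                        # Laplace smoothing
        self.doc_count = defaultdict(int)                         # class -> documents
        self.term_count = defaultdict(lambda: defaultdict(int))   # class -> term -> occurrences
        self.doc_freq = defaultdict(lambda: defaultdict(int))     # class -> term -> documents with term
        self.vocab = set()

    def update(self, tokens, label):
        """Absorb one labeled document; only the stored counts change."""
        self.doc_count[label] += 1
        for t in tokens:
            self.term_count[label][t] += 1
            self.vocab.add(t)
        for t in set(tokens):
            self.doc_freq[label][t] += 1

    def chi2(self, term, label):
        """Chi-square statistic of the 2x2 term/class contingency table."""
        n = sum(self.doc_count.values())
        a = self.doc_freq[label].get(term, 0)                     # in class, has term
        b = sum(self.doc_freq[c].get(term, 0)
                for c in self.doc_count if c != label)            # other classes, has term
        c = self.doc_count[label] - a                             # in class, lacks term
        d = n - a - b - c                                         # other classes, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        return n * (a * d - c * b) ** 2 / denom if denom else 0.0

    def weights(self):
        """Assumed weighting: 1 + (max chi-square over classes) / N, so weights lie in [1, 2]."""
        n = sum(self.doc_count.values())
        return {t: 1.0 + max(self.chi2(t, c) for c in self.doc_count) / n
                for t in self.vocab}

    def predict(self, tokens):
        """Return the class with the highest weighted log-posterior, or None if untrained."""
        w = self.weights()
        n = sum(self.doc_count.values())
        best, best_score = None, -math.inf
        for c in self.doc_count:
            # Weighted total term mass of class c (denominator of the estimate).
            total = sum(w[t] * k for t, k in self.term_count[c].items())
            score = math.log(self.doc_count[c] / n)
            for t in tokens:
                if t in self.vocab:                               # unseen terms are skipped
                    num = self.alpha + w[t] * self.term_count[c].get(t, 0)
                    score += math.log(num / (self.alpha * len(self.vocab) + total))
            if score > best_score:
                best, best_score = c, score
        return best


clf = WeightedNaiveBayes()
clf.update("goal match team".split(), "sports")
clf.update("team win goal".split(), "sports")
clf.update("stock market price".split(), "finance")
clf.update("market price fall".split(), "finance")
print(clf.predict("goal team".split()))
```

Because the model keeps raw counts rather than precomputed probabilities, a document from a previously unseen class or with new vocabulary can be added with a single `update` call, and the weights adjust on the next prediction. Recomputing every weight per prediction is a simplification; a practical system would likely cache and refresh them.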

