Calculating the Importance of Attributes in Naive Bayesian Classification Learning


  • Lee, Chang-Hwan (Department of Information and Communications, Dongguk University)
  • Received : 2011.04.26
  • Accepted : 2011.09.06
  • Published : 2011.09.25

Abstract

Naive Bayesian learning has been widely used in machine learning. Traditional naive Bayesian learning, however, rests on two assumptions: (1) the attributes are independent of one another, and (2) every attribute is equally important to learning. In reality, not all attributes are equally important. In this paper, we propose a new paradigm for calculating the importance of attributes in naive Bayesian learning. The performance of the proposed method is compared with that of other methods, including SBC and the standard naive Bayesian classifier, and the proposed method shows better performance in most cases.
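The abstract describes weighting attributes by importance rather than treating them equally. A common way to realize this is to raise each conditional probability to the power of its attribute weight, i.e. score a class c as log P(c) + Σᵢ wᵢ · log P(aᵢ | c), which reduces to standard naive Bayes when every wᵢ = 1. The sketch below illustrates this general weighted scheme; the weight values themselves are hypothetical placeholders, since the paper's actual importance calculation is not given in the abstract.

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(rows, labels):
    """Estimate class priors and per-attribute value counts
    from categorical training data."""
    n = len(labels)
    priors = {c: cnt / n for c, cnt in Counter(labels).items()}
    cond = defaultdict(Counter)      # cond[(attr_index, class)][value] -> count
    values = defaultdict(set)        # distinct values seen per attribute
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
            values[i].add(v)
    return priors, cond, values

def classify(row, priors, cond, values, weights):
    """Score each class as log P(c) + sum_i w_i * log P(a_i | c),
    with Laplace smoothing; return the highest-scoring class."""
    best, best_score = None, -math.inf
    for c, p in priors.items():
        score = math.log(p)
        for i, v in enumerate(row):
            counts = cond[(i, c)]
            total = sum(counts.values())
            prob = (counts[v] + 1) / (total + len(values[i]))  # Laplace smoothing
            score += weights[i] * math.log(prob)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy usage: with uniform weights [1, 1] this is ordinary naive Bayes.
rows = [('sunny', 'hot'), ('sunny', 'mild'), ('rain', 'mild'), ('rain', 'cool')]
labels = ['no', 'no', 'yes', 'yes']
priors, cond, values = train_weighted_nb(rows, labels)
pred = classify(('sunny', 'hot'), priors, cond, values, [1.0, 1.0])
```

Setting a weight to 0 removes that attribute from the decision entirely (the selective behavior of SBC), while intermediate weights interpolate between ignoring and fully trusting an attribute.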


References

  1. Claire Cardie and Nicholas Howe. Improving minority class prediction using case-specific feature weights. In the Fourteenth International Conference on Machine Learning, pages 57-65, 1997.
  2. Pedro Domingos and Michael Pazzani. On the optimality of the simple bayesian classifier under zero-one loss. Machine Learning, 29(2-3), 1997.
  3. Thomas Gartner and Peter A. Flach. Wbcsvm: Weighted bayesian classification based on support vector machines. In the Eighteenth International Conference on Machine Learning, 2001.
  4. Mark Hall. A decision tree-based attribute weighting filter for naive bayes. Knowledge-Based Systems, 20(2), 2007.
  5. S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86, 1951. https://doi.org/10.1214/aoms/1177729694
  6. A. Frank and A. Asuncion. UCI repository of machine learning databases, 2011.
  7. Pat Langley and Stephanie Sage. Induction of selective bayesian classifiers. In the Tenth Conference on Uncertainty in Artificial Intelligence, pages 399-406, 1994.
  8. J. Ross Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
  9. Dietrich Wettschereck, David W. Aha, and Takao Mohri. A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artificial Intelligence Review, 11:273-314, 1997. https://doi.org/10.1023/A:1006593614256