Performance Comparison of Naive Bayesian Learning and Centroid-Based Classification for e-Mail Classification

  • Kim, Kuk-Pyo (Department of Industrial & Systems Engineering, Dongguk University) ;
  • Kwon, Young-S. (Department of Industrial & Systems Engineering, Dongguk University)
  • Received : 2003.10.08
  • Accepted : 2004.12.10
  • Published : 2005.03.31

Abstract

With the increasing proliferation of the World Wide Web, electronic mail has become a very widely used communication tool. Research on e-mail classification is important because an e-mail classification system is the core engine of an e-mail response management system, which mines unstructured e-mail messages and categorizes them automatically. In this research we compare the performance of Naive Bayesian learning and Centroid-Based Classification using two different data sets, one from an online shopping mall and one from a credit card company, and analyze which method performs better under which conditions. We compare their classification accuracy as it varies with the structure and size of the training set and with an increasing number of classes. The experimental results indicate that Naive Bayesian learning performs better, while Centroid-Based Classification is more robust in terms of classification accuracy.
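
The two methods being compared can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a multinomial naive Bayes model with add-one (Laplace) smoothing, and a centroid classifier that represents each e-mail as a unit-length TF-IDF vector, averages the vectors of each class into a centroid, and assigns a new e-mail to the class whose centroid has the highest cosine similarity. The whitespace tokenizer, the helper names (`tokenize`, `normalize`, `accuracy`), the class labels, and the toy e-mails are all hypothetical; the paper's actual preprocessing, term weighting, and data sets are not described in this abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def normalize(vec):
    # Scale a sparse vector (dict word -> weight) to unit Euclidean length.
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm > 0 else dict(vec)


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_posterior(c):
            # log P(c) + sum of log P(w | c), skipping words unseen in training
            score = math.log(self.class_counts[c] / self.n_docs)
            for w in tokenize(doc):
                if w in self.vocab:
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            return score

        return max(self.class_counts, key=log_posterior)


class CentroidClassifier:
    """TF-IDF document vectors, one unit-length centroid per class, cosine similarity."""

    def fit(self, docs, labels):
        token_lists = [tokenize(d) for d in docs]
        df = Counter(w for toks in token_lists for w in set(toks))
        self.idf = {w: math.log(len(docs) / n) for w, n in df.items()}
        sums = defaultdict(Counter)  # class -> sum of its document vectors
        for toks, label in zip(token_lists, labels):
            sums[label].update(self.vectorize(toks))
        # Normalizing the sum gives the same direction as normalizing the mean.
        self.centroids = {c: normalize(s) for c, s in sums.items()}
        return self

    def vectorize(self, tokens):
        tf = Counter(t for t in tokens if t in self.idf)
        return normalize({w: f * self.idf[w] for w, f in tf.items()})

    def predict(self, doc):
        x = self.vectorize(tokenize(doc))

        def cosine(c):
            # x and each centroid are unit length (when non-zero),
            # so the dot product equals their cosine similarity.
            return sum(weight * self.centroids[c].get(w, 0.0) for w, weight in x.items())

        return max(self.centroids, key=cosine)


def accuracy(model, docs, labels):
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(docs)


# Hypothetical toy e-mails; not the shopping-mall or credit-card data used in the paper.
train_docs = [
    "where is my delivery package",
    "delivery is late package not arrived",
    "i want a refund for this item",
    "refund my payment please",
    "my card payment failed",
    "card was charged twice",
]
train_labels = ["delivery", "delivery", "refund", "refund", "card", "card"]
test_docs = ["package delivery late", "please refund item", "card charged"]
test_labels = ["delivery", "refund", "card"]

for model in (NaiveBayes(), CentroidClassifier()):
    model.fit(train_docs, train_labels)
    print(type(model).__name__, accuracy(model, test_docs, test_labels))
```

Working in log space keeps the naive Bayes product of many small word probabilities from underflowing. The centroid model, by contrast, stores only one vector per class, so its prediction cost grows with the number of classes rather than with the number of training e-mails.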
