Performance Comparison of Naive Bayesian Learning and Centroid-Based Classification for e-Mail Classification  

Kim, Kuk-Pyo (Department of Industrial & Systems Engineering, Dongguk University)
Kwon, Young-S. (Department of Industrial & Systems Engineering, Dongguk University)
Publication Information
IE interfaces / v.18, no.1, 2005, pp. 10-21
Abstract
With the proliferation of the World Wide Web, electronic mail has become a very widely used communication tool. Research on e-mail classification is important because an e-mail classification system is the core engine of e-mail response management systems, which mine unstructured e-mail messages and categorize them automatically. In this research we compare the performance of Naive Bayesian learning and Centroid-Based Classification using different data sets from an online shopping mall and a credit card company. We analyze which method performs better under which conditions, comparing classification accuracy as a function of the structure and size of the training set and of an increasing number of classes. The experimental results indicate that Naive Bayesian learning performs better, while Centroid-Based Classification is more robust in terms of classification accuracy.
Keywords
classification; text mining; naive Bayesian classifier; centroid-based classifier
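
For readers unfamiliar with the two methods compared in the abstract, the following is a minimal sketch of each, not the authors' implementation. It assumes a multinomial Naive Bayes model with Laplace smoothing and a centroid classifier over raw term-frequency vectors scored by cosine similarity; the abstract does not state the paper's actual term weighting or preprocessing. The toy messages, the labels, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy messages; NOT the paper's shopping-mall or credit-card data.
TRAIN = [
    ("refund for my order please", "refund"),
    ("i want a refund for the damaged item", "refund"),
    ("when will my order be delivered", "delivery"),
    ("delivery is late where is my package", "delivery"),
]

def tokenize(text):
    # Assumption: whitespace tokenization, no stemming or stop-word removal.
    return text.lower().split()

def train_naive_bayes(samples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, text):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def train_centroids(samples):
    """Average the term-frequency vectors of each class into one centroid."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for text, label in samples:
        sums[label].update(tokenize(text))
        sizes[label] += 1
    return {label: {w: c / sizes[label] for w, c in vec.items()}
            for label, vec in sums.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_centroid(centroids, text):
    """Return the class whose centroid is most similar to the message."""
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

nb_model = train_naive_bayes(TRAIN)
centroids = train_centroids(TRAIN)
for message in ("please refund my item", "where is my package"):
    print(message, "|", predict_naive_bayes(nb_model, message),
          "|", predict_centroid(centroids, message))
```

Naive Bayes estimates per-class word probabilities, so its behavior is tied to how many training messages each class has. The centroid method reduces each class to a single averaged vector and compares directions only. An accuracy comparison of the kind the abstract describes would repeat such predictions over held-out messages while varying training-set size and the number of classes.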