Performance Comparison of Naive Bayesian Learning and Centroid-Based Classification for e-Mail Classification

Kim, Kuk-Pyo; Kwon, Young-S.

IE interfaces (산업공학)

Volume 18, Issue 1 / Pages 10-21 / 2005 / 1225-0996 (pISSN) / 2234-6465 (eISSN)

Korean Institute of Industrial Engineers (대한산업공학회)

전자메일 분류를 위한 나이브 베이지안 학습과 중심점 기반 분류의 성능 비교

Kim, Kuk-Pyo (Department of Industrial & Systems Engineering, Dongguk University) ;
Kwon, Young-S. (Department of Industrial & Systems Engineering, Dongguk University)

김국표 (동국대학교 산업시스템공학부) ;
권영식 (동국대학교 산업시스템공학부)

Received : 2003.10.08
Accepted : 2004.12.10
Published : 2005.03.31

Abstract

With the proliferation of the World Wide Web, electronic mail has become a very widely used communication tool. Research on e-mail classification is important because an e-mail classification system is the core engine of e-mail response management systems, which mine unstructured e-mail messages and categorize them automatically. In this study we compare the performance of Naive Bayesian learning and Centroid-Based Classification on two different data sets, one from an online shopping mall and one from a credit card company, and analyze which method performs better under which conditions. We compare their classification accuracy as a function of the structure and size of the training set and of an increasing number of classes. The experimental results indicate that Naive Bayesian learning performs better, while Centroid-Based Classification is more robust in terms of classification accuracy.
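To make the two methods named in the abstract concrete, the following is a minimal sketch of each on a toy e-mail data set. It is not the authors' implementation: the abstract does not state which variants were used, so this sketch assumes a multinomial Naive Bayes model with Laplace smoothing and a centroid classifier over length-normalized term-frequency vectors compared by cosine similarity. The messages, class labels, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization; real e-mail needs richer preprocessing.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and word counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def predict_naive_bayes(model, doc):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


def train_centroid(docs, labels):
    """Average the unit-length term-frequency vectors of each class."""
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for w, v in normalize(Counter(tokenize(doc))).items():
            sums[label][w] += v
    return {
        label: {w: v / counts[label] for w, v in s.items()}
        for label, s in sums.items()
    }


def predict_centroid(centroids, doc):
    """Return the class whose centroid has the highest cosine similarity."""
    x = normalize(Counter(tokenize(doc)))

    def cosine(centroid):
        c = normalize(centroid)
        return sum(x.get(w, 0.0) * v for w, v in c.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical toy training set (not the data used in the paper).
docs = [
    "where is my order",
    "my order has not arrived",
    "refund for damaged item",
    "please refund my payment",
]
labels = ["delivery", "delivery", "refund", "refund"]

nb_model = train_naive_bayes(docs, labels)
centroids = train_centroid(docs, labels)

for message in ["order not arrived yet", "refund my payment"]:
    print(message, predict_naive_bayes(nb_model, message),
          predict_centroid(centroids, message))
```

Running both classifiers on the same messages while varying the training set size or the number of classes is the kind of comparison the abstract describes, although the actual experimental protocol is not given here.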

