
A Three-Step Preprocessing Algorithm for Enhanced Classification of E-Mail Recommendation System  

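Jeong Ok-Ran (Department of Computer Science, College of Engineering, Ewha Womans University)
Cho Dong-Sub (Department of Computer Science, College of Engineering, Ewha Womans University)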
Publication Information
The Transactions of the Korean Institute of Electrical Engineers D / v.54, no.4, 2005, pp. 251-258
Abstract
The performance of automatic document classification depends significantly on the characteristics of the documents being classified, as well as on the classifier itself. This research identifies the characteristics of e-mail documents and applies a three-step preprocessing algorithm that minimizes their atypical characteristics. In the first stage, an uncertainty-based sampling algorithm using Mean Absolute Deviation (MAD) addresses the question of which training documents to select for rule generation at classification time. In the second stage, a method that assigns weighted values by attribute is applied to increase the discriminating power of terms that appear in the title of an e-mail document. In the third and last stage, the classification accuracy for each category is increased by using a dynamic threshold in the Naive Bayesian algorithm. Finally, we implemented an E-Mail Recommendation System based on the three-step preprocessing algorithm, which enables users to classify mail directly and optimally by recommending the applicable category when a mail arrives.
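The three stages can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: that low MAD across class posteriors marks an uncertain document, that title terms are counted with a fixed multiplier (`title_weight`), and that the dynamic threshold can be represented as a per-category cutoff passed in as a dictionary. The function and class names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def mean_absolute_deviation(probs):
    """Mean absolute deviation of a list of class probabilities."""
    mean = sum(probs) / len(probs)
    return sum(abs(p - mean) for p in probs) / len(probs)


def select_uncertain(posteriors, k):
    """Stage 1 (assumed form): return the k document ids whose class
    posteriors are flattest, i.e. have the smallest MAD."""
    ranked = sorted(posteriors, key=lambda d: mean_absolute_deviation(posteriors[d]))
    return ranked[:k]


def weighted_terms(title, body, title_weight=2.0):
    """Stage 2 (assumed form): title terms count title_weight times,
    body terms count once."""
    counts = Counter()
    for tok in title.lower().split():
        counts[tok] += title_weight
    for tok in body.lower().split():
        counts[tok] += 1.0
    return counts


class NaiveBayes:
    """Multinomial Naive Bayes over weighted term counts, Laplace smoothed."""

    def __init__(self):
        self.term_totals = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs):
        # docs is an iterable of (category, weighted term counts) pairs.
        for category, terms in docs:
            self.doc_counts[category] += 1
            self.term_totals[category].update(terms)
            self.vocab.update(terms)

    def posteriors(self, terms):
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for cat in self.doc_counts:
            total = sum(self.term_totals[cat].values())
            score = math.log(self.doc_counts[cat] / n_docs)
            for term, weight in terms.items():
                p = (self.term_totals[cat][term] + 1.0) / (total + len(self.vocab))
                score += weight * math.log(p)
            log_scores[cat] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def recommend(model, terms, thresholds):
    """Stage 3 (assumed form): recommend the best category only if its
    posterior reaches that category's threshold, otherwise return None."""
    post = model.posteriors(terms)
    best = max(post, key=post.get)
    return best if post[best] >= thresholds.get(best, 0.5) else None
```

In this sketch, `recommend` returns `None` when no category clears its cutoff, which leaves the decision to the user. How the thresholds would be adjusted over time is not specified in the abstract and is left out here.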