http://dx.doi.org/10.13089/JKIISC.2004.14.2.23

A study on the Filtering of Spam E-mail using n-Gram indexing and Support Vector Machine  

서정우 (Korea University)
손태식 (Korea University)
서정택 (National Security Research Institute)
문종섭 (Korea University)
Abstract
With the rapid growth of the Internet, the exchange of messages by e-mail is also increasing quickly. Despite the convenience of e-mail, the time and cost that spam mail wastes for individuals and enterprises has become a major issue. Many solutions have been studied to counter the harmful effects of spam mail. Typical approaches include pattern matching on keywords and probabilistic methods such as Naive Bayesian classification. In this paper, we propose a method that separates spam mail from normal mail using a Support Vector Machine, which performs well on pattern classification problems, to compensate for the shortcomings of existing research. In particular, the proposed method carries out the learning procedure efficiently by using a word dictionary that includes an index generated by n-Gram. Finally, we verify the proposed method by comparing its spam mail classification accuracy with that of an existing approach.
Keywords
Spam Mail Filtering; n-Gram Indexing; Support Vector Machine;
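The pipeline outlined in the abstract (an n-Gram index used as the dictionary, followed by Support Vector Machine learning) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the abstract does not give these details: character bigrams, the four toy messages, the function names, and a linear SVM trained by plain sub-gradient descent on the hinge loss. It is not the authors' implementation, kernel, or data.

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a lower-cased string."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def build_index(docs, n=2):
    """Assign a feature index to every n-gram seen in the training documents."""
    index = {}
    for doc in docs:
        for gram in char_ngrams(doc, n):
            index.setdefault(gram, len(index))
    return index

def vectorize(doc, index, n=2):
    """Sparse, length-normalised n-gram count vector: {feature index: weight}."""
    counts = Counter(index[g] for g in char_ngrams(doc, n) if g in index)
    norm = sum(c * c for c in counts.values()) ** 0.5 or 1.0
    return {i: c / norm for i, c in counts.items()}

def train_linear_svm(vectors, labels, dim, lam=0.01, lr=0.1, epochs=100):
    """Sub-gradient descent on the L2-regularised hinge loss (labels are +1/-1)."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * (sum(w[i] * v for i, v in x.items()) + b)
            # Regularisation step: shrink every weight slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only samples inside the margin move the boundary.
            if margin < 1.0:
                for i, v in x.items():
                    w[i] += lr * y * v
                b += lr * y
    return w, b

def predict(doc, index, w, b, n=2):
    """Return +1 (spam) or -1 (normal mail) for a message."""
    x = vectorize(doc, index, n)
    score = sum(w[i] * v for i, v in x.items()) + b
    return 1 if score > 0 else -1

# Hypothetical toy corpus: +1 marks spam, -1 marks normal mail.
train_docs = [
    "free money now",
    "win free prize money",
    "meeting at noon tomorrow",
    "project report attached",
]
train_labels = [1, 1, -1, -1]

index = build_index(train_docs)
vectors = [vectorize(d, index) for d in train_docs]
w, b = train_linear_svm(vectors, train_labels, dim=len(index))

train_predictions = [predict(d, index, w, b) for d in train_docs]
print(train_predictions)
```

The n-gram index plays the role of the dictionary: each distinct n-gram becomes one feature dimension, so no word segmentation is needed before learning. A real experiment would use a labelled mail corpus, a held-out test set for the accuracy comparison, and an established SVM package rather than this hand-written trainer.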