http://dx.doi.org/10.12673/jant.2011.15.2.287

Recognition Method of Korean Abnormal Language for Spam Mail Filtering  

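Ahn, Hee-Kook (Kangwon National University)
Han, Uk-Pyo (Kangwon National University)
Shin, Seung-Ho (Kangwon National University)
Yang, Dong-Il (Hallym Sungsim College)
Roh, Hee-Young (Kangwon National University)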
Abstract
Electronic mail is widely used because it makes communication fast and convenient, but the volume of malicious and advertising spam has grown with it and causes considerable social and economic problems. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include Bayesian filters, collaborative filtering, and e-mail prioritization, which operate on words or sentences. Spammers, however, alter those words and sentences to evade filtering systems. In Korean, such abnormal usages can be far more varied than in other languages, because each syllable is composed of a chosung (initial consonant), a jungsung (medial vowel), and a jongsung (final consonant). Existing formal expressions and learning algorithms are limited in how promptly and efficiently they can keep up with those changes. We therefore present a method for recognizing Korean abnormal language (Koral) to improve the accuracy and efficiency of filtering systems. The method works at the syllable level rather than the word level and is based on the Smith-Waterman algorithm. In experiments on filter keywords and e-mails extracted from a mail server, we confirmed that Koral is recognized accurately according to the similarity level. The required time and space costs are within the permitted limits.
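The idea of matching below the word level can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring scheme or the exact alignment unit, so the choice to align decomposed jamo, the weights (`match=2`, `mismatch=-1`, `gap=-1`), and the function names `to_jamo`, `smith_waterman`, and `similarity` are all assumptions made for illustration. The sketch splits each Hangul syllable into its chosung, jungsung, and jongsung using the standard Unicode arithmetic, then computes a Smith-Waterman local-alignment score between a filter keyword and a mail text, normalized by the score of a perfect keyword match.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3      # last precomposed Hangul syllable
N_JUNG, N_JONG = 21, 28   # medial vowels; final consonants including "none"


def to_jamo(text):
    """Split each Hangul syllable into chosung, jungsung and optional jongsung.

    Non-Hangul characters are passed through unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            cho, rest = divmod(code - HANGUL_BASE, N_JUNG * N_JONG)
            jung, jong = divmod(rest, N_JONG)
            out.append(chr(0x1100 + cho))       # initial consonant
            out.append(chr(0x1161 + jung))      # medial vowel
            if jong:                            # 0 means no final consonant
                out.append(chr(0x11A7 + jong))
        else:
            out.append(ch)
    return out


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score of sequences a and b.

    The weights are illustrative assumptions, not values from the paper.
    Only two rows of the scoring matrix are kept in memory.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def similarity(keyword, text, match=2):
    """Local-alignment score normalized by a perfect match of the keyword."""
    key = to_jamo(keyword)
    if not key:
        return 0.0
    return smith_waterman(key, to_jamo(text), match=match) / (match * len(key))
```

Under these assumed weights, `similarity("대출", "무료 대츌 상담")` evaluates to 0.7: four of the keyword's five jamo match and one vowel is substituted, whereas an exact word match would miss the altered spelling entirely. A filter could flag a mail when the score exceeds a chosen similarity level; the threshold itself is a tuning parameter that this sketch does not fix.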
Keywords
Spam Mail Filtering; Korean Abnormal Language; Smith-Waterman Algorithm; Keyword Similarity