http://dx.doi.org/10.5762/KAIS.2010.11.7.2595

Korean Mobile Spam Filtering System Considering Characteristics of Text Messages  

Sohn, Dae-Neung (Innoace Co., Ltd.)
Lee, Jung-Tae (Department of Computer and Radio Communications Engineering, Korea University)
Lee, Seung-Wook (Department of Computer and Radio Communications Engineering, Korea University)
Shin, Joong-Hwi (Pantech Co., Ltd.)
Rim, Hae-Chang (Division of Computer and Communications Engineering, Korea University)
Publication Information
Journal of the Korea Academia-Industrial cooperation Society / v.11, no.7, 2010, pp. 2595-2602
Abstract
This paper introduces a mobile spam filtering system that takes into account the style of the short text messages sent to mobile phones. In addition to the occurrence of content words, on which previous approaches rely, the proposed system leverages stylistic information to reduce critical errors in which legitimate messages containing spam words are misclassified as spam. The accuracy of spam classification is further improved by normalizing the messages through the correction of word-spacing and spelling errors. Experimental results on real-world Korean text messages show that the proposed system is effective for Korean mobile spam filtering.
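The abstract combines content-word evidence with stylistic cues and normalizes messages before classification, but it does not specify the learner, the feature set, or the normalizer. The following Python sketch illustrates that overall pipeline under stated assumptions, and is not the authors' implementation: the correction table `CORRECTIONS`, the stylistic features (length bucket, digit and symbol ratios, URL presence) and their thresholds, and the toy data `TRAIN` are all hypothetical. The classifier is a plain binary logistic regression (the two-class case of a maximum-entropy model over indicator features) fitted by stochastic gradient ascent on the log-likelihood.

```python
import math
import re
from collections import defaultdict

# Hypothetical correction table standing in for a real spacing/spelling
# corrector: it maps spaced-out or obfuscated forms to a canonical word.
CORRECTIONS = {"대 출": "대출", "무.료": "무료"}  # "loan", "free"


def normalize(text):
    """Collapse runs of whitespace, then apply the toy correction table."""
    text = re.sub(r"\s+", " ", text.strip())
    for wrong, right in CORRECTIONS.items():
        text = text.replace(wrong, right)
    return text


def extract_features(text):
    """Content-word features plus a few hypothetical stylistic features."""
    text = normalize(text)
    feats = {"w=" + tok for tok in text.split()}  # content words
    n = max(len(text), 1)
    digit_ratio = sum(ch.isdigit() for ch in text) / n
    symbol_ratio = sum(not ch.isalnum() and not ch.isspace() for ch in text) / n
    feats.add("len=short" if n < 20 else "len=long")  # assumed length bucket
    if digit_ratio > 0.15:
        feats.add("style=many_digits")  # e.g. phone numbers (assumed cue)
    if symbol_ratio > 0.10:
        feats.add("style=many_symbols")  # e.g. decorative punctuation
    if re.search(r"https?://|www\.", text):
        feats.add("style=url")
    return feats


def predict_proba(weights, bias, feats):
    """P(spam | features) under a binary logistic (two-class MaxEnt) model."""
    z = bias + sum(weights.get(f, 0.0) for f in feats)
    z = max(min(z, 35.0), -35.0)  # avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=500, lr=0.1):
    """Fit indicator-feature weights by stochastic gradient ascent."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            feats = extract_features(text)
            grad = label - predict_proba(weights, bias, feats)
            bias += lr * grad
            for f in feats:
                weights[f] += lr * grad
    return weights, bias


# Toy labelled messages (1 = spam, 0 = legitimate), invented for illustration.
TRAIN = [
    ("무.료 대 출 010-1234-5678 지금 신청", 1),
    ("대출 한도 조회 www.loan.example 클릭", 1),
    ("엄마 오늘 늦어요", 0),
    ("내일 회의 몇 시야?", 0),
    ("대출 서류 내일 은행에 가져갈게", 0),  # legitimate but contains "대출"
]

if __name__ == "__main__":
    weights, bias = train(TRAIN)
    for text, _ in TRAIN:
        print(f"{predict_proba(weights, bias, extract_features(text)):.3f}  {text}")
```

In this toy data the hypothetical `len=short`/`len=long` features happen to separate the classes on their own, so the example shows the mechanics of mixing content and style features rather than evidence that any particular stylistic cue works. The substring-based table is only a stand-in for a proper spacing and spelling corrector; a blind replacement such as "대 출" to "대출" could also merge words in legitimate text.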
Keywords
Mobile spam filtering; Stylistic information; Text normalization
Times Cited By KSCI: 1