http://dx.doi.org/10.3745/KTSDE.2014.3.7.271

A Normalization Method of Distorted Korean SMS Sentences for Spam Message Filtering  

Kang, Seung-Shik (School of Computer Engineering, Kookmin University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.3, no.7, 2014, pp. 271-276
Abstract
Short message service (SMS) is a very convenient means of communication in a mobile environment. However, it has had the serious side effect of enabling advertising spam. Spam senders distort or deform SMS sentences so that their messages evade automatic filtering systems. To improve the performance of a spam filtering system, the distorted sentences need to be recovered into normal sentences. This paper proposes a method of normalizing the various types of distorted sentences and extracting keywords through automatic word spacing and compound noun decomposition.
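As a rough illustration of the kind of pipeline the abstract describes, the sketch below strips symbols inserted between Hangul syllables and then re-inserts word spacing. It is not the paper's algorithm: the greedy longest-match segmentation is a simple stand-in for the automatic word spacing step, and the lexicon, the noise pattern, and the names `strip_noise`, `segment`, and `normalize` are all hypothetical.

```python
import re
from typing import List, Set

# Hypothetical toy lexicon; a real system would need a large dictionary
# or a statistical model rather than four hand-picked words.
LEXICON: Set[str] = {"무료", "대출", "상담", "가능"}

# Assumption: any run of characters that is not a Hangul syllable, an ASCII
# letter, or a digit is treated as inserted noise (this includes spaces).
NOISE = re.compile(r"[^0-9A-Za-z\uAC00-\uD7A3]+")


def strip_noise(text: str) -> str:
    """Remove inserted symbols and spaces, leaving one unspaced string."""
    return NOISE.sub("", text)


def segment(text: str, lexicon: Set[str] = LEXICON, max_len: int = 4) -> List[str]:
    """Greedy longest-match segmentation against the lexicon.

    Characters that match no lexicon entry are emitted one at a time.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


def normalize(text: str) -> str:
    """Strip noise, then rejoin the segmented words with single spaces."""
    return " ".join(segment(strip_noise(text)))


if __name__ == "__main__":
    print(normalize("무.료 대/출 상*담 가~능"))
```

Treating spaces as noise is a deliberate simplification here: distorted messages often contain misleading spacing, so the sketch discards it and rebuilds word boundaries from the lexicon alone.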
Keywords
Spam Message; SMS Filtering; Sentence Normalization; Automatic Word Spacing; Keyword Extraction
Citations & Related Records
Times Cited By KSCI: 2