http://dx.doi.org/10.3837/tiis.2014.06.022

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining  

Hong, Sung-Sam (Department of Computer Engineering, Gachon University)
Kong, Jong-Hwan (Department of Computer Engineering, Gachon University)
Han, Myung-Mook (Department of Computer Engineering, Gachon University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.8, no.6, 2014, pp. 2186-2196
Abstract
Spam mail is one of the most common adverse effects of e-mail and may cause psychological harm to internet users. As internet usage has increased, the volume of spam mail has gradually grown with it. Indiscriminate sending occurs in particular when spam mail is sent from smartphones or tablets connected to wireless networks. Spam mail accounts for approximately 68% of mail traffic, and the true share is believed to be considerably higher. To analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected using features extracted through linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. The proposed method uses a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new types of spam mail.
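As a rough illustration of the text-clustering idea summarized above, the following Python sketch groups a handful of messages and lists the highest-weighted terms of each group as a crude stand-in for a "signature". It is not the authors' system: the tokenizer, the normalized term-frequency weighting, the use of k-means with farthest-point seeding, the function names, and the toy messages are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Toy messages invented for this sketch (not data from the paper).
MESSAGES = [
    "win free prize now claim free money",
    "free money win prize click now",
    "claim your free prize money now",
    "meeting agenda for project review tomorrow",
    "project review meeting moved to friday",
    "please send the agenda before the meeting",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(messages):
    """Return (vocab, vectors): L2-normalized term-frequency vectors."""
    counts = [Counter(tokenize(m)) for m in messages]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        vec = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=20):
    """Plain k-means with deterministic farthest-point seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from every centroid chosen so far.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=3):
    """Highest-weighted terms of a centroid, ties broken alphabetically."""
    ranked = sorted(zip(vocab, centroid), key=lambda tw: (-tw[1], tw[0]))
    return [term for term, weight in ranked[:n] if weight > 0]


if __name__ == "__main__":
    vocab, vectors = vectorize(MESSAGES)
    labels, centroids = kmeans(vectors, k=2)
    for j, centroid in enumerate(centroids):
        print(j, top_terms(vocab, centroid))
```

On this toy input the promotional messages and the work-related messages fall into separate clusters. A real system would need a far larger corpus, language-aware preprocessing, and a principled choice of the number of clusters.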
Keywords
SPAM; Text Mining; Text Clustering; Text Classification; Detection;