Spam-Filtering by Identifying Automatically Generated Email Accounts

Lee Sangho

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 32, Issue 5 / Pages 378-384 / 2005 / 1229-6848 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)


Lee Sangho (Department of Game Engineering, Korea Polytechnic University)

Published : 2005.05.01


Abstract

This paper describes a novel spam-filtering method that improves the performance of conventional spam-filtering systems. Conventional systems filter email by examining the distribution of words in email headers or bodies. Recently, however, spammers have begun creating accounts on web-based email services and sending spam that looks like ordinary mail. Because the accounts behind such spam are generated automatically, their account strings differ markedly from those of ordinary users. Building on this observation, we predict whether the sender's account was automatically generated and use that prediction to classify incoming email as spam or non-spam. To classify email from the account string alone, we use decision trees, which are widely used for conventional pattern classification problems. In experiments on about 2.15 million account strings collected from email service sites, the account checker achieved an accuracy of $96.3\%$. Combined with an existing filtering system, the checker improved filtering performance and made it possible to filter this new type of spam.
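As a rough illustration of classifying an email from its account string alone, the following Python sketch trains a very small decision tree on surface features of the account name. The abstract does not say which features or which tree-induction algorithm the paper uses, so everything here is an assumption: the three features in `account_features`, the CART-style Gini split, and the eight toy accounts (invented for this example, not taken from the paper's data).

```python
from collections import Counter

VOWELS = set("aeiou")

def account_features(account: str) -> dict[str, float]:
    """Hypothetical surface features of an account string (not the paper's)."""
    local = account.split("@", 1)[0].lower()   # keep only the part before '@'
    letters = [c for c in local if c.isalpha()]
    digits = sum(c.isdigit() for c in local)
    vowels = sum(c in VOWELS for c in letters)
    return {
        "length": float(len(local)),
        "digit_ratio": digits / max(len(local), 1),
        "vowel_ratio": vowels / max(len(letters), 1),
    }

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for feat in rows[0]:
        values = sorted({r[feat] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2                # midpoint between adjacent values
            left = [y for r, y in zip(rows, labels) if r[feat] <= thr]
            right = [y for r, y in zip(rows, labels) if r[feat] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feat, thr), score
    return best

def build_tree(rows, labels, max_depth=3):
    """Grow a tree: leaves are labels, inner nodes are (feat, thr, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:                          # no feature separates the rows
        return majority
    feat, thr = split
    l_rows = [r for r in rows if r[feat] <= thr]
    l_labs = [y for r, y in zip(rows, labels) if r[feat] <= thr]
    r_rows = [r for r in rows if r[feat] > thr]
    r_labs = [y for r, y in zip(rows, labels) if r[feat] > thr]
    return (feat, thr,
            build_tree(l_rows, l_labs, max_depth - 1),
            build_tree(r_rows, r_labs, max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        feat, thr, left, right = tree
        tree = left if row[feat] <= thr else right
    return tree

# Toy training data, invented purely for illustration.
TRAIN = [
    ("sangho.lee", "human"), ("minji_kim", "human"),
    ("david.park", "human"), ("hello.world", "human"),
    ("xq7k2p9z4t", "auto"), ("a8f3k1m0q2", "auto"),
    ("zx81kq03vb", "auto"), ("q9w8e7r6t5", "auto"),
]
tree = build_tree([account_features(a) for a, _ in TRAIN], [y for _, y in TRAIN])

print(predict(tree, account_features("jiwoo.choi")))  # name-like account
print(predict(tree, account_features("k3x9q2z7")))    # random-looking account
```

A tree trained on eight examples only shows the mechanics; a real account checker would need a large labeled set, such as the roughly 2.15 million accounts mentioned above, and features chosen to survive ordinary user habits like appending a birth year to a name.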

