A Spam Mail Classification Using Link Structure Analysis

Rhee, Shin-Young; Khil, A-Ra; Kim, Myung-Won

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 34, Issue 1 / Pages 30-39 / 2007 / 1229-6848 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

링크구조분석을 이용한 스팸메일 분류

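Rhee, Shin-Young (Simulation Research Institute Co., Ltd.);
Khil, A-Ra (School of Computing, Soongsil University);
Kim, Myung-Won (School of Computing, Soongsil University)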

Published : 2007.01.15

Abstract

Existing content-based spam filtering algorithms have difficulty filtering spam when e-mails contain images but little text. In this paper we propose an efficient spam mail classification algorithm that uses the link structure of e-mails. We compute the number of hyperlinks in an e-mail and the in-link frequencies of the web pages hyperlinked from it. Using these two features, we classify e-mails as spam or legitimate with a decision tree trained for spam mail classification. We also propose a hybrid system that combines three algorithms by majority voting: the link structure analysis algorithm; a modified link structure analysis algorithm, in which only the host part of each hyperlinked page is used for link structure analysis; and a content-based method using SVMs (support vector machines). Experimental results show that the link structure analysis algorithm slightly outperforms the existing content-based method, with an accuracy of 94.8%. Moreover, the hybrid system achieves an accuracy of 97.0%, a significant improvement over the existing method.
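The two link features named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how in-link frequencies are obtained or aggregated, so `inlink_count` is a hypothetical caller-supplied lookup (for example, a query against a link index) and the mean is an assumed aggregate. The `hosts_only` flag mimics the modified variant that keeps only the host part of each hyperlink.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags in an HTML e-mail body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_body):
    """Return every href found in the body, in document order."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()
    return collector.links


def host_part(url):
    """Reduce a URL to its host, e.g. 'http://a.example/x/y' -> 'http://a.example/'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def link_features(html_body, inlink_count, hosts_only=False):
    """Compute the hyperlink count and an in-link summary for one e-mail.

    inlink_count is a hypothetical callable mapping a URL to the number of
    pages linking to it. Averaging the in-link counts is an assumption.
    """
    # Keep web links only; mailto: and similar schemes are ignored.
    links = [u for u in extract_links(html_body)
             if urlsplit(u).scheme in ("http", "https")]
    targets = [host_part(u) for u in links] if hosts_only else links
    inlinks = [inlink_count(t) for t in targets]
    mean_inlinks = sum(inlinks) / len(inlinks) if inlinks else 0.0
    return {"num_links": len(links), "mean_inlinks": mean_inlinks}
```

The resulting feature dictionary would then be fed to a decision tree learner; the choice of learner and its parameters are not given in the abstract.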

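Korean abstract (translated): Existing content-based spam mail classification is limited when an e-mail contains many images and little text, because there is little content to analyze. To address this problem, this paper proposes a link structure analysis algorithm for spam mail classification that analyzes the structure of e-mails. It measures the number of hyperlinks in an e-mail and the number of times the web documents those hyperlinks point to are linked from other web documents, computes the importance of the e-mail from these measurements, and then trains a decision tree to separate spam from legitimate mail. We also propose an integrated spam mail classification system that combines, by majority vote, the link structure analysis algorithm above, a modified link structure analysis algorithm that uses only the server address of each hyperlink, and a content-based method using an SVM (support vector machine). In experiments, the proposed link structure analysis algorithm reached a spam classification accuracy of 94.8%, slightly better than the existing content-based method, and the integrated system reached 97.7%, also an improvement over the content-based method.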
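The majority-voting combination described in the abstract can be illustrated with a short sketch. The function names and the `"spam"`/`"ham"` labels are assumptions for illustration; each classifier is represented as a plain callable standing in for the link structure analysis algorithm, its host-only variant, and the SVM content-based method.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, _ = Counter(predictions).most_common(1)[0]
    return label


def hybrid_classify(email, classifiers):
    """Run every classifier on the e-mail and combine the labels by majority vote.

    classifiers is a list of callables (hypothetical stand-ins), each mapping
    an e-mail to a label such as "spam" or "ham".
    """
    votes = [classify(email) for classify in classifiers]
    return majority_vote(votes)
```

With three classifiers and two labels a tie cannot occur, which is one practical reason to combine an odd number of binary classifiers.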
