http://dx.doi.org/10.3745/KIPSTB.2006.13B.2.163

Performance Improvement of Spam Filtering Using User Actions  

Kim Jae-Hoon (Department of Computer Engineering, Korea Maritime University)
Kim Kang-Min (Research Institute, Taekwang ENG Co., Ltd.)
Abstract
With the rapid growth of Internet applications, e-mail has become one of the most popular means of exchanging information. E-mail, however, has a serious problem: users can receive large numbers of unwanted messages, so-called spam mail, which cause serious economic and social problems. To block and filter out spam mail, many researchers and companies have studied spam filtering. In general, different users apply different criteria when deciding whether an e-mail is spam. Furthermore, in e-mail client systems, users take different actions depending on whether or not a message is spam. In this paper, we propose a mail filtering system that uses such user actions. The proposed system consists of two steps: an action inference step, which infers user actions from an e-mail, and a mail classification step, which decides whether the e-mail is spam. Both steps use incremental learning, specifically the IB2 algorithm of TiMBL. To evaluate the proposed system, we collected 12,000 e-mails from 12 users. The accuracy ranges from 81% to 93% depending on the user. On average, the proposed system outperforms a system that uses no information about user actions by about 14%.
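The abstract describes the two-step pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It re-implements the core idea of IB2 (classify each new training instance with a 1-nearest-neighbour rule over the instances stored so far, and store it only if that prediction is wrong) using an unweighted overlap distance over symbolic features; TiMBL itself also supports feature weighting and other distance metrics, which are omitted here. The feature tuples, the action labels ("delete", "reply"), and the class names `IB2` and `ActionAwareFilter` are hypothetical. Feeding the observed user action to the second step during training, and the inferred action at classification time, is likewise an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# A mail is represented as a fixed-length tuple of symbolic feature values
# (hypothetical; the paper's actual feature set is not given in the abstract).
Features = Tuple[str, ...]


def overlap_distance(a: Features, b: Features) -> int:
    """Unweighted overlap metric: number of positions whose values differ.
    Assumes both tuples have the same length."""
    return sum(1 for x, y in zip(a, b) if x != y)


class IB2:
    """Minimal IB2-style learner: 1-nearest-neighbour prediction, storing an
    incoming training instance only when the current memory misclassifies it."""

    def __init__(self) -> None:
        self.memory: List[Tuple[Features, str]] = []

    def predict(self, x: Features) -> Optional[str]:
        if not self.memory:
            return None
        # min() returns the first stored instance among equally close ones.
        _, label = min(self.memory, key=lambda item: overlap_distance(item[0], x))
        return label

    def learn(self, x: Features, y: str) -> bool:
        """Incrementally learn one labelled instance; return True if stored."""
        if self.predict(x) != y:
            self.memory.append((x, y))
            return True
        return False


class ActionAwareFilter:
    """Hypothetical two-step pipeline: infer a user action, then classify spam."""

    def __init__(self) -> None:
        self.action_model = IB2()  # step 1: mail features -> user action
        self.spam_model = IB2()    # step 2: mail features + action -> spam/ham

    def classify(self, mail: Features) -> Optional[str]:
        # Step 1: infer the action the user would probably take on this mail.
        action = self.action_model.predict(mail) or "unknown"
        # Step 2: append the inferred action as an extra feature and classify.
        return self.spam_model.predict(mail + (action,))

    def update(self, mail: Features, observed_action: str, is_spam: bool) -> None:
        # Assumption: training uses the action the user actually took.
        self.action_model.learn(mail, observed_action)
        self.spam_model.learn(mail + (observed_action,), "spam" if is_spam else "ham")
```

A usage example with three hypothetical yes/no features per mail:

```python
f = ActionAwareFilter()
f.update(("no", "yes", "yes"), "delete", True)   # user deleted a spam mail
f.update(("yes", "no", "no"), "reply", False)    # user replied to a normal mail
print(f.classify(("no", "yes", "no")))           # spam
```

Because IB2 stores only misclassified instances, memory grows more slowly than with plain nearest-neighbour learning, which suits a filter that keeps learning as each user handles new mail.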
Keywords
Korean Language Processing; Spam Mail; Information Filtering; Machine Learning