• Title/Summary/Keyword: spam-mail filtering

Performance Improvement of Spam Filtering Using User Actions (사용자 행동을 이용한 쓰레기편지 여과의 성능 개선)

  • Kim Jae-Hoon;Kim Kang-Min
    • The KIPS Transactions:PartB / v.13B no.2 s.105 / pp.163-170 / 2006
  • With Internet applications developing rapidly, e-mail has become one of the most popular means of exchanging information. E-mail, however, has a serious problem: users can receive large numbers of unwanted messages, so-called spam mails, which cause serious economic and social problems. To block and filter out spam, many researchers and companies have carried out various kinds of research on spam filtering. In general, e-mail users apply different criteria when deciding whether a message is spam. Furthermore, in e-mail client systems, users take different actions depending on whether a message is spam. In this paper, we propose a mail filtering system that uses such user actions. The proposed system consists of two steps: an action inference step that infers user actions from an e-mail, and a mail classification step that decides whether the e-mail is spam. Both steps use incremental learning, with the IB2 algorithm of TiMBL. To evaluate the proposed system, we collected 12,000 mails from 12 people. The accuracy ranges from 81% to 93% depending on the person. On average, the proposed system outperforms a system that does not use any information about user actions by about 14%.
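
As a rough illustration of the incremental learning mentioned in this abstract, the sketch below shows the general IB2 idea: classify each incoming instance with the instances stored so far, and store it only when it is misclassified. It is not TiMBL's implementation. The overlap distance, the `IB2` class, and the feature names (sender status, link presence, user action) are assumptions made for the example.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Instance = Tuple[Tuple[Hashable, ...], str]  # (symbolic features, label)


def overlap_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class IB2:
    """Incremental 1-nearest-neighbour learner that stores only its mistakes."""

    def __init__(self) -> None:
        self.memory: List[Instance] = []

    def predict(self, features: Sequence[Hashable]) -> Optional[str]:
        """Return the label of the nearest stored instance, or None if empty."""
        if not self.memory:
            return None
        nearest = min(self.memory, key=lambda inst: overlap_distance(inst[0], features))
        return nearest[1]

    def learn(self, features: Sequence[Hashable], label: str) -> None:
        """Store the instance only if the current memory misclassifies it."""
        if self.predict(features) != label:
            self.memory.append((tuple(features), label))


# Hypothetical stream of (features, label) pairs; the feature values are invented.
clf = IB2()
stream = [
    (("unknown_sender", "has_link", "deleted_unread"), "spam"),
    (("known_sender", "no_link", "replied"), "ham"),
    (("unknown_sender", "has_link", "deleted_unread"), "spam"),
    (("known_sender", "has_link", "replied"), "ham"),
]
for features, label in stream:
    clf.learn(features, label)
```

Only the first two instances end up in memory here, because the later two are already classified correctly by their nearest stored neighbour.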

A Spam Mail Classification Using Link Structure Analysis (링크구조분석을 이용한 스팸메일 분류)

  • Rhee, Shin-Young;Khil, A-Ra;Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.30-39 / 2007
  • Existing content-based spam mail filtering algorithms have difficulty filtering spam when e-mails contain images but little text. In this paper we propose an efficient spam mail classification algorithm that utilizes the link structure of e-mails. We compute the number of hyperlinks in an e-mail and the in-link frequencies of the web pages hyperlinked in the e-mail. Using these two features, we classify spam mails and legitimate mails with a decision tree trained for spam mail classification. We also suggest a hybrid system that combines three different algorithms by majority voting: the link structure analysis algorithm; a modified link structure analysis algorithm, in which only the host part of the hyperlinked pages of an e-mail is used for link structure analysis; and the content-based method using SVM (support vector machines). The experimental results show that the link structure analysis algorithm slightly outperforms the existing content-based method, with an accuracy of 94.8%. Moreover, the hybrid system achieves an accuracy of 97.0%, which is a significant performance improvement over the existing method.
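
The majority-voting combination described in this abstract can be sketched as follows. The three classifier functions are hypothetical stand-ins with invented thresholds and feature names (`num_links`, `mean_inlinks`, `mean_host_inlinks`, `content_score`). The paper's actual decision trees and SVM are not reproduced here.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

Mail = Dict[str, float]
Classifier = Callable[[Mail], str]


def majority_vote(classifiers: Sequence[Classifier], mail: Mail) -> str:
    """Return the label chosen by most classifiers (use an odd number of them)."""
    votes = Counter(clf(mail) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Placeholder rules standing in for the three trained models (assumptions).
def link_structure(mail: Mail) -> str:
    return "spam" if mail["num_links"] >= 3 and mail["mean_inlinks"] < 10 else "ham"


def host_link_structure(mail: Mail) -> str:
    return "spam" if mail["num_links"] >= 3 and mail["mean_host_inlinks"] < 50 else "ham"


def content_based(mail: Mail) -> str:
    return "spam" if mail["content_score"] > 0.0 else "ham"


mail = {"num_links": 5, "mean_inlinks": 2, "mean_host_inlinks": 200, "content_score": 0.7}
label = majority_vote([link_structure, host_link_structure, content_based], mail)
```

With three voters and two labels a tie cannot occur, which is one practical reason to combine an odd number of classifiers.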

Studying on Expansion of Realtime Blocking List Conception for Spam E-mail Filtering (스팸 메일 차단을 위한 RBL개념의 확장에 관한 연구)

  • Kim, Jong-Min;Kim, Hion-Gun;Kim, Bong-Gi
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.10 / pp.1808-1814 / 2008
  • In addition to the RBL (Realtime Blocking List) function used for spam e-mail filtering, this paper proposes, as an effective way to deal with recently widespread types of spam, extracting the URLs contained in the original e-mail, applying them to the RBL, and thereby extending it. Botnets, which are widely used for sending spam these days, pose a problem that cannot be solved using the sender addresses of spam e-mails, because those addresses are widely distributed. Since such spam is generally sent from the infected zombie PCs of individual users, the sender address itself is inefficient and meaningless to use in an RBL. As an effective way to filter spam sent by botnets, this paper analyzes the URLs contained in the original spam e-mail and proposes how to improve the filtering rate effectively, based on distribution data of the URL sites that lure users. This paper describes the mechanism by which botnets send spam e-mails and the methods to realize those types of spam e-mails. To gather spam e-mails for analysis, this paper also carries out an experiment using a spam e-mail trap system. By analyzing the spam e-mails received during the experiment period, this paper shows that the extended RBL method, which uses the URLs contained in spam e-mails, is an effective way to improve the filtering of spam e-mail.
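
A minimal sketch of the URL-based idea in this abstract, assuming the block list is simply a set of host names: extract the hosts of the URLs found in a message body and check them against that set. The regular expression, the function names, and the example hosts are illustrative assumptions, and a real RBL lookup (typically a DNS query) is not shown.

```python
import re
from typing import Set
from urllib.parse import urlsplit

# Deliberately simple pattern: an http(s) URL up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(body: str) -> Set[str]:
    """Collect the lower-cased host names of all URLs found in a message body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. an unbalanced IPv6 bracket
            continue
        if host:
            hosts.add(host.lower())
    return hosts


def is_listed(body: str, blocked_hosts: Set[str]) -> bool:
    """True if any URL host in the body appears in the block list."""
    return not extract_hosts(body).isdisjoint(blocked_hosts)


body = (
    'Click <a href="http://Pills.example.net/buy?id=1">here</a> '
    "or visit https://news.example.org/today"
)
blocked = {"pills.example.net"}  # hypothetical block-list entry
flagged = is_listed(body, blocked)
```

Keying the list on the advertised host rather than on the sending address is what keeps the check meaningful when the same campaign is sent from many different infected machines.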

A Faster Spam Mail Prevention Algorithm Based on userID (userID 기반의 빠른 메일 차단 알고리즘)

  • 심재창;고주영;김현기
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.211-214 / 2003
  • The problem of unsolicited e-mail has been growing for years, so many researchers have studied spam filtering and prevention. In this article, we propose a faster spam prevention algorithm based on the userID instead of the full e-mail address. Matching by userID alone, however, yields 2% false negatives. For those cases, we store the corresponding domains in a DB and filter them out. The proposed algorithm requires only a small DB and is 3.7 times faster than the e-mail address comparison algorithm. We implemented this algorithm using SPRSW(Spam Prevention using Replay Secrete Words) to register userIDs automatically in the userID DB.
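
One possible reading of this abstract, sketched under stated assumptions: the userID DB holds the local parts of accepted senders, and a separate domain DB catches the cases where a spam sender happens to share a registered userID. The function names and both sets are hypothetical, and the abstract does not give enough detail to confirm that the paper works exactly this way.

```python
from typing import Set, Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split 'user@domain' into a lower-cased (userID, domain) pair."""
    user_id, _, domain = address.strip().lower().rpartition("@")
    return user_id, domain


def is_accepted(sender: str, known_user_ids: Set[str], blocked_domains: Set[str]) -> bool:
    """Accept a sender whose userID is registered, unless its domain is blocked."""
    user_id, domain = split_address(sender)
    if domain in blocked_domains:  # domains recorded after a userID-only false negative
        return False
    return user_id in known_user_ids


known_user_ids = {"alice"}          # hypothetical userID DB
blocked_domains = {"spam.example"}  # hypothetical domain DB
```

For example, `is_accepted("Alice@mail.example", known_user_ids, blocked_domains)` accepts the sender, while the same userID at `spam.example` is rejected by the domain check.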

A Design of the SMBC for Improving Reliability of Blocking Spam Mail (스팸 메일 차단 신뢰도 향상을 위한 SMBC 플랫폼 설계)

  • Park Nho-Kyung;Han Sung-Ho;Seo Sang-Jin;Jin Hyun-Joon
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.11B / pp.730-735 / 2005
  • While e-mail is an important means of fast communication these days, it is often misused for commercial advertising and creates many social problems. Even though various filtering techniques for blocking spam mails have been developed, the reliability of mail systems is reduced when normal mails are misclassified as spam, i.e. false-positive errors. In this paper, the SMBC (Spam Mail Blocking Center) platform, which employs a spam mail recovery method based on privacy information, is proposed and designed. The SMBC is designed in a frame layer based on the spam blocking system of a proxy server and can be physically implemented in various topologies, so that flexible development with layered modules is possible. Using privacy information allows the proposed SMBC platform to minimize processing load and false-positive error rates, thereby improving mail system reliability.

A study on the Filtering of Spam E-mail using n-Gram indexing and Support Vector Machine (n-Gram 색인화와 Support Vector Machine을 사용한 스팸메일 필터링에 대한 연구)

  • 서정우;손태식;서정택;문종섭
    • Journal of the Korea Institute of Information Security & Cryptology / v.14 no.2 / pp.23-33 / 2004
  • With the rapid growth of the Internet, the exchange of messages by e-mail is also increasing fast. Despite the convenience of e-mail, however, the time and cost that individuals and enterprises waste on spam mail has become a big issue. Many kinds of solutions have been studied to counter the harmful effects of spam mail. Typical methods include pattern matching with keywords and probabilistic methods such as Naive Bayesian. In this paper, we propose a method for classifying spam mails and normal mails using a Support Vector Machine, which performs excellently on pattern classification problems, to compensate for the problems of existing research. In particular, the proposed method efficiently carries out the learning procedure with a word dictionary that includes an index generated by n-grams. Finally, we verified the proposed method by comparing its spam mail separation accuracy with that of existing research.
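
To make the n-gram indexing step concrete, the sketch below builds a dictionary that maps each character n-gram to an integer index and turns a message into a sparse count vector, the kind of input an SVM trainer typically consumes. Character-level n-grams with whitespace removed, the value n=2, and all function names are assumptions. The abstract does not specify these details, and SVM training itself is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Return the overlapping character n-grams of text, ignoring whitespace."""
    compact = "".join(text.split())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def build_index(corpus: Iterable[str], n: int = 2) -> Dict[str, int]:
    """Assign each distinct n-gram in the corpus a stable integer index."""
    index: Dict[str, int] = {}
    for document in corpus:
        for gram in char_ngrams(document, n):
            index.setdefault(gram, len(index))
    return index


def vectorize(text: str, index: Dict[str, int], n: int = 2) -> Dict[int, int]:
    """Map text to a sparse {feature index: count} vector; unseen n-grams are dropped."""
    counts = Counter(char_ngrams(text, n))
    return {index[gram]: count for gram, count in counts.items() if gram in index}


index = build_index(["free"])
vector = vectorize("reef", index)
```

Because n-grams need no word segmentation, this kind of index can be built for text where word boundaries are unreliable, which is one common reason for choosing it.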

Features Reduction using Logistic Regression for Spam Filtering (로지스틱 회귀 분석을 이용한 스펨 필터링의 특징 축소)

  • Jung, Yong-Gyu;Lee, Bum-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.2 / pp.13-18 / 2010
  • Today, the large amount of spam occupying mail servers and network storage causes problems such as overload, and users must spend time and resources deleting it. Automatic spam filtering is therefore essential, and many spam filters have emerged that try to solve the problem automatically. Unlike traditional methods such as Naive Bayesian, this work uses PCA to reduce the high-dimensional spam data set to a few principal axes, which lowers the computational burden, and then applies logistic regression analysis to classify and filter the spam. Positive results were obtained in terms of both speed and performance.

Spam-Mail Filtering System Using Weighted Bayesian Classifier (가중치가 부여된 베이지안 분류자를 이용한 스팸 메일 필터링 시스템)

  • 김현준;정재은;조근식
    • Journal of KIISE:Software and Applications / v.31 no.8 / pp.1092-1100 / 2004
  • E-mail is regarded as one of the most popular methods for exchanging information because it is easy to use and low in cost. Meanwhile, the exponentially growing volume of unwanted mail in users' mailboxes has become a major problem. Recognizing this issue, the Korean government established a law to prevent e-mail abuse. In this paper we suggest a hybrid spam mail filtering system using a weighted Bayesian classifier, which extends the naive Bayesian classifier by adding the concepts of preprocessing and intelligent agents. This system can classify spam mails automatically by using training data, without manual definition of message rules. In particular, we improved filtering efficiency by assigning weights to certain features extracted from spam mails. Finally, we compare the efficiency of four cases (naive Bayesian, weighting on the e-mail header, weighting on HTML tags, and weighting on hyperlinks) and the combination of all four. Compared with the naive Bayesian classifier, the proposed system's precision decreased by 5.7%, while its recall and F-measure increased by 33.3% and 31.2%, respectively.
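
The abstract does not say how the weights enter the classifier, so the following is only one plausible sketch: a naive Bayes scorer in which each token's log-likelihood is multiplied by a weight that depends on where the token was found (header, HTML tag, hyperlink, or body). The class name, the field names, and the weight values are assumptions, and both classes must have at least one training example before scoring.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (field the token came from, token text)


class WeightedNaiveBayes:
    """Naive Bayes with Laplace smoothing and per-field log-likelihood weights."""

    LABELS = ("spam", "ham")

    def __init__(self, field_weights: Dict[str, float]) -> None:
        self.field_weights = field_weights
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts: Counter = Counter()
        self.vocab = set()

    def train(self, tokens: List[Token], label: str) -> None:
        """Count one training message's tokens under its label."""
        self.doc_counts[label] += 1
        for _field, token in tokens:
            self.token_counts[label][token] += 1
            self.vocab.add(token)

    def score(self, tokens: List[Token], label: str) -> float:
        """Log prior plus the weighted sum of smoothed token log-likelihoods."""
        log_p = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total = sum(self.token_counts[label].values())
        for field, token in tokens:
            likelihood = (self.token_counts[label][token] + 1) / (total + len(self.vocab))
            log_p += self.field_weights.get(field, 1.0) * math.log(likelihood)
        return log_p

    def classify(self, tokens: List[Token]) -> str:
        return max(self.LABELS, key=lambda label: self.score(tokens, label))


nb = WeightedNaiveBayes({"link": 2.0})  # hypothetical weight: hyperlink tokens count double
nb.train([("link", "casino"), ("body", "win")], "spam")
nb.train([("body", "meeting"), ("header", "report")], "ham")
result = nb.classify([("link", "casino"), ("body", "meeting")])
```

In this toy run the doubled weight on the hyperlink token tips an otherwise balanced message towards the spam label.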

Design and Implementation of The Spam E-Mail filtering System (컨텐츠 필터를 이용한 스팸메일 차단 시스템 설계 및 구현)

  • 김진만;장종욱
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.05a / pp.465-468 / 2003
  • E-mail, one of the oldest Internet services, has become a very important and essential means of communication as the Internet has developed. Because e-mail is not fully secure, it is sometimes used for commercial or malicious purposes, so blocking spam mail and commercial advertising e-mail has become a pressing problem, and many ways of blocking them have been proposed. In this paper, we explain how to classify and block spam mail and commercial advertising e-mail in three ways: prevention at the server level, prevention at the network configuration level, and prevention at the client level. We designed a spam mail prevention system and implemented it in Visual Basic.

Design and Implementation of The Spam E-Mail filtering System (스팸메일 차단 시스템 설계 및 구현)

  • 김진만;장종욱
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.11a / pp.413-417 / 2002
  • In the past e-mail was a rather special means of communication, but it has now become an everyday one. Because e-mail is not fully secure, it is sometimes used for commercial or malicious purposes, so blocking spam mail and commercial advertising e-mail has become a pressing problem, and many ways of blocking them have been proposed. In this paper, we explain how to classify and block spam mail and commercial advertising e-mail in three ways: prevention at the server level, prevention at the network configuration level, and prevention at the client level. We designed a spam mail prevention system and implemented it in Visual Basic.
