• Title/Summary/Keyword: Spam Mail Classification

Improved Spam Filter via Handling of Text Embedded Image E-mail

  • Youn, Seongwook;Cho, Hyun-Chong
    • Journal of Electrical Engineering and Technology / v.10 no.1 / pp.401-407 / 2015
  • The increase of image spam, in which the text message is embedded in an attached image to defeat spam filtering techniques, is a major problem for current e-mail systems. For nearly a decade, content-based filtering using text classification or machine learning has been the major trend in anti-spam filtering systems. Recently, spammers have tried to defeat anti-spam filters with many techniques, and embedding text in an attached image is one of them. We previously proposed ontology-based spam filters. However, that system handles only text e-mail, and the percentage of e-mail with attached images is increasing sharply. The contribution of this paper is to add image e-mail handling capability to the anti-spam filtering system while keeping the advantages of the previous text-based spam e-mail filtering system. The proposed system also gives a low false negative value, which means that a user's valuable e-mail is rarely regarded as spam.

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

  • Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.6 / pp.2186-2196 / 2014
  • Spam mail is one of the most common adverse effects of e-mail and may cause psychological damage to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending, in particular, occurs when spam mail is sent from smart phones or tablets connected to wireless networks. Spam mail accounts for approximately 68% of mail traffic; however, the true percentage is believed to be considerably higher. To analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected through linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method uses a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new spam mail types.

A Spam Mail Classification Using Link Structure Analysis (링크구조분석을 이용한 스팸메일 분류)

  • Rhee, Shin-Young;Khil, A-Ra;Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.30-39 / 2007
  • Existing content-based spam mail filtering algorithms have difficulty filtering spam when e-mails contain images but little text. In this paper we propose an efficient spam mail classification algorithm that uses the link structure of e-mails. We compute the number of hyperlinks in an e-mail and the in-link frequencies of the web pages hyperlinked in the e-mail. Using these two features, we classify spam and legitimate mails with a decision tree trained for spam mail classification. We also suggest a hybrid system that combines three algorithms by majority voting: the link structure analysis algorithm; a modified link structure analysis algorithm, in which only the host part of the hyperlinked pages of an e-mail is used for link structure analysis; and a content-based method using SVM (support vector machines). The experimental results show that the link structure analysis algorithm slightly outperforms the existing content-based method, with an accuracy of 94.8%. Moreover, the hybrid system achieves an accuracy of 97.0%, a significant improvement over the existing method.
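
The abstract above names two ideas that are easy to illustrate: counting the hyperlinks in a message and combining three classifier outputs by majority voting. The following is a minimal sketch only, not the authors' implementation. The regular expression, the function names `count_hyperlinks` and `majority_vote`, and the label strings are all assumptions made for illustration; the in-link frequency feature and the decision tree are not shown.

```python
import re
from collections import Counter

# Assumed, simplified URL pattern; the paper's actual link extraction is not described.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def count_hyperlinks(body: str) -> int:
    """Return the number of http(s) URLs found in an e-mail body."""
    return len(URL_RE.findall(body))


def majority_vote(labels):
    """Return the label predicted by the most classifiers.

    With three voters and two possible labels there is always a strict majority.
    """
    return Counter(labels).most_common(1)[0][0]
```

For example, `majority_vote(["spam", "ham", "spam"])` returns `"spam"`, standing in for the case where two of the three component classifiers flag a message.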

A study on the Filtering of Spam E-mail using n-Gram indexing and Support Vector Machine (n-Gram 색인화와 Support Vector Machine을 사용한 스팸메일 필터링에 대한 연구)

  • 서정우;손태식;서정택;문종섭
    • Journal of the Korea Institute of Information Security & Cryptology / v.14 no.2 / pp.23-33 / 2004
  • With the rapid growth of the internet, the exchange of messages by e-mail is also increasing fast. Despite the convenience of e-mail, the time and cost that individuals and enterprises waste on spam mail has become a big issue. Many kinds of solutions have been studied to counter the harmful effects of spam mail. Typical methods include pattern matching on representative keywords and probabilistic methods such as Naive Bayesian. In this paper, we propose a method for classifying spam mail and normal mail using a Support Vector Machine, which performs excellently on pattern classification problems, to compensate for the problems of existing research. In particular, the proposed method carries out the learning procedure efficiently with a word dictionary that includes an index generated by n-Gram. Finally, we verified the proposed method by comparing its spam mail separation accuracy with that of existing research.
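
To make the n-Gram indexing step concrete, here is a minimal sketch of how a dictionary of n-grams can be built and used to turn a message into a count vector that a classifier such as an SVM could consume. This is an illustration under assumptions, not the paper's procedure: the abstract does not say whether the n-grams are over characters or words (character bigrams are assumed here), and the names `char_ngrams`, `build_index`, and `vectorize` are hypothetical. The SVM training itself is not shown.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Split lower-cased text into overlapping character n-grams (assumed unit)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(documents, n=2):
    """Assign each distinct n-gram an integer index in order of first appearance."""
    vocab = {}
    for doc in documents:
        for gram in char_ngrams(doc, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(text, vocab, n=2):
    """Count n-grams of text against the index; unseen n-grams are ignored."""
    counts = Counter(char_ngrams(text, n))
    vector = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            vector[vocab[gram]] = count
    return vector
```

One reason to index n-grams rather than whole words is that the index still matches fragments of deliberately misspelled or run-together words.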

An Approach to Detect Spam E-mail with Abnormal Character Composition (비정상 문자 조합으로 구성된 스팸 메일의 탐지 방법)

  • Lee, Ho-Sub;Cho, Jae-Ik;Jung, Man-Hyun;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.6A / pp.129-137 / 2008
  • As use of the internet increases, the distribution of spam mail has also vastly increased. E-mail was mainly used for the exchange of information; however, it is now used more and more frequently for advertisement and malware distribution. This is a serious problem because it consumes a large amount of limited internet resources. Furthermore, extensive computer, network, and human resources are consumed to prevent it. As a result, much research is being done to prevent and filter spam. Currently, research is being done on spam composed of readable sentences that do not use proper grammar. This type of spam cannot be classified by previous vocabulary analysis or document classification methods. This paper proposes a method to filter spam by using the subject of the mail, N-gram indexing, and Bayesian and SVM algorithms for classification.

A Research on the Intelligent E-mail System Using User Patterns (사용자 패턴을 이용한 지능형 e-메일 시스템의 연구)

  • Lim Yang-Won;Lim Han-Kyu
    • The Journal of the Korea Contents Association / v.6 no.1 / pp.64-71 / 2006
  • Electronic mail (e-mail) is an integral part of communication for today's Internet users. However, e-mail has also come to carry a flood of unwanted spam and junk mail sent with bad intentions. This study develops an intelligent e-mail system that uses user behavior patterns to block such unnecessary information and let users communicate via e-mail in a cleaner environment. Focused analysis of how users use e-mail functions resulted in better classification between unnecessary and necessary information, thereby facilitating faster disposal of spam mail.

Performance Improvement of Spam Filtering Using User Actions (사용자 행동을 이용한 쓰레기편지 여과의 성능 개선)

  • Kim Jae-Hoon;Kim Kang-Min
    • The KIPS Transactions:PartB / v.13B no.2 s.105 / pp.163-170 / 2006
  • With rapidly developing Internet applications, e-mail has come to be considered one of the most popular methods for exchanging information. E-mail, however, has a serious problem: users can receive a lot of unwanted e-mails, so-called spam mails, which cause big problems economically as well as socially. To block and filter out spam mails, many researchers and companies have carried out many kinds of research on spam filtering. In general, e-mail users have different criteria for deciding whether an e-mail is spam. Furthermore, in e-mail client systems, users take different actions depending on whether a mail is spam. In this paper, we propose a mail filtering system that uses such user actions. The proposed system consists of two steps: an action inference step that infers user actions from an e-mail, and a mail classification step that decides whether the e-mail is spam. Both steps use incremental learning, with the IB2 algorithm of TiMBL. To evaluate the proposed system, we collected 12,000 mails from 12 persons. The accuracy is 81~93% depending on the person. The proposed system outperforms a system that does not use any information about user actions by about 14% on average.
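
The incremental learning mentioned above can be sketched briefly. IB2 is an instance-based learner that classifies by nearest neighbor and stores a training example only when the current memory misclassifies it. The class below is a simplified illustration, not TiMBL's implementation and not the authors' system: the class name, the symbolic feature tuples, and the overlap distance (number of mismatching feature values) are assumptions chosen for the sketch.

```python
class IB2:
    """Simplified IB2: 1-nearest-neighbor that keeps only misclassified examples."""

    def __init__(self):
        self.memory = []  # stored (features, label) pairs

    def predict(self, features):
        """Return the label of the nearest stored example, or None if memory is empty."""
        if not self.memory:
            return None

        def overlap_distance(stored):
            # Count the feature positions whose values differ (assumed metric).
            return sum(a != b for a, b in zip(stored[0], features))

        return min(self.memory, key=overlap_distance)[1]

    def learn(self, features, label):
        """Process one example incrementally; store it only if it is misclassified."""
        if self.predict(features) != label:
            self.memory.append((tuple(features), label))
```

Because correctly classified examples are discarded, memory grows only when the learner makes a mistake, which is what makes the scheme suitable for updating a filter one message at a time.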

Constructing User Preferred Anti-Spam Ontology using Data Mining Technique (데이터 마이닝 기술을 적용한 사용자 선호 스팸 대응 온톨로지 구축)

  • Kim, Jong-Wan;Kim, Hee-Jae;Kang, Sin-Jae
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.2 / pp.160-166 / 2007
  • When a mail is delivered to users, each user's response may differ according to his or her preference. This paper presents a solution for this situation by constructing a user-preferred ontology for anti-spam systems. To define an ontology for describing user behaviors, we applied associative classification mining to study users' preference information and their responses to e-mails. The generated classification rules can be represented in a formal ontology language. A user-preferred ontology can explain in a meaningful way why a mail is judged to be spam or non-spam. We also suggest a new rule optimization procedure, inspired by logic synthesis, to improve comprehensibility and exclude redundant rules.

Design and Implementation of Web Mail Filtering Agent for Personalized Classification (개인화된 분류를 위한 웹 메일 필터링 에이전트)

  • Jeong, Ok-Ran;Cho, Dong-Sub
    • The KIPS Transactions:PartB / v.10B no.7 / pp.853-862 / 2003
  • Many people use e-mail purely on a personal basis, and the pool of e-mail users is growing daily. The amount of mail transmitted in electronic commerce is also growing. Because of this convenience, a mass of spam mail floods in every day, yet automated techniques for learning to filter e-mail have not significantly affected the e-mail market. This paper proposes a Web Mail Filtering Agent for Personalized Classification, which automatically manages mail by adapting to the user. It is based on web mail, which can be accessed at any time and from any place and is not tied to a particular system. When new mail is received, the agent first builds personal rules from the results of observation; based on these rules, it automatically classifies the mail into categories according to its contents, saves the classified mail in the relevant folders, and deletes unnecessary mail and spam. We also applied a Bayesian algorithm with a dynamic threshold to improve the system's accuracy.

Features Reduction using Logistic Regression for Spam Filtering (로지스틱 회귀 분석을 이용한 스펨 필터링의 특징 축소)

  • Jung, Yong-Gyu;Lee, Bum-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.2 / pp.13-18 / 2010
  • Today, the large amount of spam occupying mail servers and network storage causes negative issues such as overload, and users must spend time and resources deleting it. Automatic spam filtering is therefore essential to solving the problem, and many spam filters have tried to solve it automatically. Unlike traditional methods such as Naive Bayesian, this work uses PCA to reduce the high-dimensional spam data set to a few principal axes, which lowers the computational burden, and then uses logistic regression analysis to classify and filter the spam. Positive results were obtained in terms of speed and performance.