• Title/Summary/Keyword: spam mail filter

Search Result 22, Processing Time 0.03 seconds

Features Reduction using Logistic Regression for Spam Filtering (로지스틱 회귀 분석을 이용한 스펨 필터링의 특징 축소)

  • Jung, Yong-Gyu;Lee, Bum-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.2
    • /
    • pp.13-18
    • /
    • 2010
  • Today, The much amount of spam that occupies the mail server and network storage occurs the lack of negative issues, such as overload, and for users to delete the spam should spend time, resources have a problem. Automatic spam filtering on the incidence to solve the problem is essential. A lot of Spam filters have tried to solve the problem emerged as an essential element automatically. Unlike traditional method such as Naive Bayesian, PCA through the many-dimensional data set of spam with a few spindle-dimensional process that narrowed the operation to reduce the burden on certain groups for classification Logistic regression analysis method was used to filter the spam. Through the speed and performance, it was able to get the positive results.

E-mail Sending-Server Authorization Method using a Distance Estimation Algorithm between IP Addresses for Filtering Spam (스팸메일 차단을 위해 IP 주소간 거리 측정 알고리즘을 이용하는 전자우편 발송서버의 권한확인 방법)

  • Yim Hosung;Shim Jaehong;Choi Kyunghee;Jung Gihyun
    • The KIPS Transactions:PartC
    • /
    • v.12C no.5 s.101
    • /
    • pp.765-772
    • /
    • 2005
  • In this paper, we propose E-mail sending-server authorization method using a distance estimation algorithm between W addresses to check whether the E-mail sending server is registered in the domain of mail sending server or belongs to the domain for filtering spam mail. This method utilizes the distance between the IP address of sending server and IP addresses registered in the DNS to figure out that the E-mail sending server exists in the domain to filter spam mail. The experimental result of applying the proposed algorithm to sample E-mails gathered in a large size laboratory says that 88 percents of legitimate E-mails and only 10 percents of spam mails are sent by servers in the same domains of senders. The algorithm may be effectively used to block spam mails sent by servers outside of the domains of mail senders. It may be also hired as a temporary E-mail protecting system until the standard E-mail authorization protocol is fully deployed.

An Improved Bayesian Spam Mail Filter based on Ch-square Statistics (카이제곱 통계량을 이용한 개선된 베이지안 스팸메일 필터)

  • Kim Jin-Sang;Choe Sang-Yeol
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2005.04a
    • /
    • pp.403-414
    • /
    • 2005
  • Most of the currently used spam-filters are based on a Bayesian classification technique, where some serious problems occur such as a limited precision/recall rate and the false positive error. This paper addresses a solution to the problems using a modified Bayesian classifier based on chi-square statistics. The resulting spam-filter is more accurate and flexible than traditional Bayesian spam-filters and can be a personalized one providing some parameters when the filter is teamed from training data.

  • PDF

Spam Image Detection Model based on Deep Learning for Improving Spam Filter

  • Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
    • Journal of Information Processing Systems
    • /
    • v.19 no.3
    • /
    • pp.289-301
    • /
    • 2023
  • Due to the development and dissemination of modern technology, anyone can easily communicate using services such as social network service (SNS) through a personal computer (PC) or smartphone. The development of these technologies has caused many beneficial effects. At the same time, bad effects also occurred, one of which was the spam problem. Spam refers to unwanted or rejected information received by unspecified users. The continuous exposure of such information to service users creates inconvenience in the user's use of the service, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers are creating more malicious spam by distorting the image of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation of image spam circulated on social media is not serious yet. However, in the mail system, spammers (the person who sends spam) showed various modifications to the spam image for neutralizing OCR, and therefore, the same situation can happen with spam images on social media. Spammers have been shown to interfere with OCR reading through geometric transformations such as image distortion, noise addition, and blurring. Various techniques have been studied to filter image spam, but at the same time, methods of interfering with image spam identification using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model to improve the existing OCR-based spam image detection performance and compensate for vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts the text-related features, whether the image contains spam words, and the word embedding vector from the input image. Then, the convolution neural network-based image model extracts image obfuscation and image feature vectors from the input image. The extracted feature is determined whether it is a spam image by the final spam image classifier. As a result of evaluating the F1-score of the proposed model, the performance was about 14 points higher than the OCR-based spam image detection performance.

An Implementation and Evaluation of FQDN Check System to Filter Junk Mail (정크메일 차단을 위한 FQDN 확인 시스템의 구현 및 평가)

  • Kim Sung-Chan;Lee Sang-Hun;Jun Moon-Seog
    • The KIPS Transactions:PartC
    • /
    • v.12C no.3 s.99
    • /
    • pp.361-368
    • /
    • 2005
  • Internet mail has become a common communication method around the world because of tremendous Internet service usage increment. In other respect, Most Internet users' mail addresses are exposed to spammer, and the damage of Junk mail is growing bigger and bigger. These days, Junk mail delivery problem is becoming more serious, because this is used for an attack or propagation scheme of malicious code. It's a most dangerous dominant cause for computer system accident. This paper shows the Junk mail filtering model and implementation which is based on FQDN (Fully Qualified Domain Name) check and evaluates it for proposing advanced scheme against Junk mail.

Design and Implementation of Web Mail Filtering Agent for Personalized Classification (개인화된 분류를 위한 웹 메일 필터링 에이전트)

  • Jeong, Ok-Ran;Cho, Dong-Sub
    • The KIPS Transactions:PartB
    • /
    • v.10B no.7
    • /
    • pp.853-862
    • /
    • 2003
  • Many more use e-mail purely on a personal basis and the pool of e-mail users is growing daily. Also, the amount of mails, which are transmitted in electronic commerce, is getting more and more. Because of its convenience, a mass of spam mails is flooding everyday. And yet automated techniques for learning to filter e-mail have yet to significantly affect the e-mail market. This paper suggests Web Mail Filtering Agent for Personalized Classification, which automatically manages mails adjusting to the user. It is based on web mail, which can be logged in any time, any place and has no limitation in any system. In case new mails are received, it first makes some personal rules in use of the result of observation ; and based on the personal rules, it automatically classifies the mails into categories according to the contents of mails and saves the classified mails in the relevant folders or deletes the unnecessary mails and spam mails. And, we applied Bayesian Algorithm using Dynamic Threshold for our system's accuracy.

Spam-Filtering by Identifying Automatically Generated Email Accounts (자동 생성 메일계정 인식을 통한 스팸 필터링)

  • Lee Sangho
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.5
    • /
    • pp.378-384
    • /
    • 2005
  • In this paper, we describe a novel method of spam-filtering to improve the performance of conventional spam-filtering systems. Conventional systems filter emails by investigating words distribution in email headers or bodies. Nowadays, spammers begin making email accounts in web-based email service sites and sending emails as if they are not spams. Investigating the email accounts of those spams, we notice that there is a large difference between the automatically generated accounts and ordinaries. Based on that difference, incoming emails are classified into spam/non-spam classes. To classify emails from only account strings, we used decision trees, which have been generally used for conventional pattern classification problems. We collected about 2.15 million account strings from email service sites, and our account checker resulted in the accuracy of $96.3\%$. The previous filter system with the checker yielded the improved filtering performance.

Recognition Method of Korean Abnormal Language for Spam Mail Filtering (스팸메일 필터링을 위한 한글 변칙어 인식 방법)

  • Ahn, Hee-Kook;Han, Uk-Pyo;Shin, Seung-Ho;Yang, Dong-Il;Roh, Hee-Young
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.2
    • /
    • pp.287-297
    • /
    • 2011
  • As electronic mails are being widely used for facility and speedness of information communication, as the amount of spam mails which have malice and advertisement increase and cause lots of social and economic problem. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include bayesian filters, collaborative filtering and e-mail prioritization which are based on words or sentances. But, spammers are changing those characteristics and sending to avoid filtering system. In the case of Korean, the abnormal usages can be much more than other languages because syllable is composed of chosung, jungsung, and jongsung. Existing formal expressions and learning algorithms have the limits to meet with those changes promptly and efficiently. So, we present an methods for recognizing Korean abnormal language(Koral) to improve accuracy and efficiency of filtering system. The method is based on syllabic than word and Smith-waterman algorithm. Through the experiment on filter keyword and e-mail extracted from mail server, we confirmed that Koral is recognized exactly according to similarity level. The required time and space costs are within the permitted limit.

Checking Spam Mail Filter Need for Each User Group from Computing Distribution of User Email Responses (사용자 이메일 반응 분포 계산과 사용자 그룹 스팸 메일 필터 필요성 점검)

  • Kim, Jongwan;Kim, In-Gil
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.04a
    • /
    • pp.47-49
    • /
    • 2009
  • 본 연구에서는 이메일 사용자별로 제공받은 사용자 선호 정보 대상으로 EM 클러스터링을 수행하여 사용자 클러스터를 만든 후, 사용자 클러스터들의 이메일 반응 분포를 계산함으로써 사용자 취향에 따라 동일한 이메일에 대해서도 서로 다른 반응을 가질 수 있다는 사실을 확인하려고 한다. 그 결과로부터 현재의 내용기반 방식과는 다르게 사용자 선호도를 고려한 스팸 메일 필터 구축 방법을 제안한다.

Automatic e-mail classification using Dynamic Category Hierarchy and Principal Component Analysis (주성분 분석과 동적 분류체계를 사용한 자동 이메일 분류)

  • Park, Sun;Kim, Chul-Won;Lee, Yang-weon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.576-579
    • /
    • 2009
  • The amount of incoming e-mails is increasing rapidly due to the wide usage of Internet. Therefore, it is more required to classify incoming e-mails efficiently and accurately. Currently, the e-mail classification techniques are focused on two way classification to filter spam mails from normal ones based mainly on Bayesian and Rule. The clustering method has been used for the multi-way classification of e-mails. But it has a disadvantage of low accuracy of classification. In this paper, we propose a novel multi-way e-mail classification method that uses PCA for automatic category generation and dynamic category hierarchy for high accuracy of classification. It classifies a huge amount of incoming e-mails automatically, efficiently, and accurately.

  • PDF