• Title/Summary/Keyword: Spam Filtering

Spam Message Filtering with Bayesian Approach for Internet Communities (베이지안을 이용한 인터넷 커뮤니티 상의 유해 메시지 차단 기법)

  • Kim, Bum-Bae; Choi, Hyoung-Kee
    • The KIPS Transactions: Part C / v.13C no.6 s.109 / pp.733-740 / 2006
  • Spam messages have been causing widespread damage on the Internet. One source of the problem is anonymously posted messages on the bulletin boards of Internet communities. Such spam messages advertise products, damage others' reputations, spread religious messages, and so on. In this paper, we present a spam message filter based on the Bayesian approach. To make the filter more useful for bulletin boards in Internet communities, we built it to classify spam messages into six categories, including advertisement, pornography, abuse, religion, and other. The filter was tested on messages posted on popular websites.
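As a minimal sketch of multi-category Bayesian filtering (not the authors' implementation), the code below assumes whitespace-tokenized text and a multinomial Naive Bayes model with add-one (Laplace) smoothing. The class `MultiClassNaiveBayes`, the training messages, and the extra `normal` category for legitimate posts are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultiClassNaiveBayes:
    """Multinomial Naive Bayes over several spam categories, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # training messages per category
        self.word_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()

    def train(self, messages):
        for text, label in messages:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text) = log prior + sum of log likelihoods."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in text.lower().split():
            # add-one smoothing keeps unseen tokens from zeroing the product
            score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def classify(self, text):
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))


# Hypothetical training data; categories loosely follow those named in the abstract.
training = [
    ("cheap loans buy now discount", "advertisement"),
    ("buy discount shoes sale now", "advertisement"),
    ("adult video free click", "pornography"),
    ("you are stupid idiot", "abuse"),
    ("join our church prayer meeting", "religion"),
    ("meeting notes for the club event", "normal"),
    ("see you at the club event tomorrow", "normal"),
]

nb = MultiClassNaiveBayes()
nb.train(training)
print(nb.classify("discount loans buy"))     # advertisement
print(nb.classify("church prayer tonight"))  # religion
```

Working in log space avoids floating-point underflow when messages are long.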

Knowledge Graph-based Korean New Words Detection Mechanism for Spam Filtering (스팸 필터링을 위한 지식 그래프 기반의 신조어 감지 매커니즘)

  • Kim, Ji-hye; Jeong, Ok-ran
    • Journal of Internet Computing and Services / v.21 no.1 / pp.79-85 / 2020
  • Today, spam texts on smartphones are blocked either by simple string comparison between text messages and spam keywords or by blocking spam phone numbers. As a result, spam texts are gradually altered so that they are not automatically blocked. In particular, words that appear in spam keyword lists are distorted into abnormal forms using special characters, Chinese characters, and whitespace so that simple string matching cannot detect them. Traditional spam filtering methods cannot block such texts well, so new techniques are needed to respond to changing spam messages. In this paper, we propose a knowledge graph-based new-word detection mechanism that detects new words frequently used in spam texts and thereby responds to changing spam. We also present experimental results on the performance obtained when the detected Korean new words are applied to the Naive Bayes algorithm.
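The abstract does not describe how the knowledge graph is built. As a rough illustration only, the sketch below assumes a token co-occurrence graph: tokens that are absent from a known dictionary but repeatedly appear alongside known spam keywords are flagged as candidate new words, which could then be added as Naive Bayes features. The function names, dictionary, keyword list, messages, and the `min_weight` threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(messages):
    """Undirected weighted graph: edge weight = number of messages containing both tokens."""
    graph = defaultdict(lambda: defaultdict(int))
    for text in messages:
        for a, b in combinations(sorted(set(text.split())), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def detect_new_words(graph, dictionary, spam_keywords, min_weight=2):
    """Return unknown tokens whose total edge weight to spam keywords is at least min_weight."""
    candidates = {}
    for token, neighbors in graph.items():
        if token in dictionary or token in spam_keywords:
            continue
        weight = sum(w for other, w in neighbors.items() if other in spam_keywords)
        if weight >= min_weight:
            candidates[token] = weight
    return candidates


# Hypothetical data: "l0an" is a disguised form of the keyword "loan".
messages = [
    "cheap loan l0an today",
    "l0an approved cheap rate",
    "meeting today at noon",
    "fr.ee gift today",
]
dictionary = {"today", "at", "noon", "approved", "rate", "gift", "meeting"}
spam_keywords = {"cheap", "loan"}

graph = build_cooccurrence_graph(messages)
print(detect_new_words(graph, dictionary, spam_keywords))  # {'l0an': 3}
```

Note that "fr.ee" is not flagged here because it never co-occurs with a listed keyword; any graph-based approach depends on such links existing in the data.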

Comparing Feature Selection Methods in Spam Mail Filtering

  • Kim, Jong-Wan; Kang, Sin-Jae
    • Proceedings of the Korea Society of Information Technology Applications Conference / 2005.11a / pp.17-20 / 2005
  • In this work, we compared several feature selection methods for spam mail filtering. In terms of error rate, the proposed fuzzy inference method outperforms information gain and the chi-squared test as a feature selection method. Because the body of a junk mail often contains little text, it provides insufficient hints for distinguishing spam from legitimate mail. To address this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote web pages, and extract hints from both the original email body and the fetched pages. Spam is filtered in two phases: definite hints are used first, and less definite textual information is used second. In our experiment, the proposed two-phase method improved recall by 32.4% on average over using only the first phase or only the second phase.
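The abstract compares its fuzzy inference method against information gain and the chi-squared test. For reference, the chi-squared baseline can be sketched as below, assuming each mail is reduced to a set of tokens with a spam/legitimate label. The function names and toy mails are hypothetical; the score is the standard 2x2 contingency statistic N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)).

```python
def chi_squared(term, docs):
    """Chi-squared statistic between a term and the spam class.

    docs is a list of (token_set, is_spam) pairs; a, b, c, d are the cells
    of the 2x2 term/class contingency table.
    """
    a = sum(1 for toks, spam in docs if spam and term in toks)          # spam, has term
    b = sum(1 for toks, spam in docs if not spam and term in toks)      # legit, has term
    c = sum(1 for toks, spam in docs if spam and term not in toks)      # spam, lacks term
    d = sum(1 for toks, spam in docs if not spam and term not in toks)  # legit, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, k):
    """Rank the vocabulary by chi-squared score (ties broken alphabetically), keep the top k."""
    vocab = set().union(*(toks for toks, _ in docs))
    return sorted(vocab, key=lambda t: (-chi_squared(t, docs), t))[:k]


docs = [
    ({"free", "offer", "click"}, True),
    ({"free", "money", "click"}, True),
    ({"offer", "meeting"}, False),
    ({"meeting", "agenda"}, False),
]
print(select_features(docs, 3))  # ['click', 'free', 'meeting']
```

The statistic is symmetric: "meeting", which only appears in legitimate mail, scores as high as "click", so chi-squared selects terms indicative of either class.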

Spam-mail Filtering based on Lexical Information and Thesaurus (어휘정보와 시소러스에 기반한 스팸메일 필터링)

  • Kang, Shin-Jae; Kim, Jong-Wan
    • Journal of Korea Society of Industrial Information Systems / v.11 no.1 / pp.13-20 / 2006
  • In this paper, we constructed a spam-mail filtering system based on lexical and conceptual information. Two kinds of information can distinguish spam mail from legitimate mail. Definite information includes the sender's information, URLs, and a list of specific spam keywords; less definite information consists of the word lists and concept codes extracted from the mail body. We first classified spam mail using the definite information and then used the less definite information. The lexical information and concept codes contained in the email body were used for SVM learning. According to our results, spam precision increased when more lexical information was used as features, and spam recall increased when concept codes were also included as features.
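The two-phase decision can be sketched as follows, assuming phase 1 checks blocklists of senders, URLs, and keywords, and phase 2 applies the decision function f(x) = w.x + b of an already-trained linear SVM to binary word and concept-code features. All lists, the `THESAURUS` mapping from words to concept codes, and the weights are hypothetical values chosen for illustration, not learned from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    sender: str
    urls: list
    body_tokens: list


# Phase 1: "definite" information (hypothetical blocklists)
BLOCKED_SENDERS = {"promo@spam.example"}
BLOCKED_URLS = {"http://spam.example/buy"}
SPAM_KEYWORDS = {"viagra"}

# Phase 2: hypothetical thesaurus plus weights/bias of an already-trained linear SVM
THESAURUS = {"cash": "CONCEPT:money", "dollars": "CONCEPT:money"}
WEIGHTS = {"free": 1.2, "winner": 1.5, "meeting": -1.0, "CONCEPT:money": 0.8}
BIAS = -1.5


def phase1_is_spam(mail):
    """Definite hints: blocked sender, blocked URL, or listed spam keyword."""
    return (mail.sender in BLOCKED_SENDERS
            or any(url in BLOCKED_URLS for url in mail.urls)
            or any(tok in SPAM_KEYWORDS for tok in mail.body_tokens))


def extract_features(tokens):
    """Binary lexical features plus concept codes looked up in the thesaurus."""
    features = set(tokens)
    features |= {THESAURUS[tok] for tok in tokens if tok in THESAURUS}
    return features


def svm_decision(features):
    """Linear SVM decision value f(x) = w.x + b for a binary feature vector x."""
    return sum(WEIGHTS.get(f, 0.0) for f in features) + BIAS


def classify(mail):
    """Return (label, phase that decided it)."""
    if phase1_is_spam(mail):
        return ("spam", 1)
    label = "spam" if svm_decision(extract_features(mail.body_tokens)) > 0 else "ham"
    return (label, 2)


print(classify(Mail("promo@spam.example", [], ["hello"])))  # ('spam', 1)
print(classify(Mail("a@b.example", [], ["free", "cash"])))   # ('spam', 2)
print(classify(Mail("a@b.example", [], ["free"])))           # ('ham', 2)
```

In this toy setup the concept code lets `["free", "cash"]` cross the decision threshold while `["free"]` alone does not, which echoes the reported recall gain from concept codes in spirit only.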

A Development of the SMBC platform for supporting advanced performance of blocking spam-mails (향상된 차단 성능 지원을 위한 SMBC 플랫폼 개발)

  • Sso, Sang-Jin; Jin, Hyun-Joon; Park, Noh-Kyung
    • Journal of Internet Computing and Services / v.8 no.2 / pp.89-94 / 2007
  • Although much research has been conducted on spam-mail blocking technologies and systems, new types of spam mail cause filtering rates to drop and false positives to increase. Because few new algorithms are being developed and studied, spam-mail blocking systems that rely on existing filtering algorithms face growing processing loads and declining reliability. This paper presents the Fit-FA Finder, which selects the appropriate algorithms to apply and the procedures for applying them, and describes the development of the SMBC platform. The Fit-FA Finder is implemented in the SMBC platform, which employs a recovery process based on privacy information for false-positive mails.

Implementation of A Mobile Application for Spam SMS Filtering Using Set-Based POI Search Algorithm (집합 기반 POI 검색 알고리즘을 활용한 스팸 메시지 판별 모바일 앱 구현)

  • Ahn, Hye-yeong; Cho, Wan-zee; Lee, Jong-woo
    • Journal of Digital Contents Society / v.16 no.5 / pp.815-822 / 2015
  • As the number of SMS phishing victims grows, applications for handling spam messages are being released one after another. However, most spam messages whose content has been cleverly modified, for example by separating consonants and vowels, fail to be filtered. To solve this problem, we implemented 'AntiSpam', an application that identifies spam strings in text messages. 'AntiSpam' searches for spam strings in a text message using a set-based POI search algorithm and then calculates the likelihood that the message is spam from the search results. It also catches skillfully disguised spam messages that would otherwise slip past the filter. Users who receive a message can view the spam-likelihood result together with the message contents and choose how to handle the message.
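The set-based POI search algorithm itself is not described in the abstract. The sketch below illustrates only the underlying idea of defeating consonant/vowel separation: it decomposes precomposed Hangul syllables into compatibility jamo using the standard Unicode arithmetic (syllable = 0xAC00 + (initial * 21 + medial) * 28 + final), so a keyword typed as whole syllables and the same keyword typed as separated jamo compare equal. The multiset containment score and the example keyword are assumptions, not the authors' method.

```python
from collections import Counter

# Compatibility jamo in Unicode's initial / medial / final ordering.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def to_jamo(text):
    """Split Hangul syllables into jamo, keep already-separated jamo, drop everything else."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 19 * 21 * 28:       # precomposed syllables U+AC00..U+D7A3
            initial, rest = divmod(code, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        elif 0x3131 <= ord(ch) <= 0x318E:  # compatibility jamo typed separately
            out.append(ch)
    return "".join(out)


def containment(keyword, message):
    """Share of the keyword's jamo (as a multiset) that also occur in the message."""
    kw = Counter(to_jamo(keyword))
    msg = Counter(to_jamo(message))
    total = sum(kw.values())
    return sum((kw & msg).values()) / total if total else 0.0


print(to_jamo("대출"))                         # ㄷㅐㅊㅜㄹ
print(containment("대출", "ㄷㅐ-ㅊㅜㄹ 가능"))  # 1.0
print(containment("대출", "안녕하세요"))        # 0.0
```

Because the score ignores jamo order, it tolerates scrambling but can also over-match long messages; compound jamo such as ㄳ are not split further in this sketch.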

Korean Mobile Spam Filtering System Considering Characteristics of Text Messages (문자메시지의 특성을 고려한 한국어 모바일 스팸필터링 시스템)

  • Sohn, Dae-Neung; Lee, Jung-Tae; Lee, Seung-Wook; Shin, Joong-Hwi; Rim, Hae-Chang
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.7 / pp.2595-2602 / 2010
  • This paper introduces a mobile spam filtering system that takes the style of short text messages into account when detecting spam. The proposed system not only relies on the occurrence of content words, as in previous work, but also leverages style information to reduce critical cases in which legitimate messages containing spam words are misclassified as spam. In addition, classification accuracy is improved by normalizing the messages through correction of word spacing and spelling errors. Experimental results on real-world Korean text messages show that the proposed system is effective for Korean mobile spam filtering.

A Normalization Method of Distorted Korean SMS Sentences for Spam Message Filtering (스팸 문자 필터링을 위한 변형된 한글 SMS 문장의 정규화 기법)

  • Kang, Seung-Shik
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.271-276 / 2014
  • Short message service (SMS) is a very convenient means of communication in mobile environments. However, it has had the serious side effect of spam messages sent for advertising. Spammers distort or deform SMS sentences so that their messages are not caught by automatic filtering systems. To improve the performance of spam filtering systems, the distorted sentences need to be restored to normal sentences. This paper proposes a method for normalizing various types of distorted sentences and extracting keywords through automatic word spacing and compound noun decomposition.
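The paper's normalization relies on automatic word spacing and compound noun decomposition, which require a Korean morphological analyzer and are not reproduced here. As a much simpler illustration of why normalization helps, the sketch below assumes distortion consists only of symbols, digits, Latin letters, or whitespace inserted between Hangul syllables, and strips them so keywords can again be found by substring matching. The keyword list and messages are hypothetical.

```python
import re

SPAM_KEYWORDS = ["대출", "무료"]  # hypothetical keyword list


def strip_distortion(text):
    """Keep only precomposed Hangul syllables (U+AC00..U+D7A3).

    Removes symbols, digits, Latin letters, and whitespace that spammers
    insert between syllables to break exact keyword matching.
    """
    return re.sub(r"[^\uac00-\ud7a3]", "", text)


def matched_keywords(text):
    """Spam keywords found in the normalized message."""
    normalized = strip_distortion(text)
    return [kw for kw in SPAM_KEYWORDS if kw in normalized]


print(strip_distortion("무*료 대.출 상담"))  # 무료대출상담
print(matched_keywords("무*료 대.출 상담"))  # ['대출', '무료']
print(matched_keywords("오늘 회의 시간"))     # []
```

Deleting every space is a blunt instrument: syllables from two unrelated words can join into a false keyword match, which is one reason a real system would restore spacing and decompose compound nouns instead.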

Spam-Filtering by Identifying Automatically Generated Email Accounts (자동 생성 메일계정 인식을 통한 스팸 필터링)

  • Lee, Sangho
    • Journal of KIISE: Software and Applications / v.32 no.5 / pp.378-384 / 2005
  • In this paper, we describe a novel spam-filtering method that improves the performance of conventional spam-filtering systems. Conventional systems filter emails by examining the word distribution in email headers or bodies. Recently, spammers have begun creating email accounts on web-based email services and sending emails that look like legitimate mail. Examining the accounts that send such spam, we found a large difference between automatically generated accounts and ordinary ones. Based on that difference, incoming emails are classified as spam or non-spam. To classify emails from account strings alone, we used decision trees, which are widely used for conventional pattern classification problems. We collected about 2.15 million account strings from email service sites, and our account checker achieved an accuracy of 96.3%. Adding the checker to the existing filtering system improved filtering performance.
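The abstract does not list the features the decision tree uses. The sketch below is a hypothetical stand-in: it extracts a few account-string features (length, digit ratio, longest run of consonant letters) and applies a small hand-written decision tree whose thresholds are invented for illustration. In the paper, the tree is learned from about 2.15 million collected account strings rather than written by hand.

```python
VOWELS = set("aeiou")


def account_features(account):
    """Simple features of the local part (before '@') of an email account."""
    local = account.split("@")[0].lower()
    digits = sum(ch.isdigit() for ch in local)
    longest = run = 0
    for ch in local:
        if ch.isalpha() and ch not in VOWELS:  # a consonant letter extends the run
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return {
        "length": len(local),
        "digit_ratio": digits / len(local) if local else 0.0,
        "max_consonant_run": longest,
    }


def looks_auto_generated(account):
    """Hand-written decision tree; thresholds are illustrative, not learned."""
    f = account_features(account)
    if f["max_consonant_run"] >= 5:
        return True
    if f["length"] >= 10:
        return f["digit_ratio"] >= 0.4
    return False


for acct in ["xkqzvbw83@mail.example", "a8f3k2m9q1z7@mail.example",
             "minsu.kim@mail.example", "hongildong1987@mail.example"]:
    print(acct, looks_auto_generated(acct))
```

A learned tree would pick both the features to split on and the thresholds from labeled data; the hand-written version only shows the shape of the decision.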

Implementation of a Spam Message Filtering System using Sentence Similarity Measurements (문장유사도 측정 기법을 통한 스팸 필터링 시스템 구현)

  • Ou, SooBin; Lee, Jongwoo
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.57-64 / 2017
  • Short message service (SMS) is one of the most important communication methods for mobile phone users. However, because SMS messages can be sent without friend registration, illegal advertising spam messages exploit it to reach people. Spam message filtering systems based on machine learning have recently been developed, but they have drawbacks such as requiring heavy computation. In this paper, we implemented a server-free spam message filtering system using the set-based POI search algorithm and sentence similarity. The algorithm judges whether an input message is spam using only its letter composition, without any server-side computation, so spam can be filtered even when the text has been intentionally modified. We also added a preprocessing option designed specifically for spam filtering. The experimental results show that our system performs better than the original set-based POI search algorithm. We evaluated the proposed system through extensive simulation, and the results show that it filters text messages with high accuracy, including messages that the three major telecom companies fail to filter.
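The abstract names sentence similarity but not the measure, and the set-based POI search step is not reproduced here. As one common, server-free option (an assumption, not necessarily the authors' choice), the sketch below compares messages by Jaccard similarity over character bigrams after dropping whitespace and punctuation, and flags a message when it is similar enough to any entry in a small on-device list of known spam. The list `KNOWN_SPAM` and the 0.5 threshold are hypothetical.

```python
import re

# Hypothetical on-device list of known spam messages.
KNOWN_SPAM = ["무료 대출 상담 가능", "지금 바로 클릭하세요"]


def char_bigrams(text):
    """Set of character bigrams after removing whitespace, punctuation, and underscores."""
    chars = re.sub(r"[\W_]", "", text.lower())
    return {chars[i:i + 2] for i in range(len(chars) - 1)}


def jaccard(a, b):
    """|A intersect B| / |A union B| over character-bigram sets (0.0 if both are empty)."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def is_spam(message, threshold=0.5):
    """Flag the message if it is close enough to any known spam message."""
    return max(jaccard(message, spam) for spam in KNOWN_SPAM) >= threshold


print(is_spam("무.료 대~출 상담 가능!!"))      # True
print(is_spam("내일 회의 자료 보내드립니다"))  # False
```

Stripping punctuation before building bigrams is what lets the disguised first message match its known spam template exactly; the threshold trades recall on reworded spam against false positives.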