• Title/Summary/Keyword: Text Spam

Search Result 37, Processing Time 0.022 seconds

Coward Analysis based Spam SMS Detection Scheme (동시출현 단어분석 기반 스팸 문자 탐지 기법)

  • Oh, Hayoung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.3
    • /
    • pp.693-700
    • /
    • 2016
  • Analyzing characteristics of spam text messages had limitations since spam datasets are typically difficult to obtain publicly and previous studies focused on spam email. Although existing studies, such as through the use of spam e-mail characterization and utilization of data mining techniques, there are limitations that influence is limited to high spam detection techniques using a single word character. In this paper, we reveal the characteristics of the spam SMS based on experiment and analysis from different perspectives and propose coward analysis based spam SMS detection scheme with a publicly disclosed spam SMS from the University of Singapore. With the extensive performance evaluations, we show false positive and false negative of the proposed method is less than 2%.

Implementation of a Spam Message Filtering System using Sentence Similarity Measurements (문장유사도 측정 기법을 통한 스팸 필터링 시스템 구현)

  • Ou, SooBin;Lee, Jongwoo
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.1
    • /
    • pp.57-64
    • /
    • 2017
  • Short message service (SMS) is one of the most important communication methods for people who use mobile phones. However, illegal advertising spam messages exploit people because they can be used without the need for friend registration. Recently, spam message filtering systems that use machine learning have been developed, but they have some disadvantages such as requiring many calculations. In this paper, we implemented a spam message filtering system using the set-based POI search algorithm and sentence similarity without servers. This algorithm can judge whether the input query is a spam message or not using only letter composition without any server computing. Therefore, we can filter the spam message although the input text message has been intentionally modified. We added a specific preprocessing option which aims to enable spam filtering. Based on the experimental results, we observe that our spam message filtering system shows better performance than the original set-based POI search algorithm. We evaluate the proposed system through extensive simulation. According to the simulation results, the proposed system can filter the text message and show high accuracy performance against the text message which cannot be filtered by the 3 major telecom companies.

Comparing Feature Selection Methods in Spam Mail Filtering

  • Kim, Jong-Wan;Kang, Sin-Jae
    • Proceedings of the Korea Society of Information Technology Applications Conference
    • /
    • 2005.11a
    • /
    • pp.17-20
    • /
    • 2005
  • In this work, we compared several feature selection methods in the field of spam mail filtering. The proposed fuzzy inference method outperforms information gain and chi squared test methods as a feature selection method in terms of error rate. In the case of junk mails, since the mail body has little text information, it provides insufficient hints to distinguish spam mails from legitimate ones. To address this problem, we follow hyperlinks contained in the email body, fetch contents of a remote web page, and extract hints from both original email body and fetched web pages. A two-phase approach is applied to filter spam mails in which definite hint is used first, and then less definite textual information is used. In our experiment, the proposed two-phase method achieved an improvement of recall by 32.4% on the average over the $1^{st}$ phase or the $2^{nd}$ phase only works.

  • PDF

Personalized Anti-spam Filter Considering Users' Different Preferences

  • Kim, Jong-Wan
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.6
    • /
    • pp.841-848
    • /
    • 2010
  • Conventional filters using email header and body information equally judge whether an incoming email is spam or not. However this is unrealistic in everyday life because each person has different criteria to judge what is spam or not. To resolve this problem, we consider user preference information as well as email category information derived from the email content. In this paper, we have developed a personalized anti-spam system using ontologies constructed from rules derived in a data mining process. The reason why traditional content-based filters are not applicable to the proposed experimental situation is described. In also, several experiments constructing classifiers to decide email category and comparing classification rule learners are performed. Especially, an ID3 decision tree algorithm improved the overall accuracy around 17% compared to a conventional SVM text miner on the decision of email category. Some discussions about the axioms generated from the experimental dataset are given too.

A Study on the Effective Countermeasure of SPAM : Focused on Policy Suggestion (불법스팸 방지를 위한 개선방안 : 정책적 제안을 중심으로)

  • Sohn, Jong-Mo;Lim, Hyo-Chang
    • Journal of Industrial Convergence
    • /
    • v.19 no.6
    • /
    • pp.37-47
    • /
    • 2021
  • Today, people share information and communicate with others using various information and communication media such as e-mail, smartphones, SNS, etc. However, it is being used in malicious attacks to send a large amount of illegal spam or to use it for fraud by using illegally collected personal information and devices that are vulnerable to security. Illegal spam, smishing, and fraudulent mail(SCAM) cause a lot of direct and indirect damage to companies and users, including not only social costs such as mental fatigue, but also unnecessary consumption of IT infrastructure resources and economic losses. Although there are regulations related to spam, violators of the law are still on the rise by circumventing the law, and victims are constantly occurring, so it is necessary to review what the problem is. This study examined domestic and foreign spam-related regulations and spam-related response activities, identified problems, and suggested improvement countermeasures. Through this study, it was intended to suggest directions for improving spam-related systems in order to block illegal spam and prevent fraudulent damage.

Intelligent Spam-mail Filtering Based on Textual Information and Hyperlinks (텍스트정보와 하이퍼링크에 기반한 지능형 스팸 메일 필터링)

  • Kang, Sin-Jae;Kim, Jong-Wan
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.7
    • /
    • pp.895-901
    • /
    • 2004
  • This paper describes a two-phase intelligent method for filtering spam mail based on textual information and hyperlinks. Scince the body of spam mail has little text information, it provides insufficient hints to distinguish spam mails from legitimate mails. To resolve this problem, we follows hyperlinks contained in the email body, fetches contents of a remote webpage, and extracts hints (i.e., features) from original email body and fetched webpages. We divided hints into two kinds of information: definite information (sender`s information and definite spam keyword lists) and less definite textual information (words or phrases, and particular features of email). In filtering spam mails, definite information is used first, and then less definite textual information is applied. In our experiment, the method of fetching web pages achieved an improvement of F-measure by 9.4% over the method of using on original email header and body only.

Facebook Spam Post Filtering based on Instagram-based Transfer Learning and Meta Information of Posts (인스타그램 기반의 전이학습과 게시글 메타 정보를 활용한 페이스북 스팸 게시글 판별)

  • Kim, Junhong;Seo, Deokseong;Kim, Haedong;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.43 no.3
    • /
    • pp.192-202
    • /
    • 2017
  • This study develops a text spam filtering system for Facebook based on two variable categories: keywords learned from Instagram and meta-information of Facebook posts. Since there is no explicit labels for spam/ham posts, we utilize hash tags in Instagram to train classification models. In addition, the filtering accuracy is enhanced by considering meta-information of Facebook posts. To verify the proposed filtering system, we conduct an empirical experiment based on a total of 1,795,067 and 761,861 Facebook and Instagram documents, respectively. Employing random forest as a base classification algorithm, experimental result shows that the proposed filtering system yield 99% and 98% in terms of filtering accuracy and F1-measure, respectively. We expect that the proposed filtering scheme can be applied other web services suffering from massive spam posts but no explicit spam labels are available.

An Automatic Spam e-mail Filter System Using χ2 Statistics and Support Vector Machines (카이 제곱 통계량과 지지벡터기계를 이용한 자동 스팸 메일 분류기)

  • Lee, Songwook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.592-595
    • /
    • 2009
  • We propose an automatic spam mail classifier for e-mail data using Support Vector Machines (SVM). We use a lexical form of a word and its part of speech (POS) tags as features. We select useful features with ${\chi}^2$ statistics and represent each feature using text frequency (TF) and inversed document frequency (IDF) values for each feature. After training SVM with the features, SVM classifies each email as spam mail or not. In experiment, we acquired 82.7% of accuracy with e-mail data collected from a web mail system.

  • PDF

An Effective Counterattack System for the Voice Spam (효과적인 음성스팸 역공격 시스템)

  • Park, Haeryong;Park, Sujeong;Park, Kangil;Jung, Chanwoo;KIM, Jongpyo;Choi, KeunMo;Mo, Yonghun
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.6
    • /
    • pp.1267-1277
    • /
    • 2021
  • The phone number used for advertising messages and voices used as bait in the voice phishing crime access stage is being used to send out a large amount of illegal loan spam, so we want to quickly block it. In this paper, our system is designed to block the usage of the phone number by rapidly restricting the use of the voice spam phone number that conducts illegal loan spam and voice phishing, and at the same time sends continuous calls to the phone number to prevent smooth phone call connection. The proposed system is a representative collaboration model between an illegal spam reporting agency and an investigation agency. As a result of developing the system and applying it in practice, the number of reports of illegal loaned voice spam and text spam decreased by 1/3, respectively. We can prove the effectiveness of this system by confirming that.

A Novel Statistical Feature Selection Approach for Text Categorization

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems
    • /
    • v.13 no.5
    • /
    • pp.1397-1409
    • /
    • 2017
  • For text categorization task, distinctive text features selection is important due to feature space high dimensionality. It is important to decrease the feature space dimension to decrease processing time and increase accuracy. In the current study, for text categorization task, we introduce a novel statistical feature selection approach. This approach measures the term distribution in all collection documents, the term distribution in a certain category and the term distribution in a certain class relative to other classes. The proposed method results show its superiority over the traditional feature selection methods.