• Title/Summary/Keyword: Spam Detection

A Method for Twitter Spam Detection Using N-Gram Dictionary Under Limited Labeling (트레이닝 데이터가 제한된 환경에서 N-Gram 사전을 이용한 트위터 스팸 탐지 방법)

  • Choi, Hyeok-Jun;Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering / v.6 no.9 / pp.445-456 / 2017
  • In this paper, we propose a method to detect spam tweets containing unhealthy information by using an n-gram dictionary under limited labeling. Spam tweets that contain unhealthy information tend to use similar words and sentences. Based on this characteristic, we show that spam tweets can be detected effectively by applying a Naive Bayesian classifier using n-gram dictionaries constructed from spam tweets and normal tweets. However, constructing an initial training set is very costly because a large amount of data flows through Twitter in real time. Therefore, there is a need for a spam detection method that can be applied in an environment where the initial training set is very small or nonexistent. To solve this problem, we propose a method that generates pseudo-labels by utilizing Twitter's retweet function and uses them to build the initial training set and to update the n-gram dictionary. The results of various experiments using 1.3 million Korean tweets collected from December 1, 2016 to December 7, 2016 show that the proposed method outperforms the compared spam detection methods.
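
The classification step described in this abstract can be illustrated with a minimal sketch. It assumes character bigrams, add-one smoothing, and the class and function names shown (`NGramNaiveBayes`, `char_ngrams`), none of which come from the paper; the retweet-based pseudo-labeling is not reproduced, although `update` could accept pseudo-labeled tweets in the same way as hand-labeled ones.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return the character n-grams of a tweet, ignoring whitespace."""
    s = "".join(text.split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class NGramNaiveBayes:
    """Naive Bayes over two n-gram dictionaries (hypothetical helper class)."""

    LABELS = ("spam", "normal")

    def __init__(self, n=2):
        self.n = n
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = {label: 0 for label in self.LABELS}

    def update(self, text, label):
        """Add one labeled (or pseudo-labeled) tweet to that label's dictionary."""
        self.counts[label].update(char_ngrams(text, self.n))
        self.docs[label] += 1

    def log_score(self, text, label):
        """Log prior plus smoothed log likelihood of the tweet's n-grams."""
        # +1 reserves probability mass for n-grams never seen in training.
        vocab = len(set(self.counts["spam"]) | set(self.counts["normal"])) + 1
        total = sum(self.counts[label].values())
        prior = (self.docs[label] + 1) / (sum(self.docs.values()) + len(self.LABELS))
        score = math.log(prior)
        for gram in char_ngrams(text, self.n):
            # Add-one smoothing keeps unseen n-grams from zeroing the product.
            score += math.log((self.counts[label][gram] + 1) / (total + vocab))
        return score

    def predict(self, text):
        """Return the label with the higher log score."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


clf = NGramNaiveBayes(n=2)
clf.update("free money click here now", "spam")
clf.update("click here for free prizes", "spam")
clf.update("lunch with friends was great", "normal")
clf.update("great weather for a walk today", "normal")
print(clf.predict("free prizes click now"))
```

Keeping the two dictionaries as plain counters means a new tweet can be folded in with a single `update` call, which is one simple way to support the dictionary updates the abstract mentions.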

Detection of Zombie PCs Based on Email Spam Analysis

  • Jeong, Hyun-Cheol;Kim, Huy-Kang;Lee, Sang-Jin;Kim, Eun-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.5 / pp.1445-1462 / 2012
  • While botnets are used for various malicious activities, it is well known that they are widely used for email spam. Though the spam filtering systems currently in use block IPs that send email spam, simply blocking the IPs of zombie PCs participating in a botnet is not enough to prevent the spamming activities of the botnet because these IPs can easily be changed or manipulated. IP blocking is also insufficient to prevent crimes other than spamming, as the botnet can be used for multiple purposes simultaneously. For this reason, we propose a system that detects botnets and zombie PCs based on email spam analysis. This study introduces the concepts of "group pollution level" - the degree to which a certain spam group is suspected of being a botnet - and "IP pollution level" - the degree to which a certain IP in the spam group is suspected of being a zombie PC. These concepts are applied in our system, which detects botnets and zombie PCs by grouping spam mails based on the URL links or attachments they contain and by assessing the pollution level of each group and each IP address. For empirical testing, we used email spam data collected in an "email spam trap system" - Korea's national spam collection system. Our proposed system detected 203 botnets and 18,283 zombie PCs in a day, and these zombie PCs sent about 70% of all the spam messages in our analysis. This shows the effectiveness of detecting zombie PCs by email spam analysis, and the possibility of a dramatic reduction in email spam by taking countermeasures against these botnets and zombie PCs.
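
The grouping step in this abstract can be sketched as follows. The abstract does not give the formulas for the group and IP pollution levels, so the scores below are placeholders chosen only for illustration, and the function names and the `min_ips` parameter are assumptions.

```python
from collections import defaultdict


def group_by_url(messages):
    """Map each URL to the set of sender IPs that mailed it.

    `messages` is an iterable of (sender_ip, url) pairs.
    """
    groups = defaultdict(set)
    for ip, url in messages:
        groups[url].add(ip)
    return dict(groups)


def placeholder_scores(groups, min_ips=2):
    """Placeholder scores, NOT the paper's pollution levels.

    A group's score is its number of distinct sender IPs; groups with
    fewer than `min_ips` senders are dropped. An IP's score is the number
    of retained groups it appears in.
    """
    group_score = {url: len(ips) for url, ips in groups.items() if len(ips) >= min_ips}
    ip_score = defaultdict(int)
    for url in group_score:
        for ip in groups[url]:
            ip_score[ip] += 1
    return group_score, dict(ip_score)


messages = [
    ("192.0.2.1", "http://a.example"),
    ("192.0.2.2", "http://a.example"),
    ("192.0.2.1", "http://b.example"),
    ("192.0.2.3", "http://b.example"),
    ("192.0.2.4", "http://c.example"),
]
group_score, ip_score = placeholder_scores(group_by_url(messages))
print(group_score)
print(ip_score)
```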

A Scheme of VoIP Spam Detection Using Improved Multi Gray-Leveling (향상된 Multi Gray-Leveling을 통한 VoIP 스팸 탐지 기법)

  • Chae, Kang-Suk;Jung, Sou-Hwan
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.8B / pp.630-636 / 2012
  • In this paper, we propose an improved Multi Gray-Leveling scheme that reduces the problems of the existing Multi Gray-Leveling scheme, which was suggested as a way to prevent call spam in VoIP environments. The existing scheme uses two different time periods and judges the likelihood of call spam by checking the call interval, which prevents spammers from evading detection by controlling the call interval. This is the strength of the existing scheme, but because it uses a long-term time period it can mistake a normal user for a spammer. To solve this problem, this paper proposes an upgraded scheme that uses the receiver's behavior pattern as well as the caller's. Its advantage is that gray leveling can be performed using the information collected in the VoIP service provider's database, without the user's direct involvement. Hence it can be a very effective method of VoIP spam detection.

A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance (크라우드소싱 기반 문장재구성 방법을 통한 의견 스팸 데이터셋 구축 및 평가)

  • Lee, Seongwoon;Kim, Seongsoon;Park, Donghyeon;Kang, Jaewoo
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.338-343 / 2016
  • Today, opinion reviews on the Web are often used as a means of information exchange. As the importance of opinion reviews continues to grow, the number of issues with opinion spam also increases. Even though many studies on detecting spam reviews have been conducted, some limitations of gold-standard datasets hinder research. Therefore, we introduce a new dataset called "Paraphrased Opinion Spam (POS)" that contains a new type of review spam that imitates truthful reviews. We have noticed that spammers refer to existing truthful reviews to fabricate spam reviews. To create such a seemingly truthful review spam dataset, we asked task participants to paraphrase truthful reviews to create new deceptive reviews. The experimental results show that classifying our POS dataset is more difficult than classifying the existing spam datasets, since the reviews in our dataset are linguistically closer to truthful reviews. Training volume was also found to be an important factor in classification model performance.

Incremental SVM for Online Product Review Spam Detection (온라인 제품 리뷰 스팸 판별을 위한 점증적 SVM)

  • Ji, Chengzhang;Zhang, Jinhong;Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.89-93 / 2014
  • Reviews are very important to potential consumers when making choices. They are also used by manufacturers to find problems with their products and to collect business information about competitors. However, some people write fake reviews to mislead readers into making wrong choices. Detecting fake reviews is therefore an important problem for e-commerce sites. Support Vector Machines (SVMs) are important text classification algorithms with excellent performance. In this paper, we propose a new incremental algorithm for online review spam detection based on weights, an extension of the Karush-Kuhn-Tucker (KKT) conditions, and the convex hull. Finally, we analyze its performance theoretically.

A Study on Clustering of SNS SPAM using Heuristic Method (경험기법을 사용한 SNS 스팸의 클러스터링에 관한 연구)

  • Kwon, Young-Man;Lee, In-Rak;Kim, Myung-Gwan
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.7-12 / 2014
  • SNS has features that are well suited to maintaining social relationships with friends. However, spammers, including various companies and individuals, inconvenience users by exposing many of them to spam tweets. Existing research has studied these spam tweets, but accurate classification and detection remain difficult because of insufficient precision and various other causes. In this paper, we describe the characteristics of spammers and criteria for classifying them. We present the link ratio and the difference between followers and following as classification criteria for spammer accounts. An experiment was performed on randomly selected spam and non-spam accounts according to these criteria. For spam accounts, the link ratio was 68% and the followers/following ratio was 27581.5. For non-spam accounts, the link ratio was 6.12% and the followers/following ratio was 1.26.
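
The two account features named in this abstract are simple to compute. The sketch below assumes that the link ratio is the fraction of an account's tweets containing a URL and that a URL can be recognized by an `http://` or `https://` prefix; both are assumptions, as are the function names, and no classification threshold is implied.

```python
def link_ratio(tweets):
    """Fraction of tweets that contain at least one link (0.0 if no tweets)."""
    if not tweets:
        return 0.0
    with_link = sum(1 for t in tweets if "http://" in t or "https://" in t)
    return with_link / len(tweets)


def follower_ratio(followers, following):
    """Followers divided by following; infinite if the account follows nobody."""
    return followers / following if following else float("inf")


tweets = ["win now http://x.example", "hello", "see https://y.example", "good morning"]
print(link_ratio(tweets))
print(follower_ratio(10, 4))
```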

DEVS Simulation of Spam Voice Signal Detection in VoIP Service (VoIP 스팸 콜 탐지를 위한 음성신호의 DEVS 모델링 및 시뮬레이션)

  • Kim, Ji-Yeon;Kim, Hyung-Jong;Cho, Young-Duk;Kim, Hwan-Kuk;Won, Yoo-Jae;Kim, Myuhng-Joo
    • Journal of the Korea Society for Simulation / v.16 no.3 / pp.75-87 / 2007
  • As VoIP service quality improves and many of its shortcomings are overcome, users are becoming interested in the service. There are also several additional features that offer convenience to users, such as presence service and instant messaging. However, as there are always two sides to a coin, some security issues make users hesitate to use it. This paper deals with one of these issues, the VoIP spam problem. We took into account the signal pattern of the voice message in a spam call and constructed voice signal models of a normal call, a normal call with noise, and a spam call. Each voice signal case is fed into our spam decision algorithm, which detects spam calls based on the amount of information in the call signal. We used DEVS-Java™ for our modeling and simulation. The contribution of this work is that it suggests a way to detect voice spam call signals and tests the method using a modeling and simulation methodology.

A Study on Spam Document Classification Method using Characteristics of Keyword Repetition (단어 반복 특징을 이용한 스팸 문서 분류 방법에 관한 연구)

  • Lee, Seong-Jin;Baik, Jong-Bum;Han, Chung-Seok;Lee, Soo-Won
    • The KIPS Transactions:PartB / v.18B no.5 / pp.315-324 / 2011
  • In the Web environment, a flood of spam causes serious social problems such as personal information leaks, monetary loss from phishing, and the distribution of harmful content. Moreover, the types and techniques of spam distribution that must be controlled keep changing over time. The learning-based spam classification method using the Bag-of-Words model is the most widely used method to date. However, this method is vulnerable to the anti-spam evasion techniques that recent spam commonly employs, because it classifies spam documents using only the keyword occurrence information learned during classification model training. In this paper, we propose a spam document detection method that counters these evasion techniques by using the word repetition characteristic of spam documents. Most recent spam documents tend to repeat the key phrases they are designed to spread, and this tendency can be used as a measure for classifying spam documents. We define six variables that represent characteristics of word repetition and use them as a feature set for constructing a classification model. The effectiveness of the proposed method is evaluated in an experiment with blog posts and e-mail data. The experimental results show that the proposed method outperforms other approaches.
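
The abstract does not list its six repetition variables, so the sketch below computes three hypothetical ones purely to illustrate what a word-repetition feature set could look like. The feature names, the whitespace tokenization, and the `min_repeats` parameter are all assumptions.

```python
from collections import Counter


def repetition_features(text, min_repeats=3):
    """Compute three illustrative word-repetition features for a document.

    top_share:    share of all tokens taken by the most frequent word
    repeat_share: share of tokens that repeat an earlier word
    heavy_terms:  number of distinct words occurring at least `min_repeats` times
    """
    tokens = text.lower().split()
    if not tokens:
        return {"top_share": 0.0, "repeat_share": 0.0, "heavy_terms": 0}
    counts = Counter(tokens)
    top_count = counts.most_common(1)[0][1]
    return {
        "top_share": top_count / len(tokens),
        "repeat_share": 1 - len(counts) / len(tokens),
        "heavy_terms": sum(1 for c in counts.values() if c >= min_repeats),
    }


print(repetition_features("buy pills buy pills buy now"))
```

Features of this kind depend on how often words recur, not on which words they are, which is why they can complement a Bag-of-Words model.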

Social Network Spam Detection using Recursive Structure Features (소셜 네트워크 상에서의 재귀적 네트워크 구조 특성을 활용한 스팸탐지 기법)

  • Jang, Boyeon;Jeong, Sihyun;Kim, Chongkwon
    • Journal of KIISE / v.44 no.11 / pp.1231-1235 / 2017
  • Given the network structure of an online social network, it is important to find a way to distinguish spam accounts using network features. In online social networks, service providers attempt to detect social spamming to maintain their service quality. However, spammer groups change their strategies to avoid being detected. Even though spammers attempt to act like legitimate users, certain distinguishable structural features are not easily changed. In this paper, we investigate a way to generate meaningful network structure features and suggest a spammer detection method using recursive structural features. In an experiment on a real-world dataset, we found that the proposed algorithm could improve classification performance by about 8%.
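
One common way to build recursive structural features is to start from a local feature such as node degree and repeatedly append aggregates of the neighbors' current features. The sketch below follows that general idea. It is not taken from the paper, whose exact features and aggregation functions are not stated in the abstract, and the function name and the choice of sum and mean are assumptions.

```python
def recursive_features(adj, rounds=2):
    """Build per-node feature vectors by recursive neighbor aggregation.

    `adj` maps each node to the set of its neighbors (undirected graph).
    Each round appends the sum and the mean of the neighbors' current
    feature vectors, so the vector length triples every round.
    """
    # Base feature: node degree.
    feats = {v: [float(len(nbrs))] for v, nbrs in adj.items()}
    for _ in range(rounds):
        new_feats = {}
        for v, nbrs in adj.items():
            width = len(feats[v])
            sums = [sum(feats[u][i] for u in nbrs) for i in range(width)]
            means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
            new_feats[v] = feats[v] + sums + means
        feats = new_feats
    return feats


adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
print(recursive_features(adj, rounds=1))
```

The resulting vectors could then be passed to any standard classifier. Because the width grows geometrically with `rounds`, a small number of rounds is usually enough for a sketch like this.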