• Title/Summary/Keyword: Spammers

Search Result 16, Processing Time 0.036 seconds

A Study on Clustering of SNS SPAM using Heuristic Method (경험기법을 사용한 SNS 스팸의 클러스터링에 관한 연구)

  • Kwon, Young-Man;Lee, In-Rak;Kim, Myung-Gwan
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.6
    • /
    • pp.7-12
    • /
    • 2014
  • It has good features for social networking with friends SNS is maintained. However, various enterprises, individuals invading the inconvenience spammers have exposure to a number of users to tweet spam. The study was conducted in the existing research on these spam tweets. However, the results showed a more accurate classification and detection is difficult because of the lack of precision and different causes. In this paper, we describe how to classify the characteristics of spammers, classification criteria. Also has a link rate and difference between followers and following, these features were present classification criteria for spammers account. This experiment was performed according to the criteria. Randomized trial of spam and non-spam accounts were selected and account type was conducted according to the criteria 68% of the link ratio of spam accounts. Followers / Following ratio was 27581.5. Non-spam accounts was 6.12%. Followers / Following ratio was 1.26.

Spam Image Detection Model based on Deep Learning for Improving Spam Filter

  • Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
    • Journal of Information Processing Systems
    • /
    • v.19 no.3
    • /
    • pp.289-301
    • /
    • 2023
  • Due to the development and dissemination of modern technology, anyone can easily communicate using services such as social network service (SNS) through a personal computer (PC) or smartphone. The development of these technologies has caused many beneficial effects. At the same time, bad effects also occurred, one of which was the spam problem. Spam refers to unwanted or rejected information received by unspecified users. The continuous exposure of such information to service users creates inconvenience in the user's use of the service, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers are creating more malicious spam by distorting the image of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation of image spam circulated on social media is not serious yet. However, in the mail system, spammers (the person who sends spam) showed various modifications to the spam image for neutralizing OCR, and therefore, the same situation can happen with spam images on social media. Spammers have been shown to interfere with OCR reading through geometric transformations such as image distortion, noise addition, and blurring. Various techniques have been studied to filter image spam, but at the same time, methods of interfering with image spam identification using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model to improve the existing OCR-based spam image detection performance and compensate for vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts the text-related features, whether the image contains spam words, and the word embedding vector from the input image. Then, the convolution neural network-based image model extracts image obfuscation and image feature vectors from the input image. The extracted feature is determined whether it is a spam image by the final spam image classifier. As a result of evaluating the F1-score of the proposed model, the performance was about 14 points higher than the OCR-based spam image detection performance.

An Extended Work Architecture for Online Threat Prediction in Tweeter Dataset

  • Sheoran, Savita Kumari;Yadav, Partibha
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.1
    • /
    • pp.97-106
    • /
    • 2021
  • Social networking platforms have become a smart way for people to interact and meet on internet. It provides a way to keep in touch with friends, families, colleagues, business partners, and many more. Among the various social networking sites, Twitter is one of the fastest-growing sites where users can read the news, share ideas, discuss issues etc. Due to its vast popularity, the accounts of legitimate users are vulnerable to the large number of threats. Spam and Malware are some of the most affecting threats found on Twitter. Therefore, in order to enjoy seamless services it is required to secure Twitter against malicious users by fixing them in advance. Various researches have used many Machine Learning (ML) based approaches to detect spammers on Twitter. This research aims to devise a secure system based on Hybrid Similarity Cosine and Soft Cosine measured in combination with Genetic Algorithm (GA) and Artificial Neural Network (ANN) to secure Twitter network against spammers. The similarity among tweets is determined using Cosine with Soft Cosine which has been applied on the Twitter dataset. GA has been utilized to enhance training with minimum training error by selecting the best suitable features according to the designed fitness function. The tweets have been classified as spammer and non-spammer based on ANN structure along with the voting rule. The True Positive Rate (TPR), False Positive Rate (FPR) and Classification Accuracy are considered as the evaluation parameter to evaluate the performance of system designed in this research. The simulation results reveals that our proposed model outperform the existing state-of-arts.

A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features (스팸성 자질과 URL 자질의 공동 학습을 이용한 최대 엔트로피 기반 스팸메일 필터 시스템)

  • Gong, Mi-Gyoung;Lee, Kyung-Soon
    • The KIPS Transactions:PartB
    • /
    • v.15B no.1
    • /
    • pp.61-68
    • /
    • 2008
  • This paper presents a spam filter system using co-training with spamminess features and URL features based on the maximum entropy model. Spamminess features are the emphasizing patterns or abnormal patterns in spam messages used by spammers to express their intention and to avoid being filtered by the spam filter system. Since spammers use URLs to give the details and make a change to the URL format not to be filtered by the black list, normal and abnormal URLs can be key features to detect the spam messages. Co-training with spamminess features and URL features uses two different features which are independent each other in training. The filter system can learn information from them independently. Experiment results on TREC spam test collection shows that the proposed approach achieves 9.1% improvement and 6.9% improvement in accuracy compared to the base system and bogo filter system, respectively. The result analysis shows that the proposed spamminess features and URL features are helpful. And an experiment result of the co-training shows that two feature sets are useful since the number of training documents are reduced while the accuracy is closed to the batch learning.

Improved Spam Filter via Handling of Text Embedded Image E-mail

  • Youn, Seongwook;Cho, Hyun-Chong
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.1
    • /
    • pp.401-407
    • /
    • 2015
  • The increase of image spam, a kind of spam in which the text message is embedded into attached image to defeat spam filtering technique, is a major problem of the current e-mail system. For nearly a decade, content based filtering using text classification or machine learning has been a major trend of anti-spam filtering system. Recently, spammers try to defeat anti-spam filter by many techniques. Text embedding into attached image is one of them. We proposed an ontology spam filters. However, the proposed system handles only text e-mail and the percentage of attached images is increasing sharply. The contribution of the paper is that we add image e-mail handling capability into the anti-spam filtering system keeping the advantages of the previous text based spam e-mail filtering system. Also, the proposed system gives a low false negative value, which means that user's valuable e-mail is rarely regarded as a spam e-mail.

Specifying Spammers by Cycle Detection in Social Network (소셜 네트웍의 순환 관계를 적용한 스패머 특정화)

  • Eom, Chris Soo-Hyun;Lee, Woo-Key;Lee, James Jung-Hun
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06d
    • /
    • pp.19-20
    • /
    • 2012
  • 소셜 네트웍을 통한 다양한 서비스들은 새로운 비즈니스 모델의 형성이라는 긍정적인 측면이 있는 반면에, 개인정보 누출이나 스팸과 같이 부정적인 측면도 등장하고 있다. 현재 스팸을 차단하기 위해 스팸방지 가이드라인, 스팸 단어 검색에 의한 스팸 메시지 차단 등 많은 방법 및 연구들이 논의되어 왔다. 하지만, 기존의 차수를 활용한 방법은 스팸이 아닌 정점 또한 스팸으로 간주하는 문제점을 가지고 있어 부정확하다는 단점이 있다. 본 연구에서는 이를 해결하고자 다른 구조적 특성인 순환을 분석하여 스팸들을 차단하는 방법을 제안하고 그 효과를 입증하였다.

A Technique of Statistical Message Filtering for Blocking Spam Message (통계적 기법을 이용한 스팸메시지 필터링 기법)

  • Kim, Seongyoon;Cha, Taesoo;Park, Jeawon;Choi, Jaehyun;Lee, Namyong
    • Journal of Information Technology Services
    • /
    • v.13 no.3
    • /
    • pp.299-308
    • /
    • 2014
  • Due to indiscriminately received spam messages on information society, spam messages cause damages not only to person but also to our community. Nowadays a lot of spam filtering techniques, such as blocking characters, are studied actively. Most of these studies are content-based spam filtering technologies through machine learning.. Because of a spam message transmission techniques are being developed, spammers have to send spam messages using term spamming techniques. Spam messages tend to include number of nouns, using repeated words and inserting special characters between words in a sentence. In this paper, considering three features, SPSS statistical program were used in parameterization and we derive the equation. And then, based on this equation we measured the performance of classification of spam messages. The study compared with previous studies FP-rate in terms of further minimizing the cost of product was confirmed to show an excellent performance.

Comparative Study of Machine learning Techniques for Spammer Detection in Social Bookmarking Systems (소셜 복마킹 시스템의 스패머 탐지를 위한 기계학습 기술의 성능 비교)

  • Kim, Chan-Ju;Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.5
    • /
    • pp.345-349
    • /
    • 2009
  • Social bookmarking systems are a typical web 2.0 service based on folksonomy, providing the platform for storing and sharing bookmarking information. Spammers in social bookmarking systems denote the users who abuse the system for their own interests in an improper way. They can make the entire resources in social bookmarking systems useless by posting lots of wrong information. Hence, it is important to detect spammers as early as possible and protect social bookmarking systems from their attack. In this paper, we applied a diverse set of machine learning approaches, i.e., decision tables, decision trees (ID3), $na{\ddot{i}}ve$ Bayes classifiers, TAN (tree-augment $na{\ddot{i}}ve$ Bayes) classifiers, and artificial neural networks to this task. In our experiments, $na{\ddot{i}}ve$ Bayes classifiers performed significantly better than other methods with respect to the AUC (area under the ROC curve) score as veil as the model building time. Plausible explanations for this result are as follows. First, $na{\ddot{i}}ve$> Bayes classifiers art known to usually perform better than decision trees in terms of the AUC score. Second, the spammer detection problem in our experiments is likely to be linearly separable.

A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance (크라우드소싱 기반 문장재구성 방법을 통한 의견 스팸 데이터셋 구축 및 평가)

  • Lee, Seongwoon;Kim, Seongsoon;Park, Donghyeon;Kang, Jaewoo
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.7
    • /
    • pp.338-343
    • /
    • 2016
  • Today, opinion reviews on the Web are often used as a means of information exchange. As the importance of opinion reviews continues to grow, the number of issues for opinion spam also increases. Even though many research studies on detecting spam reviews have been conducted, some limitations of gold-standard datasets hinder research. Therefore, we introduce a new dataset called "Paraphrased Opinion Spam (POS)" that contains a new type of review spam that imitates truthful reviews. We have noticed that spammers refer to existing truthful reviews to fabricate spam reviews. To create such a seemingly truthful review spam dataset, we asked task participants to paraphrase truthful reviews to create a new deceptive review. The experiment results show that classifying our POS dataset is more difficult than classifying the existing spam datasets since the reviews in our dataset more linguistically look like truthful reviews. Also, training volume has been found to be an important factor for classification model performance.