• Title/Summary/Keyword: Spam Detection

Search Result 58, Processing Time 0.03 seconds

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

  • Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.6
    • /
    • pp.2186-2196
    • /
    • 2014
  • Spam mail is one of the most general mail dysfunctions, which may cause psychological damage to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending, in particular, occurs when spam mail is sent using smart phones or tablets connected to wireless networks. Spam mail consists of approximately 68% of mail traffic; however, it is believed that the true percentage of spam mail is at a much more severe level. In order to analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected by extracting the linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method utilizes a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new spam mail types.

Spam Image Detection Model based on Deep Learning for Improving Spam Filter

  • Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
    • Journal of Information Processing Systems
    • /
    • v.19 no.3
    • /
    • pp.289-301
    • /
    • 2023
  • Due to the development and dissemination of modern technology, anyone can easily communicate using services such as social network service (SNS) through a personal computer (PC) or smartphone. The development of these technologies has caused many beneficial effects. At the same time, bad effects also occurred, one of which was the spam problem. Spam refers to unwanted or rejected information received by unspecified users. The continuous exposure of such information to service users creates inconvenience in the user's use of the service, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers are creating more malicious spam by distorting the image of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation of image spam circulated on social media is not serious yet. However, in the mail system, spammers (the person who sends spam) showed various modifications to the spam image for neutralizing OCR, and therefore, the same situation can happen with spam images on social media. Spammers have been shown to interfere with OCR reading through geometric transformations such as image distortion, noise addition, and blurring. Various techniques have been studied to filter image spam, but at the same time, methods of interfering with image spam identification using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model to improve the existing OCR-based spam image detection performance and compensate for vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts the text-related features, whether the image contains spam words, and the word embedding vector from the input image. Then, the convolution neural network-based image model extracts image obfuscation and image feature vectors from the input image. The extracted feature is determined whether it is a spam image by the final spam image classifier. As a result of evaluating the F1-score of the proposed model, the performance was about 14 points higher than the OCR-based spam image detection performance.

Knowledge Graph-based Korean New Words Detection Mechanism for Spam Filtering (스팸 필터링을 위한 지식 그래프 기반의 신조어 감지 매커니즘)

  • Kim, Ji-hye;Jeong, Ok-ran
    • Journal of Internet Computing and Services
    • /
    • v.21 no.1
    • /
    • pp.79-85
    • /
    • 2020
  • Today, to block spam texts on smartphone, a simple string comparison between text messages and spam keywords or a blocking spam phone numbers is used. As results, spam text is sent in a gradually hanged way to prevent if from being automatically blocked. In particular, for words included in spam keywords, spam texts are sent to abnormal words using special characters, Chinese characters, and whitespace to prevent them from being detected by simple string match. There is a limit that traditional spam filtering methods can't block these spam texts well. Therefore, new technologies are needed to respond to changing spam text messages. In this paper, we propose a knowledge graph-based new words detection mechanism that can detect new words frequently used in spam texts and respond to changing spam texts. Also, we show experimental results of the performance when detected Korean new words are applied to the Naive Bayes algorithm.

Finding Rotten Eggs: A Review Spam Detection Model using Diverse Feature Sets

  • Akram, Abubakker Usman;Khan, Hikmat Ullah;Iqbal, Saqib;Iqbal, Tassawar;Munir, Ehsan Ullah;Shafi, Dr. Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.10
    • /
    • pp.5120-5142
    • /
    • 2018
  • Social media enables customers to share their views, opinions and experiences as product reviews. These product reviews facilitate customers in buying quality products. Due to the significance of online reviews, fake reviews, commonly known as spam reviews are generated to mislead the potential customers in decision-making. To cater this issue, review spam detection has become an active research area. Existing studies carried out for review spam detection have exploited feature engineering approach; however limited number of features are considered. This paper proposes a Feature-Centric Model for Review Spam Detection (FMRSD) to detect spam reviews. The proposed model examines a wide range of feature sets including ratings, sentiments, content, and users. The experimentation reveals that the proposed technique outperforms the baseline and provides better results.

Coward Analysis based Spam SMS Detection Scheme (동시출현 단어분석 기반 스팸 문자 탐지 기법)

  • Oh, Hayoung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.3
    • /
    • pp.693-700
    • /
    • 2016
  • Analyzing characteristics of spam text messages had limitations since spam datasets are typically difficult to obtain publicly and previous studies focused on spam email. Although existing studies, such as through the use of spam e-mail characterization and utilization of data mining techniques, there are limitations that influence is limited to high spam detection techniques using a single word character. In this paper, we reveal the characteristics of the spam SMS based on experiment and analysis from different perspectives and propose coward analysis based spam SMS detection scheme with a publicly disclosed spam SMS from the University of Singapore. With the extensive performance evaluations, we show false positive and false negative of the proposed method is less than 2%.

Detecting Spam Data for Securing the Reliability of Text Analysis (텍스트 분석의 신뢰성 확보를 위한 스팸 데이터 식별 방안)

  • Hyun, Yoonjin;Kim, Namgyu
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.2
    • /
    • pp.493-504
    • /
    • 2017
  • Recently, tremendous amounts of unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers and practitioners as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data is increasing, more and more attempts to gain profits by distorting text data maliciously or nonmaliciously are also increasing. This increase in spam text data not only burdens users who want to obtain useful information with a large amount of inappropriate information, but also damages the reliability of information and information providers. Therefore, efforts must be made to improve the reliability of information and the quality of analysis results by detecting and removing spam data in advance. For this purpose, many studies to detect spam have been actively conducted in areas such as opinion spam detection, spam e-mail detection, and web spam detection. In this study, we introduce core concepts and current research trends of spam detection and propose a methodology to detect the spam tag of a blog as one of the challenging attempts to improve the reliability of blog information.

Unsupervised Scheme for Reverse Social Engineering Detection in Online Social Networks (온라인 소셜 네트워크에서 역 사회공학 탐지를 위한 비지도학습 기법)

  • Oh, Hayoung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.3
    • /
    • pp.129-134
    • /
    • 2015
  • Since automatic social engineering based spam attacks induce for users to click or receive the short message service (SMS), e-mail, site address and make a relationship with an unknown friend, it is very easy for them to active in online social networks. The previous spam detection schemes only apply manual filtering of the system managers or labeling classifications regardless of the features of social networks. In this paper, we propose the spam detection metric after reflecting on a couple of features of social networks followed by analysis of real social network data set, Twitter spam. In addition, we provide the online social networks based unsupervised scheme for automated social engineering spam with self organizing map (SOM). Through the performance evaluation, we show the detection accuracy up to 90% and the possibility of real time training for the spam detection without the manager.

Modeling and Evaluating Information Diffusion for Spam Detection in Micro-blogging Networks

  • Chen, Kan;Zhu, Peidong;Chen, Liang;Xiong, Yueshan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.8
    • /
    • pp.3005-3027
    • /
    • 2015
  • Spam has become one of the top threats of micro-blogging networks as the representations of rumor spreading, advertisement abusing and malware distribution. With the increasing popularity of micro-blogging, the problems will exacerbate. Prior detection tools are either designed for specific types of spams or not robust enough. Spammers may escape easily from being detected by adjusting their behaviors. In this paper, we present a novel model to quantitatively evaluate information diffusion in micro-blogging networks. Under this model, we found that spam posts differ wildly from the non-spam ones. First, the propagations of non-spam posts mostly result from their followers, but those of spam posts are mainly from strangers. Second, the non-spam posts relatively last longer than the spam posts. Besides, the non-spam posts always get their first reposts/comments much sooner than the spam posts. With the features defined in our model, we propose an RBF-based approach to detect spams. Different from the previous works, in which the features are extracted from individual profiles or contents, the diffusion features are not determined by any single user but the crowd. Thus, our method is more robust because any single user's behavior changes will not affect the effectiveness. Besides, although the spams vary in types and forms, they're propagated in the same way, so our method is effective for all types of spams. With the real data crawled from the leading micro-blogging services of China, we are able to evaluate the effectiveness of our model. The experiment results show that our model can achieve high accuracy both in precision and recall.

Performance Evaluation of Review Spam Detection for a Domestic Shopping Site Application (국내 쇼핑 사이트 적용을 위한 리뷰 스팸 탐지 방법의 성능 평가)

  • Park, Jihyun;Kim, Chong-kwon
    • Journal of KIISE
    • /
    • v.44 no.4
    • /
    • pp.339-343
    • /
    • 2017
  • As the number of customers who write fake reviews is increasing, online shopping sites have difficulty in providing reliable reviews. Fake reviews are called review spam, and they are written to promote or defame the product. They directly affect sales volume of the product; therefore, it is important to detect review spam. Review spam detection methods suggested in prior researches were only based on an international site even though review spam is a widespread problem in domestic shopping sites. In this paper, we have presented new review features of the domestic shopping site NAVER, and we have applied the formerly introduced method to this site for performing an evaluation.

A Re-configuration Scheme for Social Network Based Large-scale SMS Spam (소셜 네트워크 기반 대량의 SMS 스팸 데이터 재구성 기법)

  • Jeong, Sihyun;Noh, Giseop;Oh, Hayoung;Kim, Chong-Kwon
    • Journal of KIISE
    • /
    • v.42 no.6
    • /
    • pp.801-806
    • /
    • 2015
  • The Short Message Service (SMS) is one of the most popular communication tools in the world. As the cost of SMS decreases, SMS spam has been growing largely. Even though there are many existing studies on SMS spam detection, researchers commonly have limitation collecting users' private SMS contents. They need to gather the information related to social network as well as personal SMS due to the intelligent spammers being aware of the social networks. Therefore, this paper proposes the Social network Building Scheme for SMS spam detection (SBSS) algorithm that builds synthetic social network dataset realistically, without the collection of private information. Also, we analyze and categorize the attack types of SMS spam to build more complete and realistic social network dataset including SMS spam.