• Title/Summary/Keyword: Spam Detection

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

  • AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.3
    • /
    • pp.1348-1375
    • /
    • 2018
  • Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied to many data mining problems with good accuracy. However, SVM classification speed decreases as dataset size increases. Some applications, such as video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to even a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving SVM predictive accuracy and training speed. The wrapper-based and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email, and phishing email. In addition, they are validated on 20 other datasets provided by the UCI data repository. Statistical analysis is also performed, and experimental results reveal that the filter-based and wrapper-based techniques significantly improved SVM classification speed. The results also reveal that the wrapper-based techniques improved SVM predictive accuracy in most cases.

A Splog Detection System Using Support Vector Systems (지지벡터기계를 이용한 스팸 블로그(Splog) 판별 시스템)

  • Lee, Song-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.1
    • /
    • pp.163-168
    • /
    • 2011
  • Blogs are an easy way to publish information, engage in discussions, and form communities on the Internet. Recently, several varieties of spam blogs have appeared whose purpose is to host ads or raise the PageRank of target sites. Our purpose is to develop a system that automatically detects these spam blogs (splogs) among blogs on the Web. After the HTML is removed from the blogs, the text is tagged by a part-of-speech (POS) tagger. Words and their POS tags are used as features. Among these features, we select useful ones with the $\chi^2$ statistic and train the SVM with the selected features. Our system achieved an F1 measure of 90.5% on the SPLOG data set.
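
The feature selection step described in the abstract above can be illustrated with a minimal sketch. It assumes binary presence/absence features and the standard $\chi^2$ statistic for a 2x2 feature/class contingency table; the function names, the "word/POS" feature strings, and the toy data are hypothetical and are not taken from the paper.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 feature/class contingency table.

    n11: splog documents containing the feature
    n10: non-splog documents containing the feature
    n01: splog documents without the feature
    n00: non-splog documents without the feature
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A feature present in all or none of the documents carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k features with the highest chi-square score.

    docs:   list of sets of feature strings (hypothetical "word/POS" format)
    labels: list of bools, True for splog
    """
    vocab = set().union(*docs) if docs else set()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    scores = {}
    for feat in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if y and feat in d)
        n10 = sum(1 for d, y in zip(docs, labels) if not y and feat in d)
        scores[feat] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Sort by descending score; ties are broken alphabetically.
    return sorted(vocab, key=lambda f: (-scores[f], f))[:k]


# Toy data, made up for illustration only.
docs = [
    {"buy/VB", "cheap/JJ"},
    {"buy/VB", "now/RB"},
    {"trip/NN", "now/RB"},
    {"trip/NN", "photo/NN"},
]
labels = [True, True, False, False]
top_features = select_features(docs, labels, 2)
```

The selected features would then be used to build the vectors on which the SVM is trained; that step is omitted here.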

Splog Detection Using Post Structure Similarity and Daily Posting Count (포스트의 구조 유사성과 일일 발행수를 이용한 스플로그 탐지)

  • Beak, Jee-Hyun;Cho, Jung-Sik;Kim, Sung-Kwon
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.2
    • /
    • pp.137-147
    • /
    • 2010
  • A blog is a website, usually maintained by an individual, with regular entries of commentary, descriptions of events, or other material such as graphics or video. Entries are commonly displayed in reverse chronological order. Blog search engines, like web search engines, retrieve information from blogs for searchers. Blog search engines sometimes return unsatisfactory results, mainly due to spam blogs, or splogs. Splogs are blogs that host spam posts or plagiarized or auto-generated content for the sole purpose of hosting advertisements or raising the search rankings of target sites. This thesis focuses on splog detection and proposes a new splog detection method based on blog post structure similarity and posting count per day. Experiments with the proposed method show excellent results on splog detection tasks, with over 90% accuracy.

Modeling and Simulation for Performance Evaluation of VoIP Spam Detection Mechanism (VoIP 스팸 탐지 기술의 성능 평가를 위한 모델링 및 시물레이션)

  • Kim, Ji-Yeon;Kim, Hyung-Jong;Kim, Myuhng-Joo;Jeong, Jong-Il
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.19 no.3
    • /
    • pp.95-105
    • /
    • 2009
  • Spam calls are one of the main security threats in VoIP services. In this paper, we design a simulation model for evaluating the performance of a VoIP spam defense mechanism. The simulation model provides functions for performance evaluation such as call generation and input/output comparison. Four representative caller models have been developed for the evaluation, and each model's characteristics are expressed as statistical parameters. The mechanism under evaluation is a SPIT (Spam over Internet Telephony) level decision algorithm, and we derive the SPIT levels of the caller models. The performance evaluation model is designed using the DEVS formalism, and DEVSJAVA$^{TM}$ is used to develop and execute the simulation models.

Design of intelligent fire detection / emergency based on wireless sensor network (무선 센서 네트워크 기반 지능형 화재 감지/경고 시스템 설계)

  • Kim, Sung-Ho;Youk, Yui-Su
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.3
    • /
    • pp.310-315
    • /
    • 2007
  • When an email is delivered to users, each user's response may differ according to his or her preferences. This paper presents a solution for this situation by constructing a user preferred ontology for anti-spam systems. To define an ontology for describing user behaviors, we applied associative classification mining to study the preference information of users and their responses to emails. The generated classification rules can be represented in a formal ontology language. A user preferred ontology can explain in a meaningful way why a mail is judged to be spam or non-spam. We also suggest a new rule optimization procedure inspired by logic synthesis to improve comprehensibility and exclude redundant rules.

A study on Countermeasures by Detecting Trojan-type Downloader/Dropper Malicious Code

  • Kim, Hee Wan
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.288-294
    • /
    • 2021
  • With the increase in Internet use, there are various ways to be infected with malicious code, such as the web, affiliate programs, P2P, illegal software, DNS alteration of routers, word processor vulnerabilities, spam mail, and storage media. In addition, as production and evasion technology has advanced, malicious code can be produced more easily than before through automatic generation programs. In the past, the propagation speed of malicious code was slow, the infection routes were limited, and the propagation technology had a simple structure, so there was enough time to study countermeasures. However, current malicious code has become very sophisticated by adopting technologies such as concealment and self-transformation, causing problems such as distributed denial of service (DDoS) attacks, spam sending, and personal information theft. Existing malware detection, which relies on signatures, cannot respond when it encounters malicious code whose attack pattern has changed or a new type of malicious code. In addition, static analysis is difficult to perform on malicious code to which code obfuscation, encryption, and packing techniques have been applied to hinder analysis. Therefore, this paper presents a method for detecting malicious code through dynamic and static analysis, using Trojan-type Downloader/Dropper malicious code as the subject, and suggests detection approaches and countermeasures.

A Study on Human Vulnerability Factors of Companies : Through Spam Mail Simulation Training Experiments (스팸메일 모의훈련 현장실험을 통한 기업의 인적 취약요인 연구)

  • Lee, Jun-hee;Kwon, Hun-yeong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.4
    • /
    • pp.847-857
    • /
    • 2019
  • Recently, various cyber threats delivered by e-mail, such as ransomware and APT attacks, have been increasing. Because such attacks bypass technological measures such as conventional pattern-based detection, it is important to take administrative measures by improving personal awareness of security. The purpose of this study is to investigate, through field experiments, the human factors of employees who are vulnerable to spam mail attacks and to establish future improvement plans. Spam mails were sent to the employees of a company seven times and the training reports were analyzed; the analysis confirmed that factors such as the number of training sessions and the recipient's gender, age, and workplace were related to the reading rate. Based on these results, we suggest ways to improve the training so that each organization can carry out effective simulation training and, through improved awareness, respond better to spam mail.

Ensemble Machine Learning Model Based YouTube Spam Comment Detection (앙상블 머신러닝 모델 기반 유튜브 스팸 댓글 탐지)

  • Jeong, Min Chul;Lee, Jihyeon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.5
    • /
    • pp.576-583
    • /
    • 2020
  • This paper proposes a technique for identifying spam comments on YouTube, which has recently seen tremendous growth. Because YouTube content can be monetized through advertising, spammers promote their own channels or videos in the comments of popular videos or leave comments unrelated to the video. YouTube operates its own spam blocking system, but it still fails to block such comments properly and efficiently. Therefore, we examined related studies on YouTube spam comment screening and conducted classification experiments with six different machine learning techniques (decision tree, logistic regression, Bernoulli naive Bayes, random forest, support vector machine with linear kernel, and support vector machine with Gaussian kernel) and an ensemble model combining these techniques, using comment data from popular music videos by Psy, Katy Perry, LMFAO, Eminem, and Shakira.
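
One common way to combine several classifiers into an ensemble is hard (majority) voting. The abstract above does not state which combination rule the authors used, so the following is only a sketch under that assumption; `majority_vote`, the tie-breaking rule, and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by hard (majority) voting.

    predictions: list of lists, one list of labels per classifier,
                 all of the same length (one label per comment).
    Ties are broken by choosing the smallest label, which is an
    arbitrary but deterministic rule chosen for this sketch.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


# Made-up outputs of three base classifiers on three comments (1 = spam).
preds = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
ensemble_labels = majority_vote(preds)
```

In practice each inner list would come from one of the six trained base classifiers applied to the same comments.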

Survey on Fake Review Detection of E-commerce Sites (전자 상거래 사이트의 가짜 리뷰 판별 기법 조사)

  • Ji, Chengzhang;Zhang, Jinhong;Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2014.05a
    • /
    • pp.79-81
    • /
    • 2014
  • People increasingly rely on information from e-commerce reviews. Product reviews are an important determinant of potential customers' buying choices. They are also used by product manufacturers to find problems with their products and to collect competitive intelligence about their competitors. Unfortunately, it is well known that many online product reviews are not written by genuine customers. Reviewers may write undeserved positive reviews to promote a product or fake negative reviews to defame it; we call these fake product reviews. Fake product review detection attempts to detect fake reviews and remove them so that only truthful ones remain for readers. To the best of our knowledge, there are still few published studies on this problem. In this paper, we survey the field and give a brief overview of fake product review detection. Related work is presented, including work on web spam and spam email. Then some methods for detecting fake reviews are introduced and summarized. Finally, trends in fake product review detection are discussed.

Downscaling Forgery Detection using Pixel Value's Gradients of Digital Image (디지털 영상 픽셀값의 경사도를 이용한 Downscaling Forgery 검출)

  • RHEE, Kang Hyeon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.2
    • /
    • pp.47-52
    • /
    • 2016
  • Digital images used on smart devices and small displays are typically downscaled images. In this paper, a method for detecting downscaling image forgery is proposed that uses a feature vector derived from the gradients of pixel values. In the proposed algorithm, AR (Autoregressive) coefficients are computed from the gradients of the image's pixel values. These coefficients are used as feature vectors to train an SVM (Support Vector Machine) classifier that serves as the downscaling image forgery detector. For 90% downscaling forgery, the proposed algorithm performs excellently compared to the MFR (Median Filter Residual) scheme, which has the same 10-dimensional feature vector, and the 686-dimensional SPAM (Subtractive Pixel Adjacency Matrix) scheme. On averaging-filtered ($3{\times}3$) and median-filtered ($3{\times}3$) images, it has a higher detection ratio. In particular, across all measured items for averaging and median filtering ($3{\times}3$), the AUC (Area Under Curve) computed from sensitivity and 1-specificity approaches 1. Thus, the proposed algorithm is confirmed to be rated 'Excellent (A)'.
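
As a rough illustration of the feature extraction idea in the abstract above, the sketch below takes horizontal differences of one row of pixel values and fits AR coefficients to them. The abstract does not say how the coefficients are estimated or how gradients are aggregated over the image, so the Yule-Walker approach with the Levinson-Durbin recursion, the single-row gradient, and all function names here are assumptions.

```python
def horizontal_gradients(row):
    """First-order differences between neighbouring pixel values in a row."""
    return [b - a for a, b in zip(row, row[1:])]


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of sequence x."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def ar_coefficients(x, order):
    """Estimate a[1..order] in x[t] ~ sum_i a[i] * x[t - i].

    Solves the Yule-Walker equations with the Levinson-Durbin recursion
    (an assumed estimation method, not necessarily the paper's).
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for k in range(1, order + 1):
        if err == 0:
            break  # constant or empty signal: leave remaining coefficients at 0
        acc = r[k] - sum(a[j] * r[k - j] for j in range(1, k))
        kappa = acc / err
        new_a = a[:]
        new_a[k] = kappa
        for j in range(1, k):
            new_a[j] = a[j] - kappa * a[k - j]
        a = new_a
        err *= 1 - kappa ** 2
    return a[1:]


# Made-up pixel row; the resulting list would be one feature vector.
row = [10, 12, 15, 15, 14, 18, 21, 20, 19, 23, 26, 25]
features = ar_coefficients(horizontal_gradients(row), order=3)
```

A feature vector of this kind, computed per image, is what would be passed to the SVM classifier; the order parameter controls its dimension.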