• Title/Summary/Keyword: spam

Spam Filter by Using χ2 Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB
    • /
    • v.17B no.3
    • /
    • pp.249-254
    • /
    • 2010
  • We propose an automatic spam filter for e-mail data using Support Vector Machines (SVM). We use the lexical form of each word and its part-of-speech (POS) tag as features, and select features by chi-square statistics. For the experiments, we represent each feature by TF (term frequency), TF-IDF, and binary weights. After the SVM is trained with the selected features, it classifies each e-mail as spam or not. In our experiments, the selected features improve the performance of the system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
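The chi-square feature selection step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the standard 2x2 contingency-table form of the statistic; the function names, the `word/POS` feature strings, and the counts are hypothetical and not taken from the paper.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of one feature from a 2x2 contingency table.

    a: spam messages containing the feature
    b: non-spam messages containing the feature
    c: spam messages without the feature
    d: non-spam messages without the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a row or column is empty: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def select_features(counts: dict[str, tuple[int, int, int, int]], k: int) -> list[str]:
    """Return the k features with the highest chi-square score."""
    ranked = sorted(counts, key=lambda f: chi_square(*counts[f]), reverse=True)
    return ranked[:k]


# Hypothetical counts over 100 spam and 100 non-spam messages.
counts = {
    "free/JJ": (40, 1, 60, 99),
    "meeting/NN": (2, 30, 98, 70),
    "the/DT": (90, 91, 10, 9),
}
top = select_features(counts, 2)
```

A feature that appears about equally often in both classes, such as `the/DT` here, scores near zero and is dropped, while features concentrated in one class are kept for SVM training.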

A Study on Spam Protection Technology for Secure VoIP Service in Broadband Convergence Network Environment (BcN 환경에서 안전한 VoIP 서비스를 위한 스팸대응 기술 연구)

  • Sung, Kyung;Kim, Seok-Hun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.4
    • /
    • pp.670-676
    • /
    • 2008
  • Because VoIP service is built on Internet technology, it inherits the security threats that can occur in Internet networks, and its real-time service characteristics make it difficult to protect without adjusting or changing existing security solutions. Because voice and data are provided as an integrated service on a single network, the security threats possible in IP networks are inherent, and protecting the service takes relatively more effort and cost than protecting a data network alone. This paper defines VoIP spam, analyzes VoIP spam technology, and examines various countermeasures that can prevent VoIP spam.

Comparing Korean Spam Document Classification Using Document Classification Algorithms (문서 분류 알고리즘을 이용한 한국어 스팸 문서 분류 성능 비교)

  • Song, Chull-Hwan;Yoo, Seong-Joon
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10c
    • /
    • pp.222-225
    • /
    • 2006
  • Korea has a large number of Internet users compared with other countries, and Korean Internet users correspondingly report considerable inconvenience from spam mail. To address this problem, this paper describes research on filtering Korean spam documents using various feature weighting, feature selection, and document classification algorithms. We extract nouns from Korean documents (spam and non-spam) and use them as the input features of each classification algorithm. For feature weighting, rather than using the conventional methods as they are, we compute a variance value for each feature, apply the Max Value Selection method to choose global features, and then apply the traditional feature selection methods MI, IG, and CHI to extract features. The extracted features are applied to classification algorithms such as Naive Bayes and Support Vector Machine. For the Vector Space Model, the traditional method is used unchanged. As a result, using the Support Vector Machine classifier with TF-IDF variance weighting (combined with Max Value Selection) and CHI feature selection, we obtained a recall of 99.4%, a precision of 97.4%, and an F-measure of 98.39%.
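The three figures reported at the end of this abstract are related by the F-measure formula. As a quick check, assuming the balanced F-measure (the harmonic mean of precision and recall, which the abstract does not state explicitly), the reported recall and precision reproduce the reported F-measure:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 97.4% and recall 99.4%, as reported in the abstract.
f = f_measure(precision=0.974, recall=0.994)
print(f"{f:.4f}")  # prints 0.9839
```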

Identification and Characterization of Rodent Germ Cells-Specific Hyaluronidases

  • Kim, Ekyune;Chang, Kyu-Tae
    • Reproductive and Developmental Biology
    • /
    • v.36 no.3
    • /
    • pp.155-161
    • /
    • 2012
  • Germ cell-specific hyaluronidases such as sperm adhesion molecule 1 (SPAM1) and hyaluronoglucosaminidase 5 (Hyal5) are in part responsible for dispersal of the cumulus cell mass, which is a critical step in establishing fertilization in mammals. In this study, we identified two testis-hyaluronidases, SPAM1 and Hyal5, in hamster and rat. These two genes were expressed specifically in the testis. At the protein level, hamster SPAM1 and Hyal5 display 78.7% and 75.4% identity with mouse SPAM1 and Hyal5. Further, the activity of the enzymes with respect to cumulus cell dispersion did not differ, although we observed that the enzymatic activity differed in pH range. These studies suggest that different sperm hyaluronidases are capable of dispersing the cumulus cell mass despite differences in enzyme activity.

A Method to Block Spam Mail Automatically Through the Connection to Link URL (링크 유알엘 접속을 통한 스팸메일 자동 차단 방법에 관한 연구)

  • Jung, Nam-Cheol
    • Journal of Digital Contents Society
    • /
    • v.8 no.4
    • /
    • pp.451-458
    • /
    • 2007
  • In this paper, I develop a method that automatically blocks spam mail by connecting to the URLs linked in it. The blocking system works as follows. First, the system extracts the URLs linked in an e-mail delivered from any server on the Internet. Next, the system connects to the web pages at those URLs. Finally, the system blocks the e-mail if those web pages contain any keyword defined as a clue to spam mail.
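The three steps described in this abstract might be sketched as below. This is only an illustration under stated assumptions: the URL pattern, the function names, the keyword list, and the `fake_fetch` stand-in (used in place of a real HTTP client so the example runs offline) are all hypothetical and are not the author's implementation.

```python
import re
from typing import Callable, Iterable

# Simplified URL pattern, an assumption for this sketch.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def extract_urls(body: str) -> list[str]:
    """Step 1: collect the URLs linked in the message body."""
    return URL_PATTERN.findall(body)


def should_block(body: str, fetch: Callable[[str], str], keywords: Iterable[str]) -> bool:
    """Steps 2 and 3: visit each linked page and look for spam keywords."""
    lowered = [k.lower() for k in keywords]
    for url in extract_urls(body):
        try:
            page = fetch(url).lower()
        except Exception:
            continue  # unreachable page: no evidence either way
        if any(k in page for k in lowered):
            return True
    return False


# Hypothetical page contents standing in for the live web.
FAKE_WEB = {
    "http://shop.example/deal": "Cheap loans, act now!",
    "http://news.example/today": "Local weather report",
}


def fake_fetch(url: str) -> str:
    return FAKE_WEB[url]


spam = "Click http://shop.example/deal for details"
ham = "Forecast: http://news.example/today"
blocked_spam = should_block(spam, fake_fetch, ["cheap loans"])
blocked_ham = should_block(ham, fake_fetch, ["cheap loans"])
```

Passing the fetch function as a parameter keeps the keyword check separate from network access, so a real client with timeouts could be substituted without changing the blocking logic.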

An Automatic Spam e-mail Filter System Using χ2 Statistics and Support Vector Machines (카이 제곱 통계량과 지지벡터기계를 이용한 자동 스팸 메일 분류기)

  • Lee, Songwook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.592-595
    • /
    • 2009
  • We propose an automatic spam mail classifier for e-mail data using Support Vector Machines (SVM). We use the lexical form of each word and its part-of-speech (POS) tag as features. We select useful features with $\chi^2$ statistics and represent each feature by its term frequency (TF) and inverse document frequency (IDF) values. After the SVM is trained with the features, it classifies each e-mail as spam or not. In our experiment, we achieved an accuracy of 82.7% on e-mail data collected from a web mail system.
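The TF and IDF weighting mentioned in this abstract can be illustrated with a short sketch. It assumes the common `tf * log(N / df)` variant, which the abstract does not specify; the function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by term frequency times inverse document frequency."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted


# Hypothetical tokenized messages.
docs = [
    ["free", "offer", "free"],
    ["meeting", "offer"],
    ["meeting", "agenda"],
]
weights = tf_idf(docs)
```

In the first message, `free` occurs twice and in no other message, so it receives a higher weight than `offer`, which also appears in the second message.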

Intelligent Spam-mail Filtering Based on Textual Information and Hyperlinks (텍스트정보와 하이퍼링크에 기반한 지능형 스팸 메일 필터링)

  • Kang, Sin-Jae;Kim, Jong-Wan
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.7
    • /
    • pp.895-901
    • /
    • 2004
  • This paper describes a two-phase intelligent method for filtering spam mail based on textual information and hyperlinks. Since the body of a spam mail contains little text, it provides insufficient hints to distinguish spam from legitimate mail. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from both the original email body and the fetched webpages. We divide the hints into two kinds of information: definite information (the sender's information and definite spam keyword lists) and less definite textual information (words or phrases, and particular features of the email). In filtering spam mail, definite information is used first, and then less definite textual information is applied. In our experiment, the method of fetching webpages improved the F-measure by 9.4% over the method using only the original email header and body.
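The two-phase ordering described in this abstract (definite information first, less definite textual information second) might look like the sketch below. All names, the threshold, and the `toy_score` function are hypothetical stand-ins; in particular, `toy_score` only imitates a trained text classifier, and the `text` argument is assumed to already contain the email body plus any fetched webpage text.

```python
from typing import Callable


def two_phase_filter(
    sender: str,
    text: str,
    blocked_senders: set[str],
    spam_keywords: set[str],
    score_text: Callable[[str], float],
    threshold: float = 0.5,
) -> str:
    """Return "spam" or "legitimate" for one message."""
    # Phase 1: definite information decides immediately when it matches.
    if sender.lower() in blocked_senders:
        return "spam"
    words = set(text.lower().split())
    if words & spam_keywords:
        return "spam"
    # Phase 2: fall back to the less definite textual evidence.
    return "spam" if score_text(text) >= threshold else "legitimate"


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained text classifier."""
    hints = {"discount", "winner", "click"}
    tokens = text.lower().split()
    return sum(t in hints for t in tokens) / max(len(tokens), 1)


r1 = two_phase_filter("ads@bulk.example", "hello", {"ads@bulk.example"}, {"casino"}, toy_score)
r2 = two_phase_filter("a@b.example", "click winner discount", set(), {"casino"}, toy_score)
r3 = two_phase_filter("a@b.example", "see you at lunch", set(), {"casino"}, toy_score)
```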

Analysis on Static Characteristics for Greylist-based SPIT Level Decision of VoIP SPAM Calls (VoIP 스팸 Call의 Grey List 기반 SPIT 레벨 결정을 위한 정적 속성 분석 연구)

  • Chang, Eun-Shil;Kim, Hyoug-Jong;Kang, Seung-Seok;Cho, Young-Duk;Kim, Myuhng-Joo
    • Convergence Security Journal
    • /
    • v.7 no.3
    • /
    • pp.109-120
    • /
    • 2007
  • VoIP service provides various functions that PSTN phone service has not been able to provide. Because it is also cheaper, the number of users is increasing. Seen from the other side, the lower cost also lets spam callers deliver commercial messages over the phone line more economically. This paper presents the characteristics that should be considered when detecting spam calls with the greylisting method. We explored static and dynamic characteristics of VoIP calls and analyzed the relations among them. In particular, we surveyed the authentication and charging methods of Korean VoIP service providers. We analyzed each charging method using our spam call simulation results and identified the charging method that spam callers are likely to favor. The contribution of this work is the analysis of static characteristics for SPIT level calculation in the greylisting method.

Spam-Filtering by Identifying Automatically Generated Email Accounts (자동 생성 메일계정 인식을 통한 스팸 필터링)

  • Lee, Sangho
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.5
    • /
    • pp.378-384
    • /
    • 2005
  • In this paper, we describe a novel spam-filtering method that improves the performance of conventional spam-filtering systems. Conventional systems filter emails by examining the distribution of words in email headers or bodies. Spammers have now begun creating accounts on web-based email service sites and sending emails that appear not to be spam. Examining the email accounts behind such spam, we notice a large difference between automatically generated accounts and ordinary ones. Based on that difference, incoming emails are classified as spam or non-spam. To classify emails from account strings alone, we used decision trees, which are widely used for conventional pattern classification problems. We collected about 2.15 million account strings from email service sites, and our account checker achieved an accuracy of 96.3%. Combining the previous filter system with the checker improved filtering performance.

A SVM-based Spam Filtering System for Short Message Service (SMS) (휴대폰 SMS를 위한 SVM 기반의 스팸 필터링 시스템)

  • Joe, In-Whee;Shim, Hye-Taek
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.9B
    • /
    • pp.908-913
    • /
    • 2009
  • Mobile phones have become an important household appliance that we cannot do without in daily life, and the short message service (SMS) on these phones is used 1.5 to 2 times more than the voice service. However, the spam filtering functions installed in mobile phones work by registering specific number patterns or words and flagging a message as spam when those numbers or words are present. This method cannot properly filter the various types of spam messages currently being sent. This paper proposes a more powerful and more adaptive spam filtering system using SVM and a thesaurus. The system isolates words from sample data through a preprocessing step and integrates the meanings of the isolated words using a thesaurus. It then generates features from the integrated words through chi-square statistics and learns from those features. The proposed system is implemented in a Windows environment, and its performance is confirmed through experiments.
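The thesaurus step in this abstract, which merges words with the same meaning before features are generated, can be sketched roughly as follows. The thesaurus entries, concept labels, and function names are hypothetical; the paper's actual thesaurus and preprocessing are not described in the abstract.

```python
from collections import Counter

# Hypothetical thesaurus: surface word -> shared concept label.
THESAURUS = {
    "loan": "LOAN", "credit": "LOAN", "lending": "LOAN",
    "free": "FREE", "gratis": "FREE", "complimentary": "FREE",
}


def normalize(tokens: list[str]) -> list[str]:
    """Replace each word by its concept label when the thesaurus has one."""
    return [THESAURUS.get(t.lower(), t.lower()) for t in tokens]


def concept_counts(message: str) -> Counter:
    """Count features after merging synonyms into one concept."""
    return Counter(normalize(message.split()))


a = concept_counts("Free loan today")
b = concept_counts("gratis credit today")
```

Both messages map to the same feature counts, so a filter trained on one wording can still recognize the other, which is the adaptivity the abstract attributes to the thesaurus.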