• Title/Summary/Keyword: 스팸 필터

Search Result 94, Processing Time 0.019 seconds

Automatic e-mail Hierarchy Classification using Dynamic Category Hierarchy and Principal Component Analysis (PCA와 동적 분류체계를 사용한 자동 이메일 계층 분류)

  • Park, Sun
    • Journal of Advanced Navigation Technology
    • /
    • v.13 no.3
    • /
    • pp.419-425
    • /
    • 2009
  • The amount of incoming e-mails is increasing rapidly due to the wide usage of Internet. Therefore, it is more required to classify incoming e-mails efficiently and accurately. Currently, the e-mail classification techniques are focused on two way classification to filter spam mails from normal ones based mainly on Bayesian and Rule. The clustering method has been used for the multi-way classification of e-mails. But it has a disadvantage of low accuracy of classification and no category labels. The classification methods have a disadvantage of training and setting of category labels by user. In this paper, we propose a novel multi-way e-mail hierarchy classification method that uses PCA for automatic category generation and dynamic category hierarchy for high accuracy of classification. It classifies a huge amount of incoming e-mails automatically, efficiently, and accurately.

  • PDF

A study on the Filtering of Spam E-mail using n-Gram indexing and Support Vector Machine (n-Gram 색인화와 Support Vector Machine을 사용한 스팸메일 필터링에 대한 연구)

  • 서정우;손태식;서정택;문종섭
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.14 no.2
    • /
    • pp.23-33
    • /
    • 2004
  • Because of a rapid growth of internet environment, it is also fast increasing to exchange message using e-mail. But, despite the convenience of e-mail, it is rising a currently bi9 issue to waste their time and cost due to the spam mail in an individual or enterprise. Many kinds of solutions have been studied to solve harmful effects of spam mail. Such typical methods are as follows; pattern matching using the keyword with representative method and method using the probability like Naive Bayesian. In this paper, we propose a classification method of spam mails from normal mails using Support Vector Machine, which has excellent performance in pattern classification problems, to compensate for the problems of existing research. Especially, the proposed method practices efficiently a teaming procedure with a word dictionary including a generated index by the n-Gram. In the conclusion, we verified the proposed method through the accuracy comparison of spm mail separation between an existing research and proposed scheme.

Competition Relation Extraction based on Combining Machine Learning and Filtering (기계학습 및 필터링 방법을 결합한 경쟁관계 인식)

  • Lee, ChungHee;Seo, YoungHoon;Kim, HyunKi
    • Journal of KIISE
    • /
    • v.42 no.3
    • /
    • pp.367-378
    • /
    • 2015
  • This study was directed at the design of a hybrid algorithm for competition relation extraction. Previous works on relation extraction have relied on various lexical and deep parsing indicators and mostly utilize only the machine learning method. We present a new algorithm integrating machine learning with various filtering methods. Some simple but useful features for competition relation extraction are also introduced, and an optimum feature set is proposed. The goal of this paper was to increase the precision of competition relation extraction by combining supervised learning with various filtering methods. Filtering methods were employed for classifying compete relation occurrence, using distance restriction for the filtering of feature pairs, and classifying whether or not the candidate entity pair is spam. For evaluation, a test set consisting of 2,565 sentences was examined. The proposed method was compared with the rule-based method and general relation extraction method. As a result, the rule-based method achieved positive precision of 0.812 and accuracy of 0.568, while the general relation extraction method achieved 0.612 and 0.563, respectively. The proposed system obtained positive precision of 0.922 and accuracy of 0.713. These results demonstrate that the developed method is effective for competition relation extraction.

Finding Influential Users in the SNS Using Interaction Concept : Focusing on the Blogosphere with Continuous Referencing Relationships (상호작용성에 의한 SNS 영향유저 선정에 관한 연구 : 연속적인 참조관계가 있는 블로고스피어를 중심으로)

  • Park, Hyunjung;Rho, Sangkyu
    • The Journal of Society for e-Business Studies
    • /
    • v.17 no.4
    • /
    • pp.69-93
    • /
    • 2012
  • Various influence-related relationships in Social Network Services (SNS) among users, posts, and user-and-post, can be expressed using links. The current research evaluates the influence of specific users or posts by analyzing the link structure of relevant social network graphs to identify influential users. We applied the concept of mutual interactions proposed for ranking semantic web resources, rather than the voting notion of Page Rank or HITS, to blogosphere, one of the early SNS. Through many experiments with network models, where the performance and validity of each alternative approach can be analyzed, we showed the applicability and strengths of our approach. The weight tuning processes for the links of these network models enabled us to control the experiment errors form the link weight differences and compare the implementation easiness of alternatives. An additional example of how to enter the content scores of commercial or spam posts into the graph-based method is suggested on a small network model as well. This research, as a starting point of the study on identifying influential users in SNS, is distinctive from the previous researches in the following points. First, various influence-related properties that are deemed important but are disregarded, such as scraping, commenting, subscribing to RSS feeds, and trusting friends, can be considered simultaneously. Second, the framework reflects the general phenomenon where objects interacting with more influential objects increase their influence. Third, regarding the extent to which a bloggers causes other bloggers to act after him or her as the most important factor of influence, we treated sequential referencing relationships with a viewpoint from that of PageRank or HITS (Hypertext Induced Topic Selection).