• Title/Abstract/Keywords: URL Filtering

19 search results (processing time: 0.026 seconds)

A Study on Focused Crawling of Web Document for Building of Ontology Instances

  • 장문수
    • Journal of Korean Institute of Intelligent Systems
    • /
    • Vol. 18, No. 1
    • /
    • pp.86-93
    • /
    • 2008
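  • Building an ontology that defines complex semantic relations is a highly precise and specialized task. To use a well-built ontology in an application system, a large amount of instance information must be constructed for the ontology classes. To support the extraction of ontology instance information, this paper proposes a focused web document crawling algorithm that extracts only the documents relevant to a given topic from a vast number of web documents, and develops a document collection system based on this algorithm. The proposed algorithm uses URL patterns to extract only the links relevant to the topic, which enables fast document collection. In addition, a topic relevance measure expressed as a fuzzy set over link block text judges the topical relevance of documents intelligently and improves the accuracy of focused crawling.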
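The URL-pattern step described in the abstract can be illustrated with a minimal Python sketch. The patterns, the function name `filter_links`, and the example URLs are all hypothetical; the abstract does not give the paper's actual patterns, and the fuzzy-set relevance scoring is not modelled here.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical topic patterns (assumption): a real crawler would derive
# these from the target sites for the chosen topic.
TOPIC_URL_PATTERNS = [
    re.compile(r"/movie/\d+"),
    re.compile(r"/film/[a-z0-9-]+"),
]


def filter_links(base_url, hrefs, patterns=TOPIC_URL_PATTERNS):
    """Return absolute URLs whose path matches at least one topic pattern.

    Duplicates are dropped and the original link order is preserved.
    """
    kept = []
    seen = set()
    for href in hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute in seen:
            continue
        path = urlparse(absolute).path
        if any(p.search(path) for p in patterns):
            seen.add(absolute)
            kept.append(absolute)
    return kept
```

Only the links that survive this filter would be queued for download, which is where the speed gain claimed in the abstract would come from.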

Analyzing the Effect of Lexical and Conceptual Information in Spam-mail Filtering System

  • Kang Sin-Jae;Kim Jong-Wan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 6, No. 2
    • /
    • pp.105-109
    • /
    • 2006
  • In this paper, we constructed a two-phase spam-mail filtering system based on lexical and conceptual information. Two kinds of information can distinguish spam mail from ham (non-spam) mail. The definite information consists of the mail sender's information, URLs, and a list of known spam keywords; the less definite information consists of the word list and concept codes extracted from the mail body. We first classified spam mail using the definite information, and then used the less definite information. In the second phase, we used the lexical information and concept codes contained in the email body for SVM learning. According to our results, the ham misclassification rate was reduced when more lexical information was used as features, and the spam misclassification rate was reduced when the concept codes were also included as features.
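The two-phase flow in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' system: the blocklists, the keyword set, and the names `phase_one` and `classify` are hypothetical, and the second phase is passed in as a plain callable standing in for the trained SVM.

```python
import re

# Hypothetical "definite" information (assumption): sender list,
# URL host list, and spam keyword list.
BLOCKED_SENDERS = {"promo@spam.example"}
BLOCKED_URL_HOSTS = {"cheap-pills.example"}
SPAM_KEYWORDS = {"viagra", "lottery"}

URL_HOST_RE = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)


def phase_one(sender, body):
    """Phase 1: return True if the definite information marks the mail as spam."""
    if sender.lower() in BLOCKED_SENDERS:
        return True
    hosts = {h.lower() for h in URL_HOST_RE.findall(body)}
    if hosts & BLOCKED_URL_HOSTS:
        return True
    words = set(re.findall(r"[a-z]+", body.lower()))
    return bool(words & SPAM_KEYWORDS)


def classify(sender, body, phase_two):
    """Run phase 1, then fall back to phase 2 for undecided mail.

    `phase_two` is any callable taking the body and returning True for
    spam; in the paper this role is played by an SVM trained on lexical
    information and concept codes.
    """
    if phase_one(sender, body):
        return "spam"
    return "spam" if phase_two(body) else "ham"
```

Running the inexpensive rule checks first means the learned classifier only sees mail that the definite information could not decide.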

Analyzing the correlation of Spam Recall and Thesaurus

  • Kang, Sin-Jae;Kim, Jong-Wan
    • Korea Society of Information Technology Applications: Conference Proceedings
    • /
    • Korea Society of Information Technology Applications, 2005: 6th 2005 International Conference on Computers, Communications and System
    • /
    • pp.21-25
    • /
    • 2005
  • In this paper, we constructed a two-phase spam-mail filtering system based on lexical and conceptual information. Two kinds of information can distinguish spam mail from legitimate mail. The definite information consists of the mail sender's information, URLs, and a spam list; the less definite information consists of the word list and concept codes extracted from the mail body. We first classified spam mail using the definite information, and then used the less definite information. In the second phase, we used the lexical information and concept codes contained in the email body for SVM learning. According to our results, spam precision increased when more lexical information was used as features, and spam recall increased when the concept codes were also included as features.
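For reference, spam precision and spam recall as used in this abstract follow the standard definitions: precision is the share of mail flagged as spam that really is spam, and recall is the share of actual spam that was flagged. A minimal Python sketch, with the function name and label strings chosen here for illustration:

```python
def spam_precision_recall(actual, predicted, positive="spam"):
    """Compute (precision, recall) for the `positive` class.

    `actual` and `predicted` are equal-length sequences of labels.
    A ratio with a zero denominator is reported as 0.0.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```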

The Suggestion of a New Control Method for SPAM Mail Prevention Solution

  • 김민홍;두창호
    • Journal of the Korea Computer Industry Society
    • /
    • Vol. 5, No. 4
    • /
    • pp.453-460
    • /
    • 2004
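  • Spam mail has recently become a social problem worldwide, and products that block it are being released. This paper classifies existing anti-spam solutions by installation type, analyzes their strengths and weaknesses, and examines how they are classified by spam-determination method. It then identifies the problems of existing anti-spam solutions, proposes the URL Prefetch method, a new filtering approach not currently in use, shows through experiments that this method improves spam blocking, and also proposes a blocking method based on HTML type.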

A Study on development for image detection tool using two layer voting method

  • 김명관
    • Journal of the Korea Computer Industry Society
    • /
    • Vol. 3, No. 5
    • /
    • pp.605-610
    • /
    • 2002
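  • Methods such as naive Bayes and N-Nearest are used for learning and classifying images. These methods are simple yet highly accurate. In this paper, we combine these methods through two-stage voting. We ran learning and classification experiments on harmful images. The results show that image classification based on color distribution is effective for real-time processing and for recognizing harmful images. We also performed learning and classification with the two-stage voting algorithm on more than about 2,000 photos; the results show a high accuracy of close to 80% and a stability that is not affected by the target photos.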

Design and Implementation of Malicious URL Prediction System based on Multiple Machine Learning Algorithms

  • 강홍구;신삼신;김대엽;박순태
    • Journal of Korea Multimedia Society
    • /
    • Vol. 23, No. 11
    • /
    • pp.1396-1405
    • /
    • 2020
  • Cyber threats such as forced collection of personal information and distribution of malicious code through malicious URLs continue to occur. To cope with such threats, security technology that quickly detects malicious URLs and prevents damage is required. In a web environment, malicious URLs take various forms and are created and deleted frequently, so detection or filtering by signature matching is of limited effectiveness. Recently, research on detecting and predicting malicious URLs using machine learning techniques has been actively conducted. Existing studies have proposed various features and machine learning algorithms for predicting malicious URLs, but most of them only suggest specialized algorithms obtained by supplementing features and preprocessing, so it is difficult for them to sufficiently reflect the strengths of various machine learning algorithms. This paper proposes a system for predicting malicious URLs using multiple machine learning algorithms, and reports an experiment that combines the prediction results of multiple machine learning models to increase the accuracy of malicious URL prediction. The experiments showed that combining multiple models improves prediction performance compared with a single model.
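The abstract does not say how the models' predictions are combined. One common choice is majority voting, sketched below in Python purely as an assumption; the function name `majority_vote` and the label strings are illustrative.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    `predictions` is a list with one entry per model; each entry is that
    model's list of labels for the same URLs in the same order. On a tie,
    the label seen first (from the earliest-listed model) wins.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of URLs")
    combined = []
    for labels in zip(*predictions):  # labels for one URL across models
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined
```

Using an odd number of models avoids ties in the two-class (malicious or benign) case.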

URL Filtering by Using Machine Learning

  • Saqib, Malik Najmus
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 8
    • /
    • pp.275-279
    • /
    • 2022
  • The growth of technology has made many things easier for people, from small everyday tasks to more complex ones. This growth has also brought illegal activities that are carried out using technology. These activities range from displaying annoying messages to large-scale fraud. The easiest way for an attacker to carry them out is to convince the user to click on a malicious link. Classifying URLs as malicious or benign has therefore been a major concern for a decade. Blacklists were used for this purpose initially and are still used today. A blacklist is efficient, but its drawback is the difficulty of updating it automatically. This method is therefore being replaced by URL classification based on machine learning algorithms. In this paper, we use four machine learning classification algorithms to classify URLs as malicious or benign: support vector machine, random forest, n-nearest neighbor, and decision tree. The dataset used in this research has 36694 instances. Precision, accuracy, and recall values are compared for the dataset with and without preprocessing.
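The blacklist baseline mentioned in this abstract can be sketched in Python as a host lookup. The function name `is_blacklisted`, the choice to match parent domains, and the example hosts are assumptions made for illustration; the paper's own blacklist format is not described.

```python
from urllib.parse import urlparse


def is_blacklisted(url, blacklist):
    """Return True if the URL's host, or any parent domain, is in `blacklist`.

    `blacklist` is a set of lowercase host names. For the host
    "login.bad.example" the candidates checked are "login.bad.example",
    "bad.example", and "example".
    """
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))
```

The lookup itself is fast, which matches the abstract's remark that blacklists are efficient; the weakness is that a newly created malicious host is missed until someone adds it to the set.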

Detection of Zombie PCs Based on Email Spam Analysis

  • Jeong, Hyun-Cheol;Kim, Huy-Kang;Lee, Sang-Jin;Kim, Eun-Jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 6, No. 5
    • /
    • pp.1445-1462
    • /
    • 2012
  • While botnets are used for various malicious activities, it is well known that they are widely used for email spam. Though the spam filtering systems currently in use block IPs that send email spam, simply blocking the IPs of zombie PCs participating in a botnet is not enough to prevent the spamming activities of the botnet because these IPs can easily be changed or manipulated. This IP blocking is also insufficient to prevent crimes other than spamming, as the botnet can be simultaneously used for multiple purposes. For this reason, we propose a system that detects botnets and zombie PCs based on email spam analysis. This study introduces the concept of "group pollution level" - the degree to which a certain spam group is suspected of being a botnet - and "IP pollution level" - the degree to which a certain IP in the spam group is suspected of being a zombie PC. These concepts are applied in our system, which detects botnets and zombie PCs by grouping spam mails based on the URL links or attachments they contain, and by assessing the pollution level of each group and each IP address. For empirical testing, we used email spam data collected in an "email spam trap system" - Korea's national spam collection system. Our proposed system detected 203 botnets and 18,283 zombie PCs in a day, and these zombie PCs sent about 70% of all the spam messages in our analysis. This shows the effectiveness of detecting zombie PCs by email spam analysis, and the possibility of a dramatic reduction in email spam by taking countermeasures against these botnets and zombie PCs.
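The grouping step described in this abstract (spam mails grouped by the URL links they contain) can be sketched in Python as below. The input shape, the regular expression, and the name `group_by_url` are assumptions; the pollution-level scoring is deliberately left out because the abstract does not define how it is computed.

```python
import re
from collections import defaultdict

# Simple URL matcher (assumption): good enough for plain-text bodies.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def group_by_url(mails):
    """Group sender IPs by the URLs found in their spam mails.

    `mails` is an iterable of (sender_ip, body) pairs. Returns a dict
    mapping each URL to the set of IPs that sent mail containing it.
    """
    groups = defaultdict(set)
    for sender_ip, body in mails:
        for url in set(URL_RE.findall(body)):
            groups[url.rstrip(".,)")].add(sender_ip)  # drop trailing punctuation
    return dict(groups)
```

Each resulting group is a candidate set of cooperating senders that a later scoring step could evaluate.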

A method for automatic EPC code conversion based on ontology methodology

  • 노영식;변영철
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 12, No. 3
    • /
    • pp.452-460
    • /
    • 2008
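  • ALE-based RFID middleware receives EPC data from reader devices, converts it internally to URN form, performs filtering, grouping, and similar operations on it, and then sends it to applications. EPC data comes in many types, and new EPC data formats may be proposed in the future, so RFID middleware must be able to process various types of EPC data efficiently. This paper proposes an ontology-based data processing method for efficiently processing the various types of EPC data collected from RFID readers in RFID middleware based on EPCglobal's ALE standard specification. Specifically, to convert various types of EPC data into URN form effectively, the conversion rules for each data type are built as an ontology. As a result, the ontology can be reused, and when a new type of EPC data is proposed, the middleware can be extended to handle it simply by adding an ontology for that EPC data.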
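The extensibility argument in this abstract can be illustrated with a small rule table in Python. A plain dictionary stands in for the ontology here, which is a simplification and not the paper's method; the field names and the function `to_urn` are hypothetical, and the input is assumed to be EPC fields already decoded from the tag's binary encoding.

```python
# One conversion rule per EPC type: the ordered fields that make up the
# pure-identity URN. This dict is a stand-in for the ontology (assumption).
URN_RULES = {
    "sgtin": ("company_prefix", "item_reference", "serial"),
    "sscc": ("company_prefix", "serial_reference"),
}


def to_urn(epc_type, fields, rules=URN_RULES):
    """Build a URN of the form urn:epc:id:<type>:<f1>.<f2>... from decoded fields."""
    try:
        field_names = rules[epc_type]
    except KeyError:
        raise ValueError(f"no conversion rule for EPC type {epc_type!r}") from None
    return f"urn:epc:id:{epc_type}:" + ".".join(str(fields[n]) for n in field_names)
```

Supporting a new EPC type then only requires a new rule entry, for example `{"grai": ("company_prefix", "asset_type", "serial")}`, with no change to `to_urn` itself.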