• Title/Summary/Keyword: TF-IDF weight

A Validation of Effectiveness for Intrusion Detection Events Using TF-IDF (TF-IDF를 이용한 침입탐지이벤트 유효성 검증 기법)

  • Kim, Hyoseok;Kim, Yong-Min
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.6 / pp.1489-1497 / 2018
  • Web application services have diversified, and research on intrusion detection continues in response to the surge in cyber threats. As single-layer defenses evolve into multi-level security, specific intrusions are handled by correlating a vast number of security events. However, it is difficult to check the OS, services, and web application types and versions of the target system in real time, and intrusion detection events raised by network-based security devices cannot confirm whether the target system is vulnerable or whether the attack succeeded. As a result, a blind spot can arise for threats whose events are not analyzed for these problems and for their correlations. In this paper, we propose a method for validating the effectiveness of intrusion detection events using TF-IDF. The proposed scheme extracts response traffic by mapping each attack to the corresponding response of the target system. The response traffic is then divided into lines, and each line is assigned a TF-IDF weight. Valid intrusion detection events are identified by sequentially examining the lines with the highest weights.
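
The line-weighting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each response line is treated as its own document, tokens are split on whitespace, and a line's weight is the sum of the TF-IDF weights of its tokens. The function names `tfidf_line_weights` and `rank_lines` and the sample response lines are hypothetical.

```python
import math
from collections import Counter


def tfidf_line_weights(lines):
    """Return one TF-IDF score per line, treating every line as a document.

    Assumption: a line's score is the sum over its distinct tokens of
    (relative term frequency in the line) * log(N / document frequency).
    """
    tokenized = [line.lower().split() for line in lines]
    n_lines = len(tokenized)

    # Document frequency: in how many lines does each token appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty lines
        score = sum(
            (count / total) * math.log(n_lines / doc_freq[term])
            for term, count in term_freq.items()
        )
        scores.append(score)
    return scores


def rank_lines(lines):
    """Return (score, line) pairs sorted from highest to lowest score."""
    scores = tfidf_line_weights(lines)
    return sorted(zip(scores, lines), key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical response traffic, one entry per line.
    response = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html",
        "root:x:0:0:root:/root:/bin/bash",
        "Content-Type: text/html",
    ]
    for score, line in rank_lines(response):
        print(f"{score:.3f}  {line}")
```

Lines made up of tokens that recur across the response (such as repeated headers) receive low scores, so an analyst or a downstream check would look at the rarer, higher-scoring lines first.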

A Term Weight Mensuration based on Popularity for Search Query Expansion (검색 질의 확장을 위한 인기도 기반 단어 가중치 측정)

  • Lee, Jung-Hun;Cheon, Suh-Hyun
    • Journal of KIISE:Software and Applications / v.37 no.8 / pp.620-628 / 2010
  • With Internet use pervasive in everyday life, people can now retrieve a great deal of information through the web. However, the exponential growth in the quantity of information on the web has limited the performance of online search engines, which return large amounts of unwanted information. As a result, web users now need more time and effort than in the past to find the information they need. This paper suggests a query expansion method for bringing wanted information to web users quickly. In experiments where the search subject did not change, Popularity-based Term Weight Mensuration performed better than TF-IDF and Simple Popularity Term Weight Mensuration. When the subject changed during a search, its performance varied less than that of the other methods.

Study on Extraction of Keywords Using TF-IDF and Text Structure of Novels (TF-IDF와 소설 텍스트의 구조를 이용한 주제어 추출 연구)

  • You, Eun-Soon;Choi, Gun-Hee;Kim, Seung-Hoon
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.121-129 / 2015
  • With the explosive growth of information about books, a growing number of customers find it difficult to choose a book. Against this backdrop, book recommendation systems, which offer appropriate information about books and thereby encourage customers to buy, are becoming more important. However, existing recommendation systems based on bibliographic information or user data suffer from unreliable recommendation results. It is therefore necessary to reflect semantic information extracted from the main text of a book in a recommendation system. Accordingly, as preliminary research, this paper suggests a method for extracting keywords from the main body of novels using both TF-IDF and the text structure. To this end, the texts of 100 novels were collected and divided into four structural elements: preface, dialogue, non-dialogue, and closing. The TF-IDF weight of each keyword was then calculated. The results show that keyword extraction accuracy improves by 42.1% when more weight is given to dialogue and the preface and closing are included, compared with using only the main body.

A Study on Improving Precision Rate in Security Events Using Cyber Attack Dictionary and TF-IDF (공격키워드 사전 및 TF-IDF를 적용한 침입탐지 정탐률 향상 연구)

  • Jongkwan Kim;Myongsoo Kim
    • Convergence Security Journal / v.22 no.2 / pp.9-19 / 2022
  • With the expansion of digital transformation, we are more exposed to the threat of cyber attacks, and many institutions and companies operate a signature-based intrusion prevention system at the front of the network to block incoming attacks. However, strict blocking rules cannot be applied without disrupting the services of the related ICT systems, which causes many false events and lowers operational efficiency. Many research projects therefore use artificial intelligence to improve attack detection accuracy. Most of this research relied on specific research data sets that do not resemble real network traffic, so the results could not be used in actual systems. In this paper, we propose a technique that classifies major attack keywords in the security event logs collected from an actual system, assigns a weight to each keyword, and then performs a similarity check using TF-IDF to determine whether an actual attack has occurred.

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.12 / pp.5089-5096 / 2010
  • Text categorization, which classifies documents according to given criteria, is one of the important features of an information retrieval system. The general method classifies the target documents by extracting important index words and assigning weights to them. The effectiveness of the weighting algorithm is therefore important, since the performance and correctness of text categorization depend heavily on it. In this paper, an enhanced method for text categorization based on an improved word weighting technique is introduced. The Okapi BM25 method has proven effective in several information retrieval engines. We applied Okapi BM25 and showed that it performs well in categorization. It is compared with several other word weighting methods: TF-IDF, TF-ICF, and TF-ISF. The target documents used for this experiment are from Reuters-21578, and the SVM and KNN algorithms are used. Overall, the modified Okapi BM25 shows the best performance.
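
For orientation, a common textbook form of the Okapi BM25 term weight can be sketched as below. This is not the modified variant evaluated in the paper, whose details the abstract does not give. The sketch assumes pre-tokenized, non-empty documents, the frequently used defaults `k1 = 1.2` and `b = 0.75`, and an IDF with `+ 1` inside the logarithm so that weights stay non-negative. The function name `bm25_weights` is hypothetical.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return, for each document, a dict mapping term -> BM25 weight.

    docs: list of non-empty token lists (assumption: already tokenized).
    k1 controls term-frequency saturation; b controls length normalization.
    """
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Document frequency of each term.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        doc_weights = {}
        for term, freq in term_freq.items():
            # Smoothed IDF; the "+ 1.0" keeps the value non-negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Saturating, length-normalized term-frequency component.
            tf_part = freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(doc) / avg_len)
            )
            doc_weights[term] = idf * tf_part
        weights.append(doc_weights)
    return weights


if __name__ == "__main__":
    docs = [["oil", "price", "oil"], ["grain", "price"], ["oil", "export"]]
    for doc_weights in bm25_weights(docs):
        print(doc_weights)
```

Unlike plain TF-IDF, the term-frequency component saturates as a term repeats and is scaled by document length, which is the usual motivation for trying BM25-style weights as classifier features.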

TF-IDF Based Association Rule Analysis System for Medical Data (의료 정보 추출을 위한 TF-IDF 기반의 연관규칙 분석 시스템)

  • Park, Hosik;Lee, Minsu;Hwang, Sungjin;Oh, Sangyoon
    • KIPS Transactions on Software and Data Engineering / v.5 no.3 / pp.145-154 / 2016
  • With the recent interest in u-Health and the development of IT, the need to make use of medical information data has increased. Among previous studies that apply data mining algorithms to medical information data, some use association rule analysis. These studies aim to discover associations between symptoms and specific diseases; however, in most cases they do not consider infrequent terms, which can carry important information for diagnosing a disease. In this paper, we propose a new association rule mining system that uses TF-IDF weights to take the importance of each term into account, so that infrequent but important items are considered. In addition, the proposed system can predict candidate diagnoses from medical text records using term similarity analysis based on a medical ontology.

A New Semantic Distance Measurement Method using TF-IDF in Linked Open Data (링크드 오픈 데이터에서 TF-IDF를 이용한 새로운 시맨틱 거리 측정 기법)

  • Cho, Jung-Gil
    • Journal of the Korea Convergence Society / v.11 no.10 / pp.89-96 / 2020
  • Linked Data allows structured data to be published in a standard way so that datasets from various domains can be interlinked. With the rapid evolution of Linked Open Data (LOD), researchers are exploiting it to solve particular problems such as semantic similarity assessment. In this paper, we propose a method, built on the basic concept of Linked Data Semantic Distance (LDSD), for calculating the semantic distance between resources, which can be used in an LOD-based recommender system. The proposed semantic distance measurement model is based on a similarity measure that combines the LOD-based semantic distance with a new link weight computed using TF-IDF, which is well known in the field of information retrieval. To verify the effectiveness of this approach, performance was evaluated in the context of an LOD-based recommender system using mixed data from DBpedia and MovieLens. Experimental results show that the proposed method achieves higher accuracy than other similar methods. In addition, it improved the accuracy of the recommender system by expanding the range of the semantic distance calculation.

Spam Filter by Using χ² Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB / v.17B no.3 / pp.249-254 / 2010
  • We propose an automatic spam filter for e-mail data using Support Vector Machines (SVM). We use the lexical form of each word together with its part-of-speech (POS) tag as features and select features by chi-square statistics. In the experiments, each feature is represented by TF (term frequency), TF-IDF, or a binary weight. After the SVM is trained with the selected features, it classifies each e-mail as spam or not spam. In the experiments, the selected features improved the performance of our system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
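
The chi-square feature selection step mentioned in this abstract can be illustrated with the standard 2×2 contingency-table statistic. This is a sketch under stated assumptions, not the authors' code: features are assumed to be `word/POS` strings, counts are numbers of e-mails containing a feature, and the names `chi_square` and `select_features` as well as the sample counts are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a feature/class 2x2 contingency table.

    n11: spam e-mails containing the feature
    n10: non-spam e-mails containing the feature
    n01: spam e-mails without the feature
    n00: non-spam e-mails without the feature
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    # A zero denominator means a row or column is empty: no evidence either way.
    return numerator / denominator if denominator else 0.0


def select_features(counts, n_spam, n_ham, k):
    """Return the k features with the highest chi-square scores.

    counts maps feature -> (spam e-mails containing it, non-spam e-mails
    containing it); n_spam and n_ham are the class sizes.
    """
    scored = {
        feature: chi_square(spam, ham, n_spam - spam, n_ham - ham)
        for feature, (spam, ham) in counts.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical counts over 10 spam and 10 non-spam e-mails.
    counts = {"free/NN": (8, 1), "the/DT": (9, 9), "meeting/NN": (1, 7)}
    print(select_features(counts, n_spam=10, n_ham=10, k=2))
```

Features that occur about equally often in both classes score near zero and are dropped, while features concentrated in either class are kept, whether they indicate spam or legitimate mail.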

A Study on the Applicability of 2-Poisson Model for Selecting Korean Subject Words (2-포아송 모형을 이용한 한글 주제어 선정에 관한 연구)

  • 정영미;최대식
    • Journal of the Korean Society for Information Management / v.17 no.1 / pp.129-148 / 2000
  • Experiments were performed on three subsets of a Korean test collection to determine whether the Z value of the 2-Poisson model is a good measure for selecting subject words from a document to be indexed. Subject word selection based on the Z value was found to be effective for only one subset with short texts, namely the Science and Technology subset. Correlation analyses between the 2-Poisson model's Z value and the TF-IDF weight for the three subsets showed that the correlation was relatively high for the two subsets with short texts, namely the Science and Technology subset and the Newspaper subset.

Keyword Weight based Paragraph Extraction Algorithm (문단 가중치 분석 기반 본문 영역 선정 알고리즘)

  • Lee, Jongwon;Yu, Seongjong;Kim, Doan;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.462-463 / 2018
  • Traditional document analysis systems rely on word-based analysis using a morphological analyzer or the TF-IDF technique. These systems have the advantage of being able to derive key terms by calculating keyword weights. However, because of their structural limitations, they are not well suited to analyzing the contents of a document. To solve this problem, the proposed algorithm calculates the weights of the paragraphs in a document and groups the paragraphs into areas. It then calculates the importance of each area and tells the user which area contains the most important paragraphs in the document. Users are therefore expected to receive a service better suited to analyzing documents than existing document analysis systems.