• Title/Summary/Keyword: 단순 베이지안 분류기

Search Result 6, Processing Time 0.022 seconds

Improving Multinomial Naive Bayes Text Classifier (다항시행접근 단순 베이지안 문서분류기의 개선)

  • 김상범;임해창
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.3_4
    • /
    • pp.259-267
    • /
    • 2003
  • Though naive Bayes text classifiers are widely used because of its simplicity, the techniques for improving performances of these classifiers have been rarely studied. In this paper, we propose and evaluate some general and effective techniques for improving performance of the naive Bayes text classifier. We suggest document model based parameter estimation and document length normalization to alleviate the Problems in the traditional multinomial approach for text classification. In addition, Mutual-Information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.

A Window-Based Classification of Stream Data (스트림 데이터의 윈도우 기반 분류)

  • Kim, Sung-Hyun;Lee, Yong-Mi;Jin, Long;Seo, Sung-Bo;Ryu, Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2005.11a
    • /
    • pp.47-50
    • /
    • 2005
  • 센서와 모바일 기술의 발달로 인해 다양한 센서에서 수집된 스트림 데이터를 처리하는 연구들이 많이 수행되고 있다. 다차원 속성의 스트림 데이터는 센서에서 주기적으로 수집되어 버퍼링 후 처리되기 때문에 기존의 투플 기반의 데이터 분류 기법에 적합하지 않다. 따라서 이 논문에서는 윈도우 기반의 스트림 데이터 분류를 위해 각 속성의 평균과 표준편차 값을 이용하여 투플 기반으로 변환하는 기법을 제안한다. 제안된 기법의 타당성은 투플 기반 데이터 분류 기법(의사결정트리, 단순 베이지안 분류기, 베이지안 신뢰 네트워크)에 의한 정확도 측정에 기반 한다. 로봇에서 수집된 센서 데이터를 이용한 실험 결과, 높은 정확도로 제안된 기법이 타당함을 증명하였으며 베이지안 신뢰 네트워크 기법이 다른 기법에 비해 우수함을 발견하였다.

  • PDF

Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy (빅데이터 검색 정확도에 미치는 다양한 측정 방법 기반 검색 기법의 효과)

  • Kim, Ji young;Han, DaHyeon;Kim, Jongkwon
    • Journal of KIISE
    • /
    • v.44 no.5
    • /
    • pp.553-558
    • /
    • 2017
  • With the rapid growth of Big Data, research on extracting meaningful information is being pursued by both academia and industry. Especially, data characteristics derived from analysis, and researcher intention are key factors for search algorithms to obtain accurate output. Therefore, reflecting both data characteristics and researcher intention properly is the final goal of data analysis research. The data analyzed properly can help users to increase loyalty to the service provided by company, and to utilize information more effectively and efficiently. In this paper, we explore various methods of document-evaluation, so that we can improve the accuracy of searching article one of the most frequently searches used in real life. We also analyze the experiment result, and suggest the proper manners to use various methods.

A Study on development for image detection tool using two layer voting method (2단계 분류기법을 이용한 영상분류기 개발)

  • 김명관
    • Journal of the Korea Computer Industry Society
    • /
    • v.3 no.5
    • /
    • pp.605-610
    • /
    • 2002
  • In this paper, we propose a Internet filtering tool which allows parents to manage their children's Internet access, block access to Internet sites they deem inappropriate. The other filtering tools which like Cyber Patrol, NCA Patrol, Argus, Netfilter are oriented only URL filtering or keyword detection methods. Thease methods are used on limited fields application. But our approach is focus on image color space model. First we convert RGB color space to HLS(Hue Luminance Saturation). Next, this HLS histogram learned by our classification method tools which include cohesion factor, naive baysian, N-nearest neighbor. Then we use voting for result from various classification methods. Using 2,000 picture, we prove that 2-layer voting result have better accuracy than other methods.

  • PDF

A Study of using Emotional Features for Information Retrieval Systems (감정요소를 사용한 정보검색에 관한 연구)

  • Kim, Myung-Gwan;Park, Young-Tack
    • The KIPS Transactions:PartB
    • /
    • v.10B no.6
    • /
    • pp.579-586
    • /
    • 2003
  • In this paper, we propose a novel approach to employ emotional features to document retrieval systems. Fine emotional features, such as HAPPY, SAD, ANGRY, FEAR, and DISGUST, have been used to represent Korean document. Users are allowed to use these features for retrieving their documents. Next, retrieved documents are learned by classification methods like cohesion factor, naive Bayesian, and, k-nearest neighbor approaches. In order to combine various approaches, voting method has been used. In addition, k-means clustering has been used for our experimentation. The performance of our approach proved to be better in accuracy than other methods, and be better in short texts rather than large documents.

Behavior Pattern Modeling based Game Bot detection (행동 패턴 모델을 이용한 게임 봇 검출 방법)

  • Park, Sang-Hyun;Jung, Hye-Wuk;Yoon, Tae-Bok;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.20 no.3
    • /
    • pp.422-427
    • /
    • 2010
  • Korean Game industry, especially MMORPG(Massively Multiplayer Online Game) has been rapidly expanding in these days. But As game industry is growing, lots of online game security incidents have also been increasing and getting prevailing. One of the most critical security incidents is 'Game Bots', which are programs to play MMORPG instead of human players. If player let the game bots play for them, they can get a lot of benefic game elements (experience points, items, etc.) without any effort, and it is considered unfair to other players. Plenty of game companies try to prevent bots, but it does not work well. In this paper, we propose a behavior pattern model for detecting bots. We analyzed behaviors of human players as well as bots and identified six game features to build the model to differentiate game bots from human players. Based on these features, we made a Naive Bayesian classifier to reasoning the game bot or not. To evaluated our method, we used 10 game bot data and 6 human Player data. As a result, we classify Game bot and human player with 88% accuracy.