• Title/Summary/Keyword: naive Bayesian classifier

An Automatic Document Classification with Bayesian Learning (베이지안 학습을 이용한 문서의 자동분류)

  • Kim, Jin-Sang;Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society, v.11 no.1, pp.19-30, 2000
  • As the number of online documents grows enormously with the expansion of information technology, automatic document classification has become far more important. In this paper, an automatic document classification method is investigated and applied to articles from 20 UseNet newsgroups to test its efficacy. The classification system uses the Naive Bayes classification algorithm, and the experimental results show that a randomly selected newsgroup article can be classified into its own category with over 77% accuracy.

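
The abstract names Naive Bayes but not its event model. As a minimal sketch, assuming the common multinomial model with Laplace (add-one) smoothing, an article is assigned to the class that maximizes log P(c) + Σ log P(w | c) over its words. The whitespace tokenizer, the toy training articles, and the names `train_multinomial_nb` and `classify` are hypothetical illustrations, not the authors' implementation; only the two category names are borrowed from the 20 Newsgroups collection.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical minimal tokenizer: lower-case and split on whitespace.
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log P(c) and Laplace-smoothed log P(w | c) from labelled documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik, vocab


def classify(doc, log_prior, log_lik, vocab):
    """Return the class maximizing log P(c) + sum of log P(w | c); unseen words are skipped."""
    tokens = [w for w in tokenize(doc) if w in vocab]
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in tokens) for c in log_prior}
    return max(scores, key=scores.get)


# Hypothetical toy training data using two 20 Newsgroups category names.
docs = ["the gpu driver crashed", "new graphics card benchmark",
        "the team won the hockey game", "playoff hockey score tonight"]
labels = ["comp.graphics", "comp.graphics", "rec.sport.hockey", "rec.sport.hockey"]
model = train_multinomial_nb(docs, labels)
print(classify("hockey game tonight", *model))  # rec.sport.hockey
```

Working in log space keeps the product of many small word probabilities from underflowing on long articles.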

A Design of the Small File Grouping System Based on Naive Bayesian Classifier Model (나이브 베이지안 분류기 모델 기반의 소용량 파일 그룹화 시스템 설계)

  • Kim, Min-Jae;Kim, Kyung-Tae;Youn, Hee-Young
    • Proceedings of the Korean Society of Computer Information Conference, 2014.07a, pp.221-222, 2014
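  • With the rapid growth of the web, interest in platform technologies that can process large-scale data efficiently is rising. In particular, HDFS has drawn attention as an ideal distributed file system and was developed for processing large files. In real file collections, however, small files make up a high proportion. A large number of small files is a critical cause of HDFS performance degradation: if many small files are stored in HDFS, the NameNode's memory consumption increases, and because many small files demand many DataNodes and NameNodes, processing takes relatively long. Therefore, to improve the storage and access efficiency of small files in HDFS, this paper designs a file grouping system that applies the naive Bayesian classifier algorithm.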

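
The argument about NameNode memory can be made concrete with a back-of-the-envelope estimate. This sketch assumes the commonly quoted rule of thumb that every file and block object occupies about 150 bytes of NameNode heap, assumes a 128 MB block size, and assumes idealized grouping in which small files are packed perfectly into block-sized container files. The naive Bayesian step that decides which files belong together is not modeled, and `namenode_bytes` is a hypothetical helper.

```python
# Rule-of-thumb assumption: each file object and each block object costs about
# 150 bytes of NameNode heap. Directory objects are ignored here.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.blocksize (the default in Hadoop 2 and later)


def namenode_bytes(n_files, file_size, block_size=BLOCK_SIZE):
    """Estimate NameNode heap for n_files files of file_size bytes each (file_size > 0)."""
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    return n_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


MB = 1024 * 1024
before = namenode_bytes(10_000_000, 1 * MB)  # ten million 1 MB files stored one by one
# Idealized grouping: pack 128 small files into each 128 MB container file.
after = namenode_bytes(10_000_000 // 128, 128 * MB)
print(before, after)  # 3000000000 23437500
```

Under these assumptions, ten million 1 MB files need about 3 GB of NameNode heap, while the same data grouped into 78,125 container files needs about 23 MB, which is the kind of saving a grouping system aims for.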

Data Mining Using Reversible Jump MCMC and Bayesian Network Learning (Reversible Jump MCMC와 베이지안망 학습에 의한 데이터마이닝)

  • 하선영;장병탁
    • Proceedings of the Korean Information Science Society Conference, 2000.10b, pp.90-92, 2000
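  • A data mining method must not only classify data by their attributes for prediction but also explain the associations among those attributes well. The Bayesian network classifier is generally a method that explains associations among variables well while retaining high predictive power. However, its performance drops on the large-scale data typical of data mining. This paper therefore proposes a new approach, the Selective BN Augmented Naive-Bayes Classifier, which learns a Bayesian network over only the optimal input variables, selected with the Reversible Jump Markov Chain Monte Carlo method (recently applied successfully to input-variable selection for RBF neural networks), and presents the results of applying it to a real data mining problem.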

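
The input-variable selection idea can be illustrated with a much simpler stand-in, which is not the paper's Selective BAN method: a Metropolis-Hastings walk over feature subsets for a plain Bernoulli naive Bayes. Each step proposes adding or deleting one feature and accepts it according to the change in the summed log posterior of the true labels minus an assumed per-feature penalty. Because the naive Bayes parameters are re-estimated for each subset rather than sampled, no dimension-matching step is needed, so this is ordinary Metropolis-Hastings over models rather than true reversible-jump MCMC; the augmenting network edges are also omitted. The function names, penalty, and data are hypothetical.

```python
import math
import random


def nb_log_posterior_sum(X, y, features, alpha=1.0):
    """Sum of log P(y_i | x_i) under a Bernoulli naive Bayes that uses only `features`."""
    classes = sorted(set(y))
    prior = {c: y.count(c) / len(y) for c in classes}
    theta = {}  # theta[c][j]: Laplace-smoothed estimate of P(x_j = 1 | c)
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        theta[c] = {j: (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in features}
    total = 0.0
    for x, label in zip(X, y):
        log_joint = {}
        for c in classes:
            s = math.log(prior[c])
            for j in features:
                s += math.log(theta[c][j] if x[j] else 1 - theta[c][j])
            log_joint[c] = s
        m = max(log_joint.values())
        log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
        total += log_joint[label] - log_norm
    return total


def feature_inclusion(X, y, n_iter=2000, penalty=1.0, seed=0):
    """Metropolis-Hastings walk over feature subsets; returns how often each feature is included."""
    rng = random.Random(seed)
    d = len(X[0])
    current = frozenset()
    score = nb_log_posterior_sum(X, y, current) - penalty * len(current)
    counts = [0] * d
    for _ in range(n_iter):
        proposal = current ^ {rng.randrange(d)}  # add or delete one feature (symmetric move)
        new_score = nb_log_posterior_sum(X, y, proposal) - penalty * len(proposal)
        if rng.random() < math.exp(min(0.0, new_score - score)):
            current, score = proposal, new_score
        for j in current:
            counts[j] += 1
    return [c / n_iter for c in counts]


# Hypothetical data: feature 0 copies the class label, features 1 and 2 are random noise.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[label, data_rng.randint(0, 1), data_rng.randint(0, 1)] for label in y]
print(feature_inclusion(X, y))
```

On this toy data the informative feature stays in the model for nearly every iteration once added, because dropping it costs far more in log posterior than the penalty saves.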

Fingerprinting Bayesian Algorithm for Indoor Location Determination (실내 측위 결정을 위한 Fingerprinting Bayesian 알고리즘)

  • Lee, Jang-Jae;Kwon, Jang-Woo;Jung, Min-A;Lee, Seong-Ro
    • The Journal of Korean Institute of Communications and Information Sciences, v.35 no.6B, pp.888-894, 2010
  • For indoor positioning, wireless fingerprinting is the most favorable approach because it is the most accurate of the wireless-network-based indoor positioning techniques that do not require any special equipment dedicated to positioning. Deploying a fingerprinting method consists of an off-line phase and an on-line phase, and more efficient and accurate methods have been studied. This paper proposes a Bayesian algorithm for wireless fingerprinting and indoor location determination that uses fuzzy clustering together with Bayesian learning as a statistical learning theory.
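
The two phases of Bayesian fingerprinting can be sketched as follows, assuming a Gaussian model of received signal strength (RSSI) per access point, independence between access points, and a uniform prior over reference points. The off-line phase stores per-point RSSI statistics (the radio map); the on-line phase turns a new measurement into a posterior over reference points and a posterior-weighted position. The paper also applies fuzzy clustering to the radio map, which this sketch omits; the fingerprints and function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def build_radio_map(samples, min_std=1.0):
    """Off-line phase: per reference point and per access point, the RSSI mean and std (dBm)."""
    radio_map = {}
    for point, readings in samples.items():
        stats = []
        for ap in range(len(readings[0])):
            values = [r[ap] for r in readings]
            stats.append((mean(values), max(pstdev(values), min_std)))  # floor avoids zero std
        radio_map[point] = stats
    return radio_map


def locate(radio_map, observation):
    """On-line phase: Gaussian likelihood per AP (assumed independent), uniform prior."""
    log_post = {}
    for point, stats in radio_map.items():
        log_post[point] = sum(-0.5 * ((rssi - mu) / sigma) ** 2 - math.log(sigma)
                              for rssi, (mu, sigma) in zip(observation, stats))
    m = max(log_post.values())
    weights = {p: math.exp(v - m) for p, v in log_post.items()}
    z = sum(weights.values())
    posterior = {p: w / z for p, w in weights.items()}
    # Position estimate: posterior-weighted average of the reference-point coordinates.
    x = sum(p[0] * w for p, w in posterior.items())
    y = sum(p[1] * w for p, w in posterior.items())
    return posterior, (x, y)


# Hypothetical fingerprints: two reference points, RSSI from two access points.
samples = {
    (0.0, 0.0): [[-40, -70], [-42, -72], [-41, -69]],
    (5.0, 0.0): [[-70, -40], [-72, -43], [-69, -41]],
}
radio_map = build_radio_map(samples)
posterior, estimate = locate(radio_map, [-41, -71])
print(max(posterior, key=posterior.get), estimate)
```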

eCRM Agent System for Articles Automatic Classification System based on Naive Bayesian Classifier (나이브 베이지안 분류기를 이용한 게시물 자동 분류를 위한 eCRM 에이전트 시스템)

  • Choi, Jung-Min;Lee, Byoung-Soo
    • Journal of IKEEE, v.8 no.2 s.15, pp.216-223, 2004
  • The customer bulletin board is an important channel for getting opinions directly from customers. Managing it effectively is an important eCRM method: providing the best replies and accepting customers' opinions improves their trust, which in turn can raise trust in the shopping mall as a whole. However, most current customer bulletin boards are answered without any classification of the many kinds of questions they receive. Consequently, a shopping mall should systematically manage expert replies to these many kinds of questions. To resolve this problem, we implement a Naive Bayesian classifier that automatically classifies bulletin board posts for shopping-mall eCRM.


Fuzzy Clustering Model using Principal Components Analysis and Naive Bayesian Classifier (주성분 분석과 나이브 베이지안 분류기를 이용한 퍼지 군집화 모형)

  • Jun, Sung-Hae
    • The KIPS Transactions:PartB, v.11B no.4, pp.485-490, 2004
  • In data representation, clustering groups the given data into clusters of similar items, and many studies have used various similarity measures. However, the validity of clustering results is subjective and ambiguous because objective criteria for clustering are difficult to establish and scarce. Fuzzy clustering provides a good method for such subjective clustering problems: it clusters through a similarity matrix holding a fuzzy membership value for assigning each object. For objective fuzzy clustering, this paper proposes a clustering algorithm that combines principal components analysis, as a dimension reduction model, with Bayesian learning, as a statistical learning theory. To evaluate the proposed algorithm, the Iris and Glass Identification data sets from the UCI Machine Learning Repository are used. The experimental results are favorable to the proposed model.
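
The abstract does not spell out how the Bayesian learning step yields memberships. One common reading, assumed here, is to fit a Gaussian mixture by expectation-maximization on the principal-component scores and use each point's posterior cluster probabilities as its fuzzy membership values. To stay short, the sketch keeps only the first principal component (found by power iteration); `first_pc_scores`, `fuzzy_memberships`, and the toy data are hypothetical stand-ins for the paper's method and for the Iris or Glass data.

```python
import math


def first_pc_scores(X, n_iter=100):
    """Project centered data onto the leading principal component (power iteration)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in C) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in C]


def fuzzy_memberships(scores, k=2, n_iter=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture (k >= 2); posteriors serve as fuzzy memberships."""
    lo, hi = min(scores), max(scores)
    mu = [lo + (hi - lo) * i / (k - 1) for i in range(k)]  # spread the initial means
    var = [max(((hi - lo) / (2 * k)) ** 2, min_var)] * k
    weight = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step in log space to avoid underflow: responsibility of each cluster.
        resp = []
        for s in scores:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - 0.5 * (s - m) ** 2 / v
                    for w, m, v in zip(weight, mu, var)]
            top = max(logs)
            expd = [math.exp(lg - top) for lg in logs]
            total = sum(expd)
            resp.append([e / total for e in expd])
        # M-step: re-estimate weights, means and variances from the memberships.
        for c in range(k):
            nc = max(sum(r[c] for r in resp), 1e-12)
            weight[c] = nc / len(scores)
            mu[c] = sum(r[c] * s for r, s in zip(resp, scores)) / nc
            var[c] = max(sum(r[c] * (s - mu[c]) ** 2 for r, s in zip(resp, scores)) / nc, min_var)
    return resp


# Hypothetical toy data: two well-separated groups in three dimensions.
X = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1], [0.2, 0.1, 0.0],
     [5.0, 5.1, 4.9], [5.2, 4.8, 5.0], [4.9, 5.0, 5.1]]
memberships = fuzzy_memberships(first_pc_scores(X))
for row in memberships:
    print([round(m, 3) for m in row])
```

With more than one retained component, each cluster's likelihood would be a product of per-component densities, which is the naive Bayes independence assumption; principal-component scores are uncorrelated across the whole data set, which makes that assumption less strained than on the raw features.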

Relation Based Bayesian Network for NBNN

  • Sun, Mingyang;Lee, YoonSeok;Yoon, Sung-eui
    • Journal of Computing Science and Engineering, v.9 no.4, pp.204-213, 2015
  • Under the assumption of conditional independence among local features, the Naive Bayes Nearest Neighbor (NBNN) classifier was recently proposed; it performs classification without any training or quantization phase. While the original NBNN shows high classification accuracy without an explicit training phase, conditional independence among local features conflicts with the compositionality of objects, i.e., the fact that different but related parts of an object appear together. As a result, the conditional independence assumption weakens the accuracy of NBNN-based classification techniques. In this work, we look into this issue and propose a novel Bayesian network for NBNN-based classification that considers the conditional dependence among features. To achieve our goal, we extract a high-level feature and its corresponding multiple low-level features for each image patch. We then represent them with a simple two-level layered Bayesian network and design a classification function based on this network. To achieve a low memory requirement and fast query-time performance, we further optimize our representation and classification function, named relation-based Bayesian network, by encoding the relationship between a high-level feature and its low-level features in a compact relation vector whose dimensionality equals the number of low-level features (e.g., four elements in our tests). We demonstrate the benefits of our method over the original NBNN and its recent improvement, local NBNN, on two different benchmarks. Our method improves accuracy by up to 27% over the tested methods. This high accuracy is mainly due to considering the conditional dependences between high-level features and their corresponding low-level features.
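
For reference, the original NBNN decision rule that this work builds on picks the class that minimizes the summed squared distance from each query descriptor to its nearest neighbor in that class's descriptor pool. The sum over descriptors is exactly where the conditional-independence assumption criticized above enters. The sketch below assumes brute-force nearest-neighbor search and toy 2-D descriptors, and it does not implement the paper's relation-based Bayesian network; the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(query, class_pools):
    """NBNN: for each class, sum over query descriptors of the squared distance to the
    nearest descriptor in that class's pool; predict the class with the smallest sum."""
    totals = {label: sum(min(sq_dist(d, p) for p in pool) for d in query)
              for label, pool in class_pools.items()}
    return min(totals, key=totals.get), totals


# Hypothetical 2-D "descriptors"; real NBNN uses local image descriptors.
pools = {"cat": [[0, 0], [1, 0], [0, 1]], "car": [[5, 5], [6, 5], [5, 6]]}
label, totals = nbnn_classify([[0.2, 0.1], [0.9, 0.2]], pools)
print(label, totals)
```

No training happens beyond storing each class's descriptors, which matches the "no training or quantization phase" property described in the abstract.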

Frequent Pattern Bayesian Classification for ECG Pattern Diagnosis (심전도 패턴 판별을 위한 빈발 패턴 베이지안 분류)

  • Noh, Gi-Yeong;Kim, Wuon-Shik;Lee, Hun-Gyu;Lee, Sang-Tae;Ryu, Keun-Ho
    • The KIPS Transactions:PartD, v.11D no.5, pp.1031-1040, 2004
  • The electrocardiogram (ECG), a recording of the heart's electrical activity, provides valuable clinical information about the heart's status. Much research on heart disease diagnosis using the ECG has been pursued so far. However, electrocardiographs use foreign diagnosis algorithms because of the inaccuracy of heart disease diagnosis results. This paper proposes ECG data collection, data preprocessing, and heart disease pattern classification using data mining. The classification technique, the FB (Frequent pattern Bayesian) classifier, combines two data mining problems: naive Bayesian classification and frequent pattern mining. FB uses a Product Approximation construction built from the discovered frequent patterns. This method therefore overcomes the weakness of naive Bayesian classification, namely its assumption of class-conditional independence.
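
To see why a product approximation built from frequent patterns can beat the naive estimate, consider one class's records over three items a, b, c. Naive Bayes multiplies single-item probabilities, while a product approximation built from the patterns {a, b} and {b, c} computes P(a, b) · P(c | b) = P(a, b) P(b, c) / P(b). The records and helper names below are hypothetical; the sketch does not reproduce how FB selects patterns from ECG data, and it only models the presence of items. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def prob(records, items):
    """Empirical probability that a record contains every item in `items` (1 for the empty set)."""
    return Fraction(sum(1 for r in records if items <= r), len(records))


def naive_estimate(records, items):
    """Naive Bayes style estimate: product of single-item probabilities."""
    result = Fraction(1)
    for item in items:
        result *= prob(records, {item})
    return result


def product_approximation(records, patterns):
    """Chain the patterns: each contributes P(pattern) / P(overlap with items already covered)."""
    covered = set()
    result = Fraction(1)
    for pattern in patterns:
        result *= prob(records, pattern) / prob(records, pattern & covered)
        covered |= pattern
    return result


# Hypothetical records for a single class; a and b almost always occur together.
records = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
print(prob(records, {"a", "b", "c"}))                            # 1/2 (empirical joint)
print(naive_estimate(records, ["a", "b", "c"]))                  # 27/64
print(product_approximation(records, [{"a", "b"}, {"b", "c"}]))  # 1/2
```

Here the product approximation recovers the empirical joint probability exactly, while the naive estimate (about 0.42) misses it; in a classifier, each class's estimate would then be multiplied by that class's prior.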

Nomogram comparison conducted by logistic regression and naïve Bayesian classifier using type 2 diabetes mellitus (T2D) (제 2형 당뇨병을 이용한 로지스틱과 베이지안 노모그램 구축 및 비교)

  • Park, Jae-Cheol;Kim, Min-Ho;Lee, Jea-Young
    • The Korean Journal of Applied Statistics, v.31 no.5, pp.573-585, 2018
  • In this study, we fit a logistic regression model and a naïve Bayesian classifier model using 11 risk factors to predict the incidence probability of type 2 diabetes mellitus. We then introduce how to construct a nomogram that helps people understand the models visually. We use data from the 2013-2015 Korean National Health and Nutrition Examination Survey (KNHANES). We include 3 interaction terms in the logistic regression model to improve the quality of the analysis and to facilitate applying the left-aligned method to the Bayesian nomogram. Finally, we compare the two nomograms, examine their utility, and verify them using the ROC curve.
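
In a Bayesian nomogram, each attribute value is commonly given points equal to its log odds ratio, log(P(v | T2D) / P(v | no T2D)); adding the prior log-odds and the points of a person's values, then applying the logistic function, reproduces the naive Bayes posterior. The sketch below assumes that construction with two made-up binary risk factors and made-up probabilities. It omits rescaling the points onto a 0 to 100 axis and the left-aligned method mentioned in the abstract, and the names are hypothetical.

```python
import math


def nomogram_points(cond_probs):
    """Points per attribute value: log(P(value | T2D) / P(value | no T2D))."""
    return {attr: {v: math.log(p_pos / p_neg) for v, (p_pos, p_neg) in values.items()}
            for attr, values in cond_probs.items()}


def predicted_probability(prior_pos, points, profile):
    """Prior log-odds plus the points of the observed values, mapped back through the logistic."""
    total = math.log(prior_pos / (1 - prior_pos))
    total += sum(points[attr][value] for attr, value in profile.items())
    return 1 / (1 + math.exp(-total))


# Hypothetical conditional probabilities: (P(value | T2D), P(value | no T2D)).
cond_probs = {
    "age_60_plus": {"yes": (0.6, 0.3), "no": (0.4, 0.7)},
    "obese": {"yes": (0.5, 0.25), "no": (0.5, 0.75)},
}
points = nomogram_points(cond_probs)
p = predicted_probability(0.10, points, {"age_60_plus": "yes", "obese": "yes"})
print(round(p, 4))  # 0.3077, i.e. 0.1*0.6*0.5 / (0.1*0.6*0.5 + 0.9*0.3*0.25)
```

Because the points are additive on the log-odds scale, a reader can sum them from the nomogram's axes by hand and read off the same probability the classifier computes.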

Nomogram building to predict dyslipidemia using a naïve Bayesian classifier model (순수 베이지안 분류기 모델을 사용하여 이상지질혈증을 예측하는 노모 그램 구축)

  • Kim, Min-Ho;Seo, Ju-Hyun;Lee, Jea-Young
    • The Korean Journal of Applied Statistics, v.32 no.4, pp.619-630, 2019
  • Dyslipidemia is a representative chronic disease among Koreans that requires continuous management. It is also a known risk factor for cardiovascular disease, such as hypertension and diabetes. However, it is difficult to diagnose vascular disease without a medical examination. This study identifies risk factors for the recognition and prevention of dyslipidemia. By integrating them, we construct a statistical nomogram that predicts the incidence rate while visualizing it. Data were taken from the Korean National Health and Nutrition Examination Survey (KNHANES) for 2013-2016. First, a chi-squared test identified twelve risk factors for dyslipidemia. We then used a naïve Bayesian classifier model to construct a nomogram for dyslipidemia. The constructed nomogram was verified using a receiver operating characteristic (ROC) curve and a calibration plot. Finally, we compared the previously presented logistic nomogram with the Bayesian nomogram proposed in this study.
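
The risk-factor screening step can be illustrated with Pearson's chi-squared test of independence on a 2×2 table (risk factor present or absent versus dyslipidemia yes or no). The counts below are invented for illustration, the helper name is hypothetical, and 3.841 is the standard critical value for one degree of freedom at the 5% level; a table with r rows and c columns has (r - 1)(c - 1) degrees of freedom and needs a different critical value.

```python
def chi_squared(table):
    """Pearson chi-squared statistic (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_DF1_ALPHA05 = 3.841  # chi-squared critical value, 1 degree of freedom, alpha = 0.05

# Hypothetical counts. Rows: factor present / absent; columns: dyslipidemia yes / no.
table = [[30, 70], [15, 85]]
stat = chi_squared(table)
print(round(stat, 3), stat > CRITICAL_DF1_ALPHA05)  # 6.452 True
```

A factor whose statistic exceeds the critical value would be kept as a candidate risk factor for the nomogram.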