• Title/Summary/Keyword: Bayes classifier (베이즈 분류기)

Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities (문헌간 유사도를 이용한 자동분류에서 미분류 문헌의 활용에 관한 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.251-271 / 2007
  • This paper studies the problem of classifying documents using both labeled and unlabeled training data, with particular attention to document similarity features. Using unlabeled data is practically important because, in many information systems, obtaining training labels is expensive while large quantities of unlabeled documents are readily available. A general semi-supervised learning algorithm has two steps. First, it trains a classifier on the available labeled documents and uses it to classify the unlabeled documents. Then, it trains a new classifier on all the training documents, whether labeled manually or automatically. We propose two types of semi-supervised learning algorithm that use document similarity features. The first is one-step semi-supervised learning, which uses unlabeled documents only to generate document similarity features. The second is two-step semi-supervised learning, which uses unlabeled documents as training examples as well as for similarity features. Experimental results obtained with support vector machines and a naive Bayes classifier show that a small set of labeled documents combined with a large set of unlabeled documents yields better performance than supervised learning on labeled data alone. Considering the efficiency of a classifier system, the one-step semi-supervised learning algorithm proposed in this study could be a good solution for improving classification performance with unlabeled documents.
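
The general two-step procedure described in this abstract (train on labeled documents, label the unlabeled ones automatically, then retrain on everything) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `TinyNaiveBayes` class, the whitespace tokenization, the add-one smoothing, and the function name `two_step_self_training` are all assumptions made for the example, and the document similarity features studied in the paper are not modeled.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical multinomial naive Bayes over whitespace tokens (add-one smoothing)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def two_step_self_training(labeled_docs, labels, unlabeled_docs):
    # Step 1: train on the labeled documents and pseudo-label the unlabeled ones.
    first = TinyNaiveBayes().fit(labeled_docs, labels)
    pseudo_labels = [first.predict(doc) for doc in unlabeled_docs]
    # Step 2: retrain on manually and automatically labeled documents together.
    return TinyNaiveBayes().fit(labeled_docs + unlabeled_docs, labels + pseudo_labels)
```

In this sketch every pseudo-label is accepted in step 2; a practical system would more likely keep only confident predictions, which is a design choice the abstract does not specify.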

Modulation classification for BPSK and QPSK signals over Rayleigh fading channel (Rayleigh 페이딩 채널에서 BPSK와 QPSK 신호의 변조 분류)

  • 윤동원;한영열
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.4 / pp.1019-1026 / 1996
  • Modulation type classifiers based on statistical moments have been successfully employed to classify PSK signals. Previously developed classifiers were analyzed only for the AWGN channel. In this paper, a moments-based modulation type classifier that distinguishes BPSK from QPSK signals over a Rayleigh fading channel is proposed and analyzed. The moments of the received signal are evaluated using the exact distribution of the received signal, and a moments-based classifier is derived from them. The performance of the proposed classifier is evaluated in terms of the misclassification probability for BPSK and QPSK under a Rayleigh fading environment.
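
To make the idea of a moments-based modulation classifier concrete, here is a rough sketch under simplifying assumptions that are not taken from the paper: the fading gain `h` is held constant over the observed block, the noise is circular complex Gaussian, and the decision statistic is the ratio |E[r²]| / E[|r|²] with a hand-picked threshold of 0.5. For BPSK the squared symbols are all equal, so the second-order moment E[r²] stays large; for QPSK the squared symbols cancel on average, so it tends toward zero. The function names and the threshold are hypothetical.

```python
import cmath
import math
import random


def simulate_block(modulation, n_symbols, snr_db, h, rng):
    """Return received samples r = h*s + n for one block with constant fading gain h."""
    # Noise standard deviation per real dimension, relative to unit symbol energy.
    noise_std = math.sqrt(10 ** (-snr_db / 10) / 2)
    samples = []
    for _ in range(n_symbols):
        if modulation == "BPSK":
            s = complex(rng.choice((-1.0, 1.0)), 0.0)
        else:  # QPSK with unit-energy symbols
            s = cmath.exp(1j * (math.pi / 4 + rng.randrange(4) * math.pi / 2))
        noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        samples.append(h * s + noise)
    return samples


def moment_statistic(samples):
    """Ratio of the magnitude of the sample moment E[r^2] to the sample power E[|r|^2]."""
    m20 = sum(r * r for r in samples) / len(samples)
    m21 = sum(abs(r) ** 2 for r in samples) / len(samples)
    return abs(m20) / m21


def classify(samples, threshold=0.5):
    # Assumed decision rule: a large normalized second moment indicates BPSK.
    return "BPSK" if moment_statistic(samples) > threshold else "QPSK"
```

A Rayleigh-distributed gain can be drawn as `complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))`. When that draw lands in a deep fade the statistic for BPSK shrinks toward the threshold, which is the kind of effect a misclassification analysis over fading has to account for.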

Comparison of Korean Classification Models' Korean Essay Score Range Prediction Performance (한국어 학습 모델별 한국어 쓰기 답안지 점수 구간 예측 성능 비교)

  • Cho, Heeryon;Im, Hyeonyeol;Yi, Yumi;Cha, Junwoo
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.133-140 / 2022
  • We investigate the performance of deep learning-based Korean language models on the task of predicting the score range of Korean essays written by foreign students. We construct a data set of 304 essays covering four topics: the criteria for choosing a job ('job'), conditions of a happy life ('happ'), the relationship between money and happiness ('econ'), and the definition of success ('succ'). The essays were labeled with four letter grades (A, B, C, and D), and a total of eleven score range prediction experiments were conducted (five for 'job' essays, five for 'happ' essays, and one for mixed-topic essays). Three deep learning-based Korean language models, KoBERT, KcBERT, and KR-BERT, were fine-tuned using various training data. In addition, two traditional probabilistic machine learning classifiers, naive Bayes and logistic regression, were evaluated. Experimental results show that the deep learning-based Korean language models outperformed the two traditional classifiers, with KR-BERT performing best at 55.83% overall average prediction accuracy. KcBERT was a close second (55.77%), followed by KoBERT (54.91%). The naive Bayes and logistic regression classifiers achieved 52.52% and 50.28%, respectively. Owing to the scarcity of training data and the imbalance in class distribution, overall prediction performance was not high for any classifier. Moreover, the classifiers' vocabulary did not explicitly capture the error features that are helpful for correctly grading Korean essays. We expect score range prediction performance to improve once these two limitations are overcome.

Korean speech recognition using deep learning (딥러닝 모형을 사용한 한국어 음성인식)

  • Lee, Suji;Han, Seokjin;Park, Sewon;Lee, Kyeongwon;Lee, Jaeyong
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.213-227 / 2019
  • In this paper, we propose an end-to-end deep learning model for Korean speech recognition that incorporates a Bayesian neural network. In the past, Korean speech recognition was a complicated task because of the excessive number of parameters in its many intermediate steps and the need for expert knowledge of Korean. Korean speech recognition has become more manageable thanks to recent breakthroughs in end-to-end models, which decode mel-frequency cepstral coefficients directly into text without any intermediate processing. In particular, models trained with the Connectionist Temporal Classification (CTC) loss and attention-based models are both end-to-end models. In addition, we use a Bayesian neural network to implement the end-to-end model and obtain Monte Carlo estimates. Finally, we carry out experiments on the "WorimalSam" online dictionary dataset. We obtain a word error rate of 4.58%, an improvement over the Google and Naver APIs.
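
For readers unfamiliar with the metric reported above, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference, divided by the number of reference words. The sketch below assumes whitespace-delimited words; how the paper tokenized Korean text is not stated in the abstract, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5.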

Ensemble Learning of Region Based Classifiers (지역 기반 분류기의 앙상블 학습)

  • Choi, Sung-Ha;Lee, Byung-Woo;Yang, Ji-Hoon
    • The KIPS Transactions:PartB / v.14B no.4 / pp.303-310 / 2007
  • In machine learning, ensemble classifiers, which are sets of classifiers, have been introduced to achieve higher accuracy than individual classifiers. We propose a new ensemble learning method that employs a set of region-based classifiers. To evaluate the proposed method, we compared its performance with that of bagging and boosting, which are existing ensemble methods. Since the distribution of data can differ across regions of the feature space, we split the data, generate a classifier for each region, and apply weighted voting among the classifiers. We used 11 data sets from the UCI Machine Learning Repository to compare the performance of our new ensemble method with that of individual classifiers as well as these existing ensemble methods. We found that our method improved performance, particularly when the base learner is naive Bayes or SVM.
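
The weighted voting step mentioned in this abstract could look something like the sketch below. The abstract does not say how regions are formed or how the weights are computed, so the distance-based weighting in `region_weights` (each region summarized by a centroid, weight 1 / (1 + distance)) is purely an assumption for illustration, as are both function names.

```python
import math
from collections import defaultdict


def region_weights(point, centroids):
    """Assumed weighting: classifiers whose region centroid is nearer get more say."""
    return [1.0 / (1.0 + math.dist(point, centroid)) for centroid in centroids]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the classifiers' votes."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

With per-region predictions `["a", "b", "b"]` and weights `[0.9, 0.3, 0.4]`, the single heavily weighted vote for "a" (0.9) outweighs the two votes for "b" (0.7 in total), which is how weighted voting differs from a simple majority.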

A Model to Infer Users' Behavior Patterns for Personalized Recommendation Service based Context-Awareness (컨텍스트 인식 기반 개인화 추천 서비스를 위한 사용자 행동패턴 추론 모델)

  • Seo, Hyo-Seok;Lee, Sang-Yong
    • Journal of Digital Convergence / v.10 no.2 / pp.293-297 / 2012
  • To provide a personalized recommendation service in a context-aware environment, the collected context data must be analyzed quickly and the user's objective must be inferred effectively. However, context collected from mobile devices is not suited to existing inference algorithms as they stand, because information may be missing or uncertain, and efficient algorithms are required for the mobile environment. In this paper, behavior patterns are classified using naive Bayes classification to minimize the loss caused by missing or erroneous information. Pattern matching is used to learn the user's inclinations effectively and to infer the purpose of the behavior. The accuracy of the proposed inference model was evaluated by applying it to an application recommendation service on smartphones.

Classification of Transient Signals in Ocean Background Noise Using Bayesian Classifier (베이즈 분류기를 이용한 수중 배경소음하의 과도신호 분류)

  • Kim, Ju-Ho;Bok, Tae-Hoon;Paeng, Dong-Guk;Bae, Jin-Ho;Lee, Chong-Hyun;Kim, Seong-Il
    • Journal of Ocean Engineering and Technology / v.26 no.4 / pp.57-63 / 2012
  • In this paper, a Bayesian classifier based on PCA (principal component analysis) is proposed to classify underwater transient signals using $16^{th}$-order LPC (linear predictive coding) coefficients as the feature vector. The proposed classifier consists of two steps. Mechanical signals are separated from biological signals in the first step, and each type of mechanical signal is then recognized in the second step. Three biological transient signals and two mechanical signals were used in the experiments. The classification ratios for the feature vectors of biological signals and mechanical signals were 94.75% and 97.23%, respectively, when the full $16^{th}$-order LPC vector was used. To determine the effect of underwater noise on classification performance, underwater ambient noise was added to the test signals, and the classification ratio as a function of SNR (signal-to-noise ratio) was compared while changing the dimension of the feature vector using PCA. The classification ratios of the biological and mechanical signals under ocean ambient noise at 10 dB SNR were 0.51% and 100%, respectively. However, the ratios changed to 53.07% and 83.14% when the dimension of the feature vector was reduced to three by applying PCA. For correct classification under ocean ambient noise, an SNR above 10 dB is required for a three-dimensional feature vector and above 30 dB for a seven-dimensional feature vector.

The performance of Bayesian network classifiers for predicting discrete data (이산형 자료 예측을 위한 베이지안 네트워크 분류분석기의 성능 비교)

  • Park, Hyeonjae;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.33 no.3 / pp.309-320 / 2020
  • Bayesian networks, which are represented as directed acyclic graphs (DAGs), are used in many areas, including medicine, meteorology, and genetics, because relationships between variables can be modeled with graphs and probabilities. In particular, Bayesian network classifiers, which are used to predict discrete data, have recently emerged as a data mining method. Bayesian networks can be grouped into different models depending on the structure learning method. In this study, Bayesian network models are learned with structure learning methods of various properties. The models are compared to the simplest method, the naïve Bayes model. Classification results are compared by applying the learned models to various real data sets. This study also compares the relationships between variables in the data through the graphs that appear in each model.
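
As background for the comparison described above, the standard factorizations make the difference between a general Bayesian network classifier and the naïve Bayes model explicit. The notation is assumed here rather than taken from the paper: $C$ is the class variable, $X_1, \dots, X_p$ are the discrete predictors, and $\mathrm{pa}(\cdot)$ denotes a node's parents in the DAG. A Bayesian network factorizes the joint distribution over its graph as

$$
P(C, X_1, \dots, X_p) = P\bigl(C \mid \mathrm{pa}(C)\bigr) \prod_{i=1}^{p} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr).
$$

The naïve Bayes model is the special case with $\mathrm{pa}(C) = \emptyset$ and $\mathrm{pa}(X_i) = \{C\}$ for every $i$, which gives

$$
P(C, X_1, \dots, X_p) = P(C) \prod_{i=1}^{p} P(X_i \mid C), \qquad \hat{c} = \arg\max_{c} P(c) \prod_{i=1}^{p} P(x_i \mid c).
$$

Structure learning methods differ in which additional edges among the $X_i$ they allow beyond this fixed star-shaped graph.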

Automatic Classification of Blog Posts (블로그 포스트의 자동 분류 시스템)

  • Jho, Hee-Sun;Kim, Su-Ah;Lee, Hyun-Ah
    • Annual Conference on Human and Language Technology / 2013.10a / pp.160-162 / 2013
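  • Content-based classification is needed for convenient blog use and for finding information in blogs. Most blog sites offer content-based categories, but bloggers often do not manually assign a category to the posts they write. In this paper, we collect documents for each category from a blog site that provides categories, assign feature weights to the terms in each document using term frequency, document frequency, and per-category frequency, build classification models with a variety of learners, and identify the feature extraction and classification algorithms best suited to the characteristics of blogs. In our experiments, the classification model combining CTF-IECDF, devised in this paper, with multinomial naive Bayes achieved a classification accuracy of 75.40%.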

Fast Fingerprint Classification Using the Probabilistic Integration of Structural Features (구조적 특징의 확률적 결합을 이용한 빠른 지문 분류)

  • Cho Ung-Keun;Hong Jin-Hyuk;Cho Sung-Bae
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.757-759 / 2005
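  • Since Henry's fingerprint classification system was devised, a variety of approaches to fingerprint classification have been studied. Classification by singular points is the most widely studied approach, but accurate classification is difficult because it is sensitive to fingerprint image quality. Pseudo ridges are another feature for classifying fingerprints, used alongside singular points to compensate for their incompleteness. In this paper, we propose a probabilistic classification method that combines singular point and pseudo ridge information using a naive Bayes classifier. In experiments on NIST DB 4, the proposed method achieved a classification rate of $85.4\%$ for five-class classification, and we confirmed that it is faster than classification by neural networks or nearest neighbors.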
