• Title/Summary/Keyword: bayesian classifier

Large-Scale Text Classification with Deep Neural Networks (깊은 신경망 기반 대용량 텍스트 데이터 분류 기술)

  • Jo, Hwiyeol;Kim, Jin-Hwa;Kim, Kyung-Min;Chang, Jeong-Ho;Eom, Jae-Hong;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.23 no.5 / pp.322-327 / 2017
  • Classification has long been studied in the field of natural language processing. Continuing our previous research, which classified large-scale text using Convolutional Neural Networks (CNN), we implemented Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). The experimental results ranked the classification algorithms, from lowest to highest performance, as Multinomial Naïve Bayesian Classifier < Support Vector Machine (SVM) < LSTM < CNN < GRU. The results can be interpreted as follows. First, CNN outperformed LSTM, so the text classification problem may be related more to feature extraction than to natural language understanding. Second, the results suggest that GRU extracts features better than LSTM. Finally, the fact that GRU outperformed CNN implies that text classification algorithms should consider both feature extraction and sequential information. We present the results of fine-tuning deep neural networks to give future researchers some intuition regarding natural language processing.

An Automatic Classification System of Korean Documents Using Weight for Keywords of Document and Word Cluster (문서의 주제어별 가중치 부여와 단어 군집을 이용한 한국어 문서 자동 분류 시스템)

  • Hur, Jun-Hui;Choi, Jun-Hyeog;Lee, Jung-Hyun;Kim, Joong-Bae;Rim, Kee-Wook
    • The KIPS Transactions:PartB / v.8B no.5 / pp.447-454 / 2001
  • Automatic document classification assigns unlabeled documents to existing classes. It can be applied to classifying newsgroup articles and web documents, and to producing more precise information retrieval results by learning from users. In this paper, we use a weighted Bayesian classifier that weights the keywords of a document to improve classification accuracy. If the system cannot classify a document properly because the document has too few words to serve as features, it uses relevant word clusters to supplement the document's features. The clusters are produced by automatic word clustering over a corpus. As a result, the proposed system outperformed an existing classification system in classification accuracy on Korean documents.
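
As a rough illustration of the keyword-weighting idea in the abstract above, the following Python sketch trains a multinomial naive Bayes classifier in which each token's contribution is multiplied by a keyword weight. The weighting scheme (`keyword_weights`, with a default weight of 1.0), the Laplace smoothing, and the toy documents are all assumptions made for this example, not the paper's actual method; the word-cluster fallback is not shown.

```python
import math
from collections import Counter, defaultdict


def train(docs, keyword_weights):
    """Fit a weighted multinomial naive Bayes model on (tokens, label) pairs."""
    class_docs = Counter()
    word_weight = defaultdict(Counter)  # label -> token -> weighted count
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        for tok in tokens:
            # Assumed weighting: keywords count more than ordinary words.
            word_weight[label][tok] += keyword_weights.get(tok, 1.0)
            vocab.add(tok)
    return {"class_docs": class_docs, "word_weight": word_weight,
            "vocab": vocab, "keyword_weights": keyword_weights}


def predict(model, tokens):
    """Return the label with the highest weighted log-posterior score."""
    total_docs = sum(model["class_docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_weight"][label]
        denom = sum(counts.values()) + vocab_size  # Laplace smoothing
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            w = model["keyword_weights"].get(tok, 1.0)
            score += w * math.log((counts[tok] + 1.0) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only.
docs = [
    (["stock", "market", "price"], "economy"),
    (["bank", "interest", "price"], "economy"),
    (["match", "goal", "team"], "sports"),
    (["team", "coach", "season"], "sports"),
]
weights = {"market": 2.0, "goal": 2.0}  # hypothetical keyword weights
model = train(docs, weights)
label = predict(model, ["market", "price"])
```

Applying the weight both when counting and when scoring is one possible design; a real system would need to choose and validate its own weighting.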

Performance Comparison of Machine Learning Algorithms for TAB Digit Recognition (타브 숫자 인식을 위한 기계 학습 알고리즘의 성능 비교)

  • Heo, Jaehyeok;Lee, Hyunjung;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.8 no.1 / pp.19-26 / 2019
  • This paper compares the classification performance of learning algorithms for TAB digit recognition. The TAB digits segmented from TAB musical notes contain TAB lines and musical symbols. A labeling method and a non-linear filter are designed and applied to extract only the fret digits. Shift operations in four directions are applied to generate more data. The selected models are a Bayesian classifier, a support vector machine, prototype-based learning, a multi-layer perceptron, and a convolutional neural network. The results show that the mean accuracy of the Bayesian classifier is about 85.0%, while that of the others exceeds 99.0%. In addition, the convolutional neural network outperforms the others in terms of generalization and the data preprocessing step.
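
The four-direction shift used to generate more data in the abstract above can be sketched as follows. This is a minimal Python example under assumed details: images are 2D lists, the shift distance (`step`) is one pixel, and vacated cells are filled with zero. The paper's actual shift size and border handling are not stated in the abstract.

```python
def shift(image, dx, dy, fill=0):
    """Return a copy of a 2D list shifted by (dx, dy); vacated cells get `fill`."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Pixels shifted outside the frame are dropped.
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = image[y][x]
    return out


def augment(image, step=1):
    """Return the original plus one shifted copy per direction: left, right, up, down."""
    offsets = [(-step, 0), (step, 0), (0, -step), (0, step)]
    return [image] + [shift(image, dx, dy) for dx, dy in offsets]


# Toy 3x3 "digit" with a single lit pixel, for illustration only.
digit = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
samples = augment(digit)
```

Each input image yields five training samples here, all sharing the original label.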

Game Recommendation System Based on User Ratings (사용자 평점 기반 게임 추천 시스템)

  • Kim, JongHyen;Jo, HyeonJeong;Kim, Byeong Man
    • Journal of Korea Society of Industrial Information Systems / v.23 no.6 / pp.9-19 / 2018
  • With recent developments in the game industry and growing interest in game streaming, non-professional gamers are also becoming interested in games and buying them. However, it is difficult to judge which game is the most enjoyable among the dozens released every day. Although game sales platforms provide game recommendation functions, these are not accurate because they are used as a means of increasing sales and focus on discounted or new products. For this reason, we propose a game recommendation system based on user ratings, which raises users' satisfaction with recommendations and appropriately reflects their experience. In the system, we implement a rating prediction function using collaborative filtering and a game recommendation function using a naive Bayesian classifier to provide users with quick and accurate recommendations. As a result, the rating prediction algorithm achieved a throughput of 2.4 seconds and an average accuracy of 72.1 percent. The game recommendation algorithm achieved 75.187 percent accuracy and was able to provide users with fast and accurate recommendations.

Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis (최근접 이웃 규칙 기반 프로토타입 선택과 편의-분산을 이용한 성능 평가)

  • Shim, Se-Yong;Hwang, Doo-Sung
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.10 / pp.73-81 / 2015
  • This paper proposes a prototype selection method and evaluates the generalization performance of standard algorithms and prototype-based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii within class areas and generates a small set of training data. The nearest-neighbor classifier uses the new training set to predict the class of test data. By decomposing the mean expected error into bias and variance, we compare the generalization errors of k-nearest neighbor, a Bayesian classifier, prototype selection using a fixed radius, and the proposed prototype selection method. In experiments, the bias-variance trends of the proposed prototype classifier are similar to those of nearest-neighbor classifiers trained on all the training data, and the prototype selection rates are under 27.0% on average.

A Study of Short-Term Load Forecasting System Using Data Mining (데이터 마이닝을 이용한 단기 부하 예측 시스템 연구)

  • Joo, Young-Hoon;Jung, Keun-Ho;Kim, Do-Wan;Park, Jin-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.130-135 / 2004
  • This paper presents a new design method for a short-term load forecasting system (STLFS) using data mining. The structure of the proposed STLFS is divided into two parts: a Takagi-Sugeno (T-S) fuzzy model-based classifier and a predictor. The proposed classifier is composed of Gaussian fuzzy sets in the premise part and a linearized Bayesian classifier in the consequent part. The related parameters of the classifier are easily obtained from the statistical information of the training set. The proposed predictor takes the form of a convex combination of linear time series predictors for each input. The problem of estimating the consequent parameters is formulated as a convex optimization problem that minimizes the norm distance between the real load and the output of the linear time series estimator. The problem of estimating the premise parameters is to find the parameter values that minimize the error between the real load and the overall output. Finally, to show the feasibility of the proposed method, this paper provides a short-term load forecasting example.

A Study on Anomalous Propagation Echo Identification using Naive Bayesian Classifier (나이브 베이지안 분류기를 이용한 이상전파에코 식별방법에 대한 연구)

  • Lee, Hansoo;Kim, Sungshin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.89-90 / 2016
  • Anomalous propagation echo is a kind of abnormal radar signal that occurs when the radar beam is irregularly refracted because of temperature or humidity. The echo frequently appears in ground-based weather radar. To improve the accuracy of weather forecasting, it is important to analyze radar data precisely. Therefore, research on identifying anomalous propagation echo is ongoing around the world. This paper studies a classification method that distinguishes anomalous propagation echo in radar data using a naive Bayes classifier and distinctive attributes of the echo, such as reflectivity and altitude. Verifying the suggested naive Bayes classifier on actual occurrences of the echo confirmed that it yields good classification results.
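
To make the approach in the abstract above concrete, here is a minimal Python sketch of a Gaussian naive Bayes classifier over two continuous attributes. The choice of a Gaussian likelihood, the feature order (reflectivity, altitude), the class names, and every number in the toy data are assumptions invented for this example; they are not radar measurements and not the paper's model.

```python
import math
from statistics import mean, pvariance


def fit(samples, labels):
    """Estimate a prior and per-feature mean/variance for each class."""
    model = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(samples),
            "mean": [mean(c) for c in columns],
            # Small floor keeps the variance strictly positive.
            "var": [pvariance(c) + 1e-9 for c in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, sample):
    """Return the class with the highest log-posterior, assuming independent features."""
    def score(label):
        p = model[label]
        return math.log(p["prior"]) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(sample, p["mean"], p["var"])
        )
    return max(model, key=score)


# Invented toy values: (reflectivity, altitude). Not real radar data.
samples = [(45, 0.5), (50, 0.8), (48, 0.6), (30, 3.0), (25, 4.0), (35, 3.5)]
labels = ["ap_echo"] * 3 + ["precipitation"] * 3
model = fit(samples, labels)
result = predict(model, (47, 0.7))
```

The naive independence assumption lets each attribute be modeled separately, which keeps the parameter count small when labeled echo cases are scarce.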

Spatial-Temporal Drought Analysis of South Korea Based On Neural Networks (신경망을 이용한 우리나라의 시공간적 가뭄의 해석)

  • Sin, Hyeon-Seok;Park, Mu-Jong
    • Journal of Korea Water Resources Association / v.32 no.1 / pp.15-29 / 1999
  • This paper introduces a new methodology to analyze and quantify regional meteorological drought based on annual precipitation data. Based on the posterior probability estimator and Bayesian classifier in the Spatial Analysis Neural Network (SANN), point drought probabilities categorized as extreme, severe, mild, and non-drought events are defined, and a Bayesian Drought Severity Index (BPSI) is introduced to classify the region of interest into four drought severities. In addition, to estimate the drought severity for the entire region, regional extreme, severe, mild, and non-drought probabilities, which are the areal averages of the point drought probabilities over the region, are computed and applied. The proposed methodology is applied to analyze the regional drought of South Korea from 1967 to 1996. The drought severity for the whole of South Korea was defined spatially for each year, and each year was classified according to a drought severity criterion. The results may help water managers understand South Korean drought with respect to its spatial and temporal variation.

A Study on Sex Classification of a Name using Naive Bayesian (나이브 베이지안을 사용한 성명에 대한 성별 구분 연구)

  • Lim, Myung-Jae;Jung, Jin-Pyo;Kim, Myung-Gwan
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.6 / pp.155-159 / 2013
  • This article uses a naive Bayesian classifier to build a system that can distinguish the sex of a name. Unlike foreign names, in Korean names the pronoun referring to a person shows discordance with sex. Drawing on the characteristics of Korean names, however, the study distinguishes names frequently used for men from those frequently used for women. Because the data also include names whose sex is rather ambiguous, such as proper nouns, the accuracy is somewhat low. The experiment yields 84% accuracy for Korean men and 88% for Korean women, for a total accuracy of 86%. For foreign names, accuracy is 80% for men and 84% for women, for a total accuracy of 83%.

On-line Signature Verification using Segment Matching and LDA Method (구간분할 매칭방법과 선형판별분석기법을 융합한 온라인 서명 검증)

  • Lee, Dae-Jong;Go, Hyoun-Joo;Chun, Myung-Geun
    • Journal of KIISE:Software and Applications / v.34 no.12 / pp.1065-1074 / 2007
  • Among the various methods for comparing reference signatures with an input signature, segment-to-segment matching has more advantages than global and point-to-point methods. However, its recognition rate drops when the partitioning points vary. To resolve this drawback, this paper proposes a signature verification method that combines linear discriminant analysis with segment-to-segment matching. For the final decision step, we adopt a statistics-based Bayesian classifier to combine the two individual systems effectively. In various experiments, the proposed method shows better performance than the method based on segment-to-segment matching alone.