• Title/Summary/Keyword: Naive Bayes Classifier

Search Result 94, Processing Time 0.029 seconds

Preference Prediction System using Similarity Weight granted Bayesian estimated value and Associative User Clustering (베이지안 추정치가 부여된 유사도 가중치와 연관 사용자 군집을 이용한 선호도 예측 시스템)

  • 정경용;최성용;임기욱;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.3_4
    • /
    • pp.316-325
    • /
    • 2003
  • A user preference prediction method using an exiting collaborative filtering technique has used the nearest-neighborhood method based on the user preference about items and has sought the user's similarity from the Pearson correlation coefficient. Therefore, it does not reflect any contents about items and also solve the problem of the sparsity. This study suggests the preference prediction system using the similarity weight granted Bayesian estimated value and the associative user clustering to complement problems of an exiting collaborative preference prediction method. This method suggested in this paper groups the user according to the Genre by using Association Rule Hypergraph Partitioning Algorithm and the new user is classified into one of these Genres by Naive Bayes classifier to slove the problem of sparsity in the collaborative filtering system. Besides, for get the similarity between users belonged to the classified genre and new users, this study allows the different estimated value to item which user vote through Naive Bayes learning. If the preference with estimated value is applied to the exiting Pearson correlation coefficient, it is able to promote the precision of the prediction by reducing the error of the prediction because of missing value. To estimate the performance of suggested method, the suggested method is compared with existing collaborative filtering techniques. As a result, the proposed method is efficient for improving the accuracy of prediction through solving problems of existing collaborative filtering techniques.

Performance Improvement of Collaborative Filtering System Using Associative User′s Clustering Analysis for the Recalculation of Preference and Representative Attribute-Neighborhood (선호도 재계산을 위한 연관 사용자 군집 분석과 Representative Attribute -Neighborhood를 이용한 협력적 필터링 시스템의 성능향상)

  • Jung, Kyung-Yong;Kim, Jin-Su;Kim, Tae-Yong;Lee, Jung-Hyun
    • The KIPS Transactions:PartB
    • /
    • v.10B no.3
    • /
    • pp.287-296
    • /
    • 2003
  • There has been much research focused on collaborative filtering technique in Recommender System. However, these studies have shown the First-Rater Problem and the Sparsity Problem. The main purpose of this Paper is to solve these Problems. In this Paper, we suggest the user's predicting preference method using Bayesian estimated value and the associative user clustering for the recalculation of preference. In addition to this method, to complement a shortcoming, which doesn't regard the attribution of item, we use Representative Attribute-Neighborhood method that is used for the prediction when we find the similar neighborhood through extracting the representative attribution, which most affect the preference. We improved the efficiency by using the associative user's clustering analysis in order to calculate the preference of specific item within the cluster item vector to the collaborative filtering algorithm. Besides, for the problem of the Sparsity and First-Rater, through using Association Rule Hypergraph Partitioning algorithm associative users are clustered according to the genre. New users are classified into one of these genres by Naive Bayes classifier. In addition, in order to get the similarity value between users belonged to the classified genre and new users, and this paper allows the different estimated value to item which user evaluated through Naive Bayes learning. As applying the preference granted the estimated value to Pearson correlation coefficient, it can make the higher accuracy because the errors that cause the missing value come less. We evaluate our method on a large collaborative filtering database of user rating and it significantly outperforms previous proposed method.

Data mining Algorithms for the Development of Sasang Type Diagnosis (사상체질 진단검사를 위한 데이터마이닝 알고리즘 연구)

  • Hong, Jin-Woo;Kim, Young-In;Park, So-Jung;Kim, Byoung-Chul;Eom, Il-Kyu;Hwang, Min-Woo;Shin, Sang-Woo;Kim, Byung-Joo;Kwon, Young-Kyu;Chae, Han
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.23 no.6
    • /
    • pp.1234-1240
    • /
    • 2009
  • This study was to compare the effectiveness and validity of various data-mining algorithm for Sasang type diagnostic test. We compared the sensitivity and specificity index of nine attribute selection and eleven class classification algorithms with 31 data-set characterizing Sasang typology and 10-fold validation methods installed in Waikato Environment Knowledge Analysis (WEKA). The highest classification validity score can be acquired as follows; 69.9 as Percentage Correctly Predicted index with Naive Bayes Classifier, 80 as sensitivity index with LWL/Tae-Eum type, 93.5 as specificity index with Naive Bayes Classifier/So-Eum type. The classification algorithm with highest PCP index of 69.62 after attribute selection was Naive Bayes Classifier. In this study we can find that the best-fit algorithm for traditional medicine is case sensitive and that characteristics of clinical circumstances, and data-mining algorithms and study purpose should be considered to get the highest validity even with the well defined data sets. It is also confirmed that we can't find one-fits-all algorithm and there should be many studies with trials and errors. This study will serve as a pivotal foundation for the development of medical instruments for Pattern Identification and Sasang type diagnosis on the basis of traditional Korean Medicine.

Study on Anomaly Detection Method of Improper Foods using Import Food Big data (수입식품 빅데이터를 이용한 부적합식품 탐지 시스템에 관한 연구)

  • Cho, Sanggoo;Choi, Gyunghyun
    • The Journal of Bigdata
    • /
    • v.3 no.2
    • /
    • pp.19-33
    • /
    • 2018
  • Owing to the increase of FTA, food trade, and versatile preferences of consumers, food import has increased at tremendous rate every year. While the inspection check of imported food accounts for about 20% of the total food import, the budget and manpower necessary for the government's import inspection control is reaching its limit. The sudden import food accidents can cause enormous social and economic losses. Therefore, predictive system to forecast the compliance of food import with its preemptive measures will greatly improve the efficiency and effectiveness of import safety control management. There has already been a huge data accumulated from the past. The processed foods account for 75% of the total food import in the import food sector. The analysis of big data and the application of analytical techniques are also used to extract meaningful information from a large amount of data. Unfortunately, not many studies have been done regarding analyzing the import food and its implication with understanding the big data of food import. In this context, this study applied a variety of classification algorithms in the field of machine learning and suggested a data preprocessing method through the generation of new derivative variables to improve the accuracy of the model. In addition, the present study compared the performance of the predictive classification algorithms with the general base classifier. The Gaussian Naïve Bayes prediction model among various base classifiers showed the best performance to detect and predict the nonconformity of imported food. In the future, it is expected that the application of the abnormality detection model using the Gaussian Naïve Bayes. The predictive model will reduce the burdens of the inspection of import food and increase the non-conformity rate, which will have a great effect on the efficiency of the food import safety control and the speed of import customs clearance.

Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities (문헌간 유사도를 이용한 자동분류에서 미분류 문헌의 활용에 관한 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for information Management
    • /
    • v.24 no.1 s.63
    • /
    • pp.251-271
    • /
    • 2007
  • This paper studies the problem of classifying documents with labeled and unlabeled learning data, especially with regards to using document similarity features. The problem of using unlabeled data is practically important because in many information systems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. There are two steps In general semi-supervised learning algorithm. First, it trains a classifier using the available labeled documents, and classifies the unlabeled documents. Then, it trains a new classifier using all the training documents which were labeled either manually or automatically. We suggested two types of semi-supervised learning algorithm with regards to using document similarity features. The one is one step semi-supervised learning which is using unlabeled documents only to generate document similarity features. And the other is two step semi-supervised learning which is using unlabeled documents as learning examples as well as similarity features. Experimental results, obtained using support vector machines and naive Bayes classifier, show that we can get improved performance with small labeled and large unlabeled documents then the performance of supervised learning which uses labeled-only data. When considering the efficiency of a classifier system, the one step semi-supervised learning algorithm which is suggested in this study could be a good solution for improving classification performance with unlabeled documents.

Web Mining Using Fuzzy Integration of Multiple Structure Adaptive Self-Organizing Maps (다중 구조적응 자기구성지도의 퍼지결합을 이용한 웹 마이닝)

  • 김경중;조성배
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.1
    • /
    • pp.61-70
    • /
    • 2004
  • It is difficult to find an appropriate web site because exponentially growing web contains millions of web documents. Personalization of web search can be realized by recommending proper web sites using user profile but more efficient method is needed for estimating preference because user's evaluation on web contents presents many aspects of his characteristics. As user profile has a property of non-linearity, estimation by classifier is needed and combination of classifiers is necessary to anticipate diverse properties. Structure adaptive self-organizing map (SASOM) that is suitable for Pattern classification and visualization is an enhanced model of SOM and might be useful for web mining. Fuzzy integral is a combination method using classifiers' relevance that is defined subjectively. In this paper, estimation of user profile is conducted by using ensemble of SASOM's teamed independently based on fuzzy integral and evaluated by Syskill & Webert UCI benchmark data. Experimental results show that the proposed method performs better than previous naive Bayes classifier as well as voting of SASOM's.

Word Sense Similarity Clustering Based on Vector Space Model and HAL (벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집)

  • Kim, Dong-Sung
    • Korean Journal of Cognitive Science
    • /
    • v.23 no.3
    • /
    • pp.295-322
    • /
    • 2012
  • In this paper, we cluster similar word senses applying vector space model and HAL (Hyperspace Analog to Language). HAL measures corelation among words through a certain size of context (Lund and Burgess 1996). The similarity measurement between a word pair is cosine similarity based on the vector space model, which reduces distortion of space between high frequency words and low frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce a large amount of dimensions caused by similarity matrix. For sense similarity clustering, we adopt supervised and non-supervised learning methods. For non-supervised method, we use clustering. For supervised method, we use SVM (Support Vector Machine), Naive Bayes Classifier, and Maximum Entropy Method.

  • PDF

A Study on Classification of Medical Information Documents using Word Correlation (색인어 연관성을 이용한 의료정보문서 분류에 관한 연구)

  • Lim, Hyeong-Geon;Jang, Duk-Sung
    • The KIPS Transactions:PartB
    • /
    • v.8B no.5
    • /
    • pp.469-476
    • /
    • 2001
  • As the service of information through web system increases in modern society, many questions and consultations are going on through Home page and E-mail in the hospital. But there are some burdens for the management and postponements for answering the questions. In this paper, we investigate the document classification methods as a primary research of the auto-answering system. On the basis of 1200 documents which are questions of patients, 66% are used for the learning documents and 34% for test documents. All of are also used for the document classification using NBC (Naive Bayes Classifier), common words and coefficient of correlation. As the result of the experiments, the two methods proposed in this paper, that is, common words and coefficient of correlation are higher as much as 3% and 5% respectively than the basic NBC methods. This result shows that the correlation between indexes and categories is more effective than the word frequency in the document classification.

  • PDF

Malware Detection Method using Opcode and windows API Calls (Opcode와 Windows API를 사용한 멀웨어 탐지)

  • Ahn, Tae-Hyun;Oh, Sang-Jin;Kwon, Young-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.6
    • /
    • pp.11-17
    • /
    • 2017
  • We proposed malware detection method, which use the feature vector that consist of Opcode(operation code) and Windows API Calls extracted from executable files. And, we implemented our feature vector and measured the performance of it by using Bernoulli Naïve Bayes and K-Nearest Neighbor classifier. In experimental result, when using the K-NN classifier with the proposed method, we obtain 95.21% malware detection accuracy. It was better than existing methods using only either Opcode or Windows API Calls.

Natural Object Recognition for Augmented Reality Applications (증강현실 응용을 위한 자연 물체 인식)

  • Anjan, Kumar Paul;Mohammad, Khairul Islam;Min, Jae-Hong;Kim, Young-Bum;Baek, Joong-Hwan
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.2
    • /
    • pp.143-150
    • /
    • 2010
  • Markerless augmented reality system must have the capability to recognize and match natural objects both in indoor and outdoor environment. In this paper, a novel approach is proposed for extracting features and recognizing natural objects using visual descriptors and codebooks. Since the augmented reality applications are sensitive to speed of operation and real time performance, our work mainly focused on recognition of multi-class natural objects and reduce the computing time for classification and feature extraction. SIFT(scale invariant feature transforms) and SURF(speeded up robust feature) are used to extract features from natural objects during training and testing, and their performance is compared. Then we form visual codebook from the high dimensional feature vectors using clustering algorithm and recognize the objects using naive Bayes classifier.