• Title/Summary/Keyword: naive bayes

Performance Improvement of Collaborative Filtering System Using Associative User's Clustering Analysis for the Recalculation of Preference and Representative Attribute-Neighborhood (선호도 재계산을 위한 연관 사용자 군집 분석과 Representative Attribute-Neighborhood를 이용한 협력적 필터링 시스템의 성능향상)

  • Jung, Kyung-Yong;Kim, Jin-Su;Kim, Tae-Yong;Lee, Jung-Hyun
    • The KIPS Transactions:PartB
    • /
    • v.10B no.3
    • /
    • pp.287-296
    • /
    • 2003
  • Much research has focused on collaborative filtering techniques in recommender systems. However, these techniques suffer from the first-rater problem and the sparsity problem, which this paper aims to solve. We propose a method for predicting user preference that uses Bayesian estimated values and associative user clustering to recalculate preferences. Because this method alone does not take item attributes into account, we complement it with the Representative Attribute-Neighborhood method, which extracts the representative attributes that most affect preference and uses them to find similar neighbors for prediction. Applying associative user clustering analysis to the collaborative filtering algorithm improves efficiency, because the preference for a specific item is calculated within the cluster's item vector. To address the sparsity and first-rater problems, associative users are clustered by genre using the Association Rule Hypergraph Partitioning (ARHP) algorithm, and new users are classified into one of these genres by a Naive Bayes classifier. In addition, to compute the similarity between a new user and the users belonging to the classified genre, different estimated values are assigned, through Naive Bayes learning, to the items the user has rated. Applying the preferences weighted by these estimates to the Pearson correlation coefficient yields higher accuracy, because fewer errors are caused by missing values. We evaluate our method on a large collaborative filtering database of user ratings, and it significantly outperforms the previously proposed method.
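
The last step of the abstract above (filling missing ratings with estimates before computing the Pearson correlation) can be illustrated with a minimal sketch. The ratings and the per-item estimates below are hypothetical; in the paper the estimates come from Naive Bayes learning, which is not reproduced here, and the function names are illustrative only.

```python
from math import sqrt

def fill_missing(ratings, estimates):
    """Replace each missing rating (None) with the estimate for that item."""
    return [e if r is None else r for r, e in zip(ratings, estimates)]

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / sqrt(var_u * var_v)

# Hypothetical ratings on five items; None marks an unrated item.
new_user = [5, None, 1, None, 4]
neighbor = [4, 2, None, 1, 5]
# Hypothetical per-item estimates, standing in for values learned by Naive Bayes.
new_user_estimates = [4, 2, 2, 1, 4]
neighbor_estimates = [4, 2, 1, 2, 4]

sim = pearson(fill_missing(new_user, new_user_estimates),
              fill_missing(neighbor, neighbor_estimates))
print(round(sim, 3))
```

Without the fill step, only the items rated by both users could enter the correlation, which is the sparsity issue the abstract describes.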

Forecasting of Various Air Pollutant Parameters in Bangalore Using Naïve Bayesian

  • Shivkumar M;Sudhindra K R;Pranesha T S;Chate D M;Beig G
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.3
    • /
    • pp.196-200
    • /
    • 2024
  • Weather forecasting is of utmost importance in sectors such as flood management and hydroelectricity generation. Although various numerical methods for weather forecasting exist, most of them are mechanistic and reported to be computationally demanding because of their complexity. It is therefore necessary to build models that predict weather conditions accurately while being faster and more efficient than the prevalent meteorological models. This study forecasts various atmospheric parameters in the city of Bangalore using Naïve Bayes algorithms. The parameters analyzed were wind speed (WS), wind direction (WD), relative humidity (RH), solar radiation (SR), black carbon (BC), radiative forcing (RF), air temperature (AT), bar pressure (BP), PM10, and PM2.5, collected from an Air Quality Monitoring Station in Bangalore for a period of 5 years from January 2015 to May 2019. The study concluded that Naïve Bayes, an easy-to-use classifier based on Bayes' theorem, is quite efficient in forecasting the various air pollution parameters of the city of Bangalore.

A Study on Anomalous Propagation Echo Identification using Naive Bayesian Classifier (나이브 베이지안 분류기를 이용한 이상전파에코 식별방법에 대한 연구)

  • Lee, Hansoo;Kim, Sungshin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2016.05a
    • /
    • pp.89-90
    • /
    • 2016
  • An anomalous propagation echo is an abnormal radar signal that occurs when the radar beam is refracted irregularly because of temperature or humidity. Such echoes frequently appear in ground-based weather radar. Precise analysis of radar data is important for improving the accuracy of weather forecasting, so research on identifying anomalous propagation echoes is ongoing around the world. This paper studies a classification method that distinguishes anomalous propagation echoes in radar data using a naive Bayes classifier and characteristic attributes of the echo, such as reflectivity and altitude. Verification of the proposed naive Bayes classifier on actual occurrences of the echo confirmed that it yields good classification results.
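
For continuous attributes such as reflectivity and altitude, one common choice is a Gaussian naive Bayes classifier. The abstract does not say which likelihood model the authors used, so the sketch below is an assumption, and the feature values and class labels are invented for illustration.

```python
from collections import defaultdict
from math import log, pi

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [sum((x - m) ** 2 for x in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (log(n / len(X)), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior under feature independence."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * log(2 * pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical samples: (reflectivity, altitude) pairs with made-up values.
X = [[10.0, 0.5], [12.0, 0.7], [11.0, 0.6],
     [35.0, 3.0], [40.0, 3.5], [38.0, 2.8]]
y = ["anomalous", "anomalous", "anomalous",
     "precipitation", "precipitation", "precipitation"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [11.5, 0.55]))
print(predict_gaussian_nb(model, [37.0, 3.1]))
```

Working in log space avoids numerical underflow when many attributes are multiplied together.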

The Method of Effective Inference Using Rough Set and Fuzzy Naive Bayes Theory (러프집합과 퍼지 네이브 베이스 이론을 이용한 효율적인 추론 방법)

  • Hwang Jeong-Sik;Son Chang-Sik;Chung Hwan-Mook
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2005.11a
    • /
    • pp.117-120
    • /
    • 2005
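  • Various methods for learning fuzzy rules have been proposed as ways to determine classifications and boundaries in fuzzy rule-based systems. In addition, more efficient inference results can be obtained by removing unnecessary attributes while taking the correlations among inference rules into account. Accordingly, this paper proposes an inference method for fuzzy rule-based systems that builds a decision table for each rule, removes unnecessary attributes using rough sets, and applies fuzzy naive Bayes theory to the certainty factors of the rules.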

Classification Accuracy Improvement for Decision Tree (의사결정트리의 분류 정확도 향상)

  • Rezene, Mehari Marta;Park, Sanghyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.787-790
    • /
    • 2017
  • Data quality is the main issue in classification problems; in general, the presence of noisy instances in the training dataset prevents robust classification performance. Such instances may cause the generated decision tree to suffer from over-fitting, and its accuracy may decrease. Decision trees are useful, efficient, and commonly used for solving various real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm. In the proposed preprocessing method, we apply the naive Bayes classifier to remove noisy instances from the training dataset. We applied the proposed method to a real e-commerce sales dataset to compare its performance with that of the existing C4.5 decision tree classifier. In the experiments, the proposed method improved the classification accuracy by 8.5% and 14.32% using the training dataset and 10-fold cross-validation, respectively.
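
The preprocessing idea above can be sketched as follows. The abstract does not state the exact removal criterion, so this sketch assumes that an instance counts as noisy when a naive Bayes classifier trained on the full training set disagrees with its label. The categorical data are invented, and the subsequent C4.5 training step is omitted.

```python
from collections import Counter, defaultdict
from math import log

def fit_categorical_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values, alpha

def predict_categorical_nb(model, row):
    """Pick the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / total)
        for j, value in enumerate(row):
            numerator = value_counts[(j, label)][value] + alpha
            denominator = count + alpha * len(feature_values[j])
            score += log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def filter_noisy(rows, labels):
    """Keep only instances whose label matches the naive Bayes prediction (assumed criterion)."""
    model = fit_categorical_nb(rows, labels)
    kept = [(r, l) for r, l in zip(rows, labels)
            if predict_categorical_nb(model, r) == l]
    return [r for r, _ in kept], [l for _, l in kept]

# Hypothetical sales records: (price band, product category) -> label.
rows = [("low", "book")] * 4 + [("high", "toy")] * 4 + [("low", "book")]
labels = ["buy"] * 4 + ["skip"] * 4 + ["skip"]  # the last label is deliberately noisy

clean_rows, clean_labels = filter_noisy(rows, labels)
print(len(rows), "->", len(clean_rows))
```

The filtered `clean_rows` and `clean_labels` would then be passed to the decision tree learner in place of the raw training set.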

An Automatic Document Classification with Bayesian Learning (베이지안 학습을 이용한 문서의 자동분류)

  • Kim, Jin-Sang;Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society
    • /
    • v.11 no.1
    • /
    • pp.19-30
    • /
    • 2000
  • As the number of online documents increases enormously with the expansion of information technology, automatic document classification is becoming far more important. In this paper, an automatic document classification method is investigated and applied to UseNet 20 newsgroup articles to test its efficacy. The classification system uses the Naive Bayes classification algorithm, and the experimental results show that a randomly selected newsgroup article can be classified into its own category with over 77% accuracy.
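
A minimal sketch of naive Bayes document classification, assuming a multinomial model over word counts with add-one smoothing (the abstract does not specify the variant). The category names are taken from the 20 Newsgroups collection, while the toy documents and the whitespace tokenizer are assumptions for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train(docs):
    """docs is a list of (text, category) pairs; returns the counts needed for scoring."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in docs:
        tokens = text.lower().split()
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab

def classify(model, text):
    """Return the category with the highest smoothed log posterior."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        score = log(n_docs / total_docs)
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            score += log((word_counts[category][token] + 1)
                         / (total_words + len(vocab)))
        scores[category] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus.
docs = [
    ("engine oil and tires", "rec.autos"),
    ("new car engine review", "rec.autos"),
    ("rocket launch to orbit", "sci.space"),
    ("satellite orbit and launch window", "sci.space"),
]
model = train(docs)
print(classify(model, "cheap tires for my car"))
print(classify(model, "orbit launch"))
```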

User Preference Prediction Method Using Associative User Clustering and Bayesian Classification (연관 사용자 군집과 베이지안 분류를 이용한 사용자 선호도 예측 방법)

  • 정경용;김진현;이정현
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10b
    • /
    • pp.109-111
    • /
    • 2001
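  • Existing user preference prediction methods based on collaborative filtering select neighbors (the Nearest-Neighborhood Method) from users' preferences for items and compute user similarity with the Pearson correlation coefficient, so they neither reflect item content nor solve the sparsity problem. To address these shortcomings, this paper proposes a user preference prediction method using associative user clustering and Bayesian classification. To solve the sparsity problem in collaborative filtering systems, the proposed method clusters users by genre using the ARHP algorithm, and a new user is classified into one of these genres by a Naive Bayes classifier. In addition, to compute the similarity between the new user and the users in the classified genre, different estimated values are assigned, through Naive Bayes learning, to the items the user has rated. Applying the preferences weighted by these estimates to the conventional Pearson correlation reduces the prediction errors caused by missing values and thereby improves prediction accuracy. To evaluate its performance, the proposed method was compared with existing collaborative filtering techniques.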

The Study of Chronic Kidney Disease Classification using KHANES data (국민건강영양조사 자료를 이용한 만성신장질환 분류기법 연구)

  • Lee, Hong-Ki;Myoung, Sungmin
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2020.01a
    • /
    • pp.271-272
    • /
    • 2020
  • Data mining is known to be useful in the medical field when no evidence favoring a particular treatment option is available. The healthcare field collects huge volumes of structured and unstructured data in order to discover unknown information or knowledge for effective diagnosis and clinical decision making. The 5,179 records analyzed here were collected from the Korean National Health and Nutrition Examination Survey (KHANES) over a two-year period. The data were split into training and test sets to fit the model. We predicted chronic kidney disease (CKD) using data mining methods such as naive Bayes, logistic regression, CART, and artificial neural networks (ANN). The results suggest significant features and data mining techniques for lifestyle factors related to CKD.

Performance analysis and comparison of various machine learning algorithms for early stroke prediction

  • Vinay Padimi;Venkata Sravan Telu;Devarani Devi Ningombam
    • ETRI Journal
    • /
    • v.45 no.6
    • /
    • pp.1007-1021
    • /
    • 2023
  • Stroke is the leading cause of permanent disability in adults, and it can cause permanent brain damage. According to the World Health Organization, 795 000 Americans experience a new or recurrent stroke each year. Early detection of medical disorders such as stroke can minimize their disabling effects. Thus, in this paper, we consider various risk factors that contribute to the occurrence of stroke and apply machine learning algorithms, such as the decision tree, random forest, and naive Bayes algorithms, to patient characteristics survey data to achieve high prediction accuracy. We also consider the semisupervised self-training technique to predict the risk of stroke. We then consider the near-miss undersampling technique, which retains only those instances of the larger class that lie close to instances of the smaller class. Experimental results demonstrate that the proposed method obtains an accuracy of approximately 98.83% at low cost, which is significantly higher and more reliable than that of the techniques it is compared with.
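
Near-miss undersampling can be sketched as below. The abstract does not say which near-miss variant was used; this sketch assumes the first variant, which keeps the majority-class points with the smallest average distance to their `k` nearest minority-class points. The points and parameter values are invented for illustration.

```python
from math import dist

def near_miss(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points closest, on average, to their k nearest minority points."""
    def avg_distance_to_nearest_minority(point):
        nearest = sorted(dist(point, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    ranked = sorted(majority, key=avg_distance_to_nearest_minority)
    return ranked[:n_keep]

# Hypothetical two-dimensional feature vectors.
minority = [(0, 0), (1, 0)]                            # smaller class
majority = [(0, 1), (5, 5), (1, 1), (9, 9), (10, 0)]   # larger class

kept = near_miss(majority, minority, n_keep=len(minority), k=2)
print(kept)
```

Setting `n_keep` to the size of the smaller class, as in the usage above, balances the two classes before a classifier is trained.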