• Title/Summary/Keyword: Naive Bayes algorithm


Object Detection and Classification Using Extended Descriptors for Video Surveillance Applications (비디오 감시 응용에서 확장된 기술자를 이용한 물체 검출과 분류)

  • Islam, Mohammad Khairul;Jahan, Farah;Min, Jae-Hong;Baek, Joong-Hwan
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.4 / pp.12-20 / 2011
  • In this paper, we propose an efficient object detection and classification algorithm for video surveillance applications. Previous research has mainly concentrated on either object detection or classification using a particular type of feature, e.g., Scale Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF). We propose an algorithm that performs object detection and classification jointly. We combine heterogeneous types of features, such as texture and color distribution from local patches, to increase object detection and classification rates. We perform object detection using spatial clustering on interest points, and use a Bag of Words model and a Naive Bayes classifier for image representation and classification, respectively. Experimental results show that our combined feature achieves a better object classification rate than any individual local descriptor.

Discovery of User Preference in Recommendation System through Combining Collaborative Filtering and Content based Filtering (협력적 여과와 내용 기반 여과의 병합을 통한 추천 시스템에서의 사용자 선호도 발견)

  • Ko, Su-Jeong;Kim, Jin-Su;Kim, Tae-Yong;Choi, Jun-Hyeog;Lee, Jung-Hyun
    • Journal of KIISE:Computing Practices and Letters / v.7 no.6 / pp.684-695 / 2001
  • Recent recommender systems combine collaborative filtering and content-based filtering in order to solve the sparsity and first-rater problems of collaborative filtering. Collaborative filtering systems use a database of user preferences to predict additional topics. Content-based filtering systems provide recommendations by matching user interests with topic attributes. In this paper, we describe a method for discovering user preferences by combining the two recommendation techniques, which allows machine learning algorithms to be applied. The proposed collaborative filtering method clusters users with a genetic algorithm based on items categorized by a Naive Bayes classifier, and the content-based filtering method builds a user profile by extracting user interests through relevance feedback. We evaluate our method on a large database of user ratings for web documents, and it significantly outperforms previously proposed methods.


Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / v.21 no.2 / pp.229-237 / 2021
  • Our modern 'information-hungry' age demands delivery of information at unprecedented speed. Timely delivery of noteworthy information about recent events can help people from different walks of life in a number of ways. As the world has become a global village, the volume and speed of the news flow demand the involvement of machines to help humans handle the enormous amount of data. News is presented to the public in the form of video, audio, images, and text. News text available on the internet is a source of knowledge for billions of internet users. Urdu is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables these speakers to improve their understanding of the world and make their decisions. This paper uses available online Urdu news data to train machines to automatically categorize news items. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance measures. The highest accuracy achieved on the dataset was 94.278% by the Multinomial NB classifier, followed by the Bernoulli NB classifier with an accuracy of 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for text categorization.
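
As a rough illustration of headline classification with Multinomial Naive Bayes, the following is a minimal from-scratch sketch. It is not the paper's pipeline: the English toy headlines, the whitespace tokenization, the function names `train_multinomial_nb` and `predict`, and the smoothing parameter `alpha` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents.

    alpha is an additive (Laplace) smoothing constant; its value here is
    an illustrative assumption, not taken from the paper.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}, "vocab": vocab}
    n_docs = len(docs)
    for c in class_counts:
        model["log_prior"][c] = math.log(class_counts[c] / n_docs)
        # Smoothed denominator: tokens in class c plus alpha per vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_likelihood"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    scores = {}
    for c, prior in model["log_prior"].items():
        ll = model["log_likelihood"][c]
        scores[c] = prior + sum(ll[w] for w in tokens if w in model["vocab"])
    return max(scores, key=scores.get)


# Hypothetical toy headlines standing in for real news data.
headlines = [
    "team wins cricket final".split(),
    "captain scores century in match".split(),
    "bank raises interest rate".split(),
    "stock market falls sharply".split(),
]
labels = ["sports", "sports", "business", "business"]
model = train_multinomial_nb(headlines, labels)
```

Calling `predict(model, "cricket match today".split())` scores each class by its log prior plus the summed log likelihoods of the known words. A Bernoulli variant would instead model the presence or absence of every vocabulary word in each headline.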

Improving Naïve Bayes Text Classifiers with Incremental Feature Weighting (점진적 특징 가중치 기법을 이용한 나이브 베이즈 문서분류기의 성능 개선)

  • Kim, Han-Joon;Chang, Jae-Young
    • The KIPS Transactions:PartB / v.15B no.5 / pp.457-464 / 2008
  • In real-world operational environments, most text classification systems suffer from insufficient training documents and no prior knowledge of the feature space. In this regard, Naïve Bayes is known to be an appropriate algorithm for operational text classification, since the classification model can easily evolve by incrementally updating its pre-learned classification model and feature space. This paper proposes a technique for improving the Naïve Bayes classifier through a feature weighting strategy. The basic idea is that parameter estimation in Naïve Bayes considers the degree of feature importance as well as the feature distribution. We can develop a more accurate classification model by incorporating feature weights into the Naïve Bayes learning algorithm, rather than by learning with a reduced feature set. In addition, we have extended a conventional feature update algorithm for incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we perform experiments on various document collections and show that the traditional Naïve Bayes classifier can be significantly improved by the proposed technique.
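
One way to read "incorporating feature weights into parameter estimation" is sketched below. This is an assumed illustrative form, not necessarily the estimator used in the paper. The feature weight $w_t$, the count $N_{t,c}$ of term $t$ in the training documents of class $c$, the count $n_{t,d}$ of term $t$ in document $d$, and the Laplace-style smoothing over the vocabulary $V$ are all notation introduced here.

```latex
\hat{P}(t \mid c) = \frac{1 + w_t \, N_{t,c}}{|V| + \sum_{t' \in V} w_{t'} \, N_{t',c}},
\qquad
\hat{c}(d) = \arg\max_{c} \Big[ \log \hat{P}(c) + \sum_{t \in d} n_{t,d} \, \log \hat{P}(t \mid c) \Big]
```

With every $w_t = 1$, this reduces to the usual Laplace-smoothed multinomial Naïve Bayes estimate. Because the estimate depends only on running counts and weights, it can be updated incrementally as new documents arrive.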

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors in that it relies entirely on natural and climatic factors. Climate change affects land and agriculture in many ways, including lack of annual rainfall, pests, heat waves, changes in sea level, and fluctuations in global ozone and atmospheric CO2. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase the productivity of their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers, using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed using natural language processing techniques, and the tags, after dimensionality reduction, are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. The performance metrics are precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area. The proposed methodology achieves a precision of 94.40%, compared with 83.94% for the decision tree and 86.89% for the K-nearest neighbor algorithm, for agricultural ontology construction.
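
The abstract does not spell out how the Jaccard relative extractor works, so the sketch below shows only the plain Jaccard coefficient between two sentences, which is the standard measure the name suggests. The function name `jaccard_similarity`, the lowercasing, and the whitespace tokenization are assumptions made for illustration.

```python
def jaccard_similarity(sentence_a, sentence_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over lowercase word sets."""
    set_a = set(sentence_a.lower().split())
    set_b = set(sentence_b.lower().split())
    if not set_a and not set_b:
        # Convention chosen here: two empty sentences have similarity 0.
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical example sentences: 4 shared words out of 6 distinct words.
similarity = jaccard_similarity(
    "heavy rainfall damages rice crops",
    "rice crops need heavy rainfall",
)
```

The score ranges from 0 (no shared words) to 1 (identical word sets), so a threshold on it could be used to decide whether two sentences or terms are related.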

Synopsis-Based Classification of Movie Genres Using Machine Learning Techniques (기계학습을 이용한 시놉시스 기반 영화장르 분류 기법)

  • Jae-Eon Lee;Gum-Won Hong
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.82-85 / 2008
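  • Accurate genre classification is an important factor in customers' choice of movies and in providing services that meet customers' tastes and needs. Conventional manual genre classification is inefficient in terms of time, cost, and reliability. Machine learning based on movie synopses can be an efficient alternative for solving this problem. In this paper, we present a standardized method for classifying movie genres using machine learning, based on the synopsis information held by most movie service providers. Using the LibSVM, RandomCommittee, LMT, NaiveBayes, and PART algorithms, we measure and compare classification accuracy by algorithm and by genre.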

Prediction model of peptic ulcer diseases in middle-aged and elderly adults based on machine learning (머신러닝 기반 중노년층의 기능성 위장장애 예측 모델 구현)

  • Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.289-294 / 2020
  • Peptic ulcer disease is a gastrointestinal disorder caused by Helicobacter pylori infection and the use of nonsteroidal anti-inflammatory drugs. While many studies have been conducted to find the risk factors of peptic ulcers, no studies have proposed peptic ulcer prediction models for Koreans. Therefore, the purpose of this study is to implement a peptic ulcer prediction model for middle-aged and elderly people using machine learning based on demographic, obesity, blood, and nutritional information. For model building, a wrapper-based variable selection method and the naive Bayes algorithm were used. The female prediction model achieved an area under the receiver operating characteristic curve (AUC) of 0.712, and the male model an AUC of 0.674, which is lower than that of the female model. These results can be used for the prediction and prevention of peptic ulcers in middle-aged and elderly people.

Prediction model of hypercholesterolemia using body fat mass based on machine learning (머신러닝 기반 체지방 측정정보를 이용한 고콜레스테롤혈증 예측모델)

  • Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology / v.5 no.4 / pp.413-420 / 2019
  • The purpose of the present study is to develop a model for predicting hypercholesterolemia using an integrated set of body fat mass variables based on machine learning techniques, going beyond the study of the association between body fat mass and hypercholesterolemia. For this study, a total of six models were created using two variable subset selection methods and machine learning algorithms based on the Korea National Health and Nutrition Examination Survey (KNHANES) data. Among the various body fat mass variables, we found that trunk fat mass was the best variable for predicting hypercholesterolemia. Furthermore, the model using correlation-based feature subset selection and the naive Bayes algorithm obtained an area under the receiver operating characteristic curve of 0.739 and a Matthews correlation coefficient of 0.36. Our findings are expected to provide important information for disease prediction in large-scale screening and public health research.
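
The two evaluation measures reported above can be computed as in the following minimal sketch. The function names `matthews_corrcoef` and `roc_auc` and the toy scores are assumptions made for illustration; the values are not the study's data.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5).

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores and true labels.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
mcc = matthews_corrcoef(tp=25, fp=25, fn=25, tn=25)
```

An AUC of 0.5 and an MCC of 0 both correspond to chance-level prediction, which gives context for values such as 0.739 and 0.36.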

Comparison Thai Word Sense Disambiguation Method

  • Modhiran, Teerapong;Kruatrachue, Boontee;Supnithi, Thepchai
    • 제어로봇시스템학회:학술대회논문집 / 2004.08a / pp.1307-1312 / 2004
  • Word sense disambiguation is one of the most important problems in natural language processing research areas such as information retrieval and machine translation. Many approaches can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy. Its purpose is to compare three well-known machine learning techniques, SNoW, SVM, and Naive Bayes, for word sense disambiguation (WSD) in Thai. Ten ambiguous words are selected for testing with word and POS features. The results show that the SVM algorithm gives the best results for Thai WSD, with an accuracy of approximately 83-96%.


Improving Text Categorization with High Quality Bigrams (고품질 바이그램을 이용한 문서 범주화 성능 향상)

  • Lee, Chan-Do;Tan, Chade-Meng;Wang, Yuan-Fang
    • The KIPS Transactions:PartB / v.9B no.4 / pp.415-420 / 2002
  • This paper presents an efficient text categorization algorithm that generates high quality bigrams by using the information gain metric, combined with various frequency thresholds. The bigrams, along with unigrams, are then given as features to a Naive Bayes classifier. The experimental results suggest that the bigrams, while small in number, can substantially contribute to improving text categorization. Upon close examination of the results, we conclude that the algorithm is most successful in correctly classifying more positive documents, but may cause more negative documents to be classified incorrectly.
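
The bigram selection idea can be sketched as follows: keep a bigram only if it appears in enough documents and its information gain with respect to the class labels is high enough. This is a minimal illustration under assumptions, not the paper's algorithm. The function names, the thresholds `min_df` and `min_ig`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def bigrams(tokens):
    """Set of adjacent word pairs in a token list."""
    return {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}


def information_gain(feature, docs, labels):
    """IG of a binary feature (present/absent in each document) for the labels."""
    with_f = Counter(y for d, y in zip(docs, labels) if feature in d)
    without_f = Counter(y for d, y in zip(docs, labels) if feature not in d)
    h_cond = 0.0
    for part in (with_f, without_f):
        size = sum(part.values())
        if size:
            h_cond += (size / len(docs)) * entropy(list(part.values()))
    return entropy(list(Counter(labels).values())) - h_cond


def select_bigrams(token_docs, labels, min_df=2, min_ig=0.1):
    """Keep bigrams meeting both the document-frequency and IG thresholds."""
    doc_bigrams = [bigrams(tokens) for tokens in token_docs]
    df = Counter(b for d in doc_bigrams for b in d)
    scored = {
        b: information_gain(b, doc_bigrams, labels)
        for b, n in df.items()
        if n >= min_df
    }
    return {b: ig for b, ig in scored.items() if ig >= min_ig}


# Hypothetical toy corpus with two classes.
token_docs = [
    "interest rate rises again".split(),
    "central bank cuts interest rate".split(),
    "home team wins again".split(),
    "home team loses final".split(),
]
labels = ["econ", "econ", "sport", "sport"]
selected = select_bigrams(token_docs, labels)
```

The selected bigrams would then be added to the unigram features before training the Naive Bayes classifier. The frequency threshold filters out rare bigrams whose information gain estimate would be unreliable.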