• 제목/요약/키워드: MultinomialNB

검색결과 3건 처리시간 0.016초

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • 제21권2호
    • /
    • pp.229-237
    • /
    • 2021
  • Our modern 'information-hungry' age demands delivery of information at unprecedented fast rates. Timely delivery of noteworthy information about recent events can help people from different segments of life in number of ways. As world has become global village, the flow of news in terms of volume and speed demands involvement of machines to help humans to handle the enormous data. News are presented to public in forms of video, audio, image and text. News text available on internet is a source of knowledge for billions of internet users. Urdu language is spoken and understood by millions of people from Indian subcontinent. Availability of online Urdu news enable this branch of humanity to improve their understandings of the world and make their decisions. This paper uses available online Urdu news data to train machines to automatically categorize provided news. Various machine learning algorithms were used on news headline for training purpose and the results demonstrate that Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithm outperformed other algorithms in terms of all performance parameters. The maximum level of accuracy achieved for the dataset was 94.278% by multinomial NB classifier followed by Bernoulli NB classifier with accuracy of 94.274% when Urdu stop words were removed from dataset. The results suggest that short text of headlines of news can be used as an input for text categorization process.

Classifications of Hadiths based on Supervised Learning Techniques

  • AbdElaal, Hammam M.;Bouallegue, Belgacem;Elshourbagy, Motasem;Matter, Safaa S.;AbdElghfar, Hany A.;Khattab, Mahmoud M.;Ahmed, Abdelmoty M.
    • International Journal of Computer Science & Network Security
    • /
    • 제22권11호
    • /
    • pp.1-10
    • /
    • 2022
  • This study aims to build a model is capable of classifying the categories of hadith, according to the reliability of hadith' narrators (sahih, hassan, da'if, maudu) and according to what was attributed to the Prophet Muhammad (saying, doing, describing, reporting ) using the supervised learning algorithms, with a view to discover a relationship between these classifications, based on the outputs of this model, which might be useful to avoid the controversy and useless debate on automatic classifications of hadith, using some of the statistical methods such as chi-square, information gain and association rules. The experimental results showed that there is a relation between these classifications, most of Sahih hadiths are belong to saying class, and most of maudu hadiths are belong to reporting class. Also the best classifier had given high accuracy was MultinomialNB, it achieved higher accuracy reached up to 0.9708 %, for his ability to process high dimensional problems and identifying the most important features that are relevant to target data in training stage. Followed by LinearSVC classifier, reached up to 0.9655, and finally, KNeighborsClassifier reached up to 0.9644.

Levenshtein 거리를 이용한 영화평 감성 분류 (Sentiment Classification of Movie Reviews using Levenshtein Distance)

  • 안광모;김윤석;김영훈;서영훈
    • 디지털콘텐츠학회 논문지
    • /
    • 제14권4호
    • /
    • pp.581-587
    • /
    • 2013
  • 본 논문에서는 레빈쉬타인 거리(Levenshtein distance)를 이용한 감성 분류 방법을 제안한다. 감성 자질에 레빈쉬타인 거리를 적용하여 BOW(Back-Of-Word)를 생성하고 이를 학습 자질로 사용한다. 학습 모델은 지지벡터기계(support vector machines, SVMs)와 나이브 베이즈(Naive Bayes)를 이용하였다. 실험 데이터로는 다음 영화 사이트로부터 영화평을 수집하였으며, 수집한 영화평은 총 2,385건이다. 수집된 영화평으로부터 감성 어휘를 수작업을 통해 수집하였으며 총 778개 어휘가 선별되었다. 실험에서는 감성 어휘에 레빈쉬타인 거리를 적용한 BOW를 이용하여 기계학습을 수행하였으며, 10-fold-cross validation 방식으로 분류기의 성능을 평가하였다. 평가 결과는 레빈쉬타인 거리가 3일 때 다항 나이브 베이즈(Muitinomial Naive Bayes) 분류기에서 85.46%의 가장 높은 정확도를 보였다. 실험을 통하여 본 논문에서 제안하는 방법이 문서 내의 철자 오류에 대해서도 분류 성능에 영향을 적게 받음을 알 수 있었다.