• Title/Summary/Keyword: Gaussian Naive Bayes

Search Result 8, Processing Time 0.022 seconds

Study on Anomaly Detection Method of Improper Foods using Import Food Big data (수입식품 빅데이터를 이용한 부적합식품 탐지 시스템에 관한 연구)

  • Cho, Sanggoo;Choi, Gyunghyun
    • The Journal of Bigdata
    • /
    • v.3 no.2
    • /
    • pp.19-33
    • /
    • 2018
  • Owing to the increase of FTA, food trade, and versatile preferences of consumers, food import has increased at tremendous rate every year. While the inspection check of imported food accounts for about 20% of the total food import, the budget and manpower necessary for the government's import inspection control is reaching its limit. The sudden import food accidents can cause enormous social and economic losses. Therefore, predictive system to forecast the compliance of food import with its preemptive measures will greatly improve the efficiency and effectiveness of import safety control management. There has already been a huge data accumulated from the past. The processed foods account for 75% of the total food import in the import food sector. The analysis of big data and the application of analytical techniques are also used to extract meaningful information from a large amount of data. Unfortunately, not many studies have been done regarding analyzing the import food and its implication with understanding the big data of food import. In this context, this study applied a variety of classification algorithms in the field of machine learning and suggested a data preprocessing method through the generation of new derivative variables to improve the accuracy of the model. In addition, the present study compared the performance of the predictive classification algorithms with the general base classifier. The Gaussian Naïve Bayes prediction model among various base classifiers showed the best performance to detect and predict the nonconformity of imported food. In the future, it is expected that the application of the abnormality detection model using the Gaussian Naïve Bayes. The predictive model will reduce the burdens of the inspection of import food and increase the non-conformity rate, which will have a great effect on the efficiency of the food import safety control and the speed of import customs clearance.

Android Malware Detection Using Permission-Based Machine Learning Approach (머신러닝을 이용한 권한 기반 안드로이드 악성코드 탐지)

  • Kang, Seongeun;Long, Nguyen Vu;Jung, Souhwan
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.3
    • /
    • pp.617-623
    • /
    • 2018
  • This study focuses on detection of malicious code through AndroidManifest permissoion feature extracted based on Android static analysis. Features are built on the permissions of AndroidManifest, which can save resources and time for analysis. Malicious app detection model consisted of SVM (support vector machine), NB (Naive Bayes), Gradient Boosting Classifier (GBC) and Logistic Regression model which learned 1,500 normal apps and 500 malicious apps and 98% detection rate. In addition, malicious app family identification is implemented by multi-classifiers model using algorithm SVM, GPC (Gaussian Process Classifier) and GBC (Gradient Boosting Classifier). The learned family identification machine learning model identified 92% of malicious app families.

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.395-406
    • /
    • 2017
  • The purpose of this study is to identify the pattern of daily electricity demand through clustering and classification. The hourly data was collected by KPS (Korea Power Exchange) between 2008 and 2012. The time trend was eliminated for conducting the pattern of daily electricity demand because electricity demand data is times series data. We have considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. The classification analysis was conducted to understand the relationship between external factors, day of the week, holiday, and weather. Data was divided into training data and test data. Training data consisted of external factors and clustered number between 2008 and 2011. Test data was daily data of external factors in 2012. Decision tree, random forest, Support vector machine, and Naive Bayes were used. As a result, Gaussian model based clustering and random forest showed the best prediction performance when the number of cluster was 8.

Autism Spectrum Disorder Detection in Children using the Efficacy of Machine Learning Approaches

  • Tariq Rafiq;Zafar Iqbal;Tahreem Saeed;Yawar Abbas Abid;Muneeb Tariq;Urooj Majeed;Akasha
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.4
    • /
    • pp.179-186
    • /
    • 2023
  • For the future prosperity of any society, the sound growth of children is essential. Autism Spectrum Disorder (ASD) is a neurobehavioral disorder which has an impact on social interaction of autistic child and has an undesirable effect on his learning, speaking, and responding skills. These children have over or under sensitivity issues of touching, smelling, and hearing. Its symptoms usually appear in the child of 4- to 11-year-old but parents did not pay attention to it and could not detect it at early stages. The process to diagnose in recent time is clinical sessions that are very time consuming and expensive. To complement the conventional method, machine learning techniques are being used. In this way, it improves the required time and precision for diagnosis. We have applied TFLite model on image based dataset to predict the autism based on facial features of child. Afterwards, various machine learning techniques were trained that includes Logistic Regression, KNN, Gaussian Naïve Bayes, Random Forest and Multi-Layer Perceptron using Autism Spectrum Quotient (AQ) dataset to improve the accuracy of the ASD detection. On image based dataset, TFLite model shows 80% accuracy and based on AQ dataset, we have achieved 100% accuracy from Logistic Regression and MLP models.

Variational Bayesian multinomial probit model with Gaussian process classification on mice protein expression level data (가우시안 과정 분류에 대한 변분 베이지안 다항 프로빗 모형: 쥐 단백질 발현 데이터에의 적용)

  • Donghyun Son;Beom Seuk Hwang
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.2
    • /
    • pp.115-127
    • /
    • 2023
  • Multinomial probit model is a popular model for multiclass classification and choice model. Markov chain Monte Carlo (MCMC) method is widely used for estimating multinomial probit model, but its computational cost is high. However, it is well known that variational Bayesian approximation is more computationally efficient than MCMC, because it uses subsets of samples. In this study, we describe multinomial probit model with Gaussian process classification and how to employ variational Bayesian approximation on the model. This study also compares the results of variational Bayesian multinomial probit model to the results of naive Bayes, K-nearest neighbors and support vector machine for the UCI mice protein expression level data.

Fault Diagnosis of Drone Using Machine Learning (머신러닝을 이용한 드론의 고장진단에 관한 연구)

  • Park, Soo-Hyun;Do, Jae-Seok;Choi, Seong-Dae;Hur, Jang-Wook
    • Journal of the Korean Society of Manufacturing Process Engineers
    • /
    • v.20 no.9
    • /
    • pp.28-34
    • /
    • 2021
  • The Fourth Industrial Revolution has led to the development of drones for commercial and private applications. Therefore, the malfunction of drones has become a prominent problem. Failure mode and effect analysis was used in this study to analyze the primary cause of drone failure, and blade breakage was observed to have the highest frequency of failure. This was tested using a vibration sensor placed on drones along the breakage length of the blades. The data exhibited a significant increase in vibration within the drone body for blade fracture length. Principal component analysis was used to reduce the data dimension and classify the state with machine learning algorithms such as support vector machine, k-nearest neighbor, Gaussian naive Bayes, and random forest. The performance of machine learning was higher than 0.95 for the four algorithms in terms of accuracy, precision, recall, and f1-score. A follow-up study on failure prediction will be conducted based on the results of fault diagnosis.

Identification of Pb-Zn ore under the condition of low count rate detection of slim hole based on PGNAA technology

  • Haolong Huang;Pingkun Cai;Wenbao Jia;Yan Zhang
    • Nuclear Engineering and Technology
    • /
    • v.55 no.5
    • /
    • pp.1708-1717
    • /
    • 2023
  • The grade analysis of lead-zinc ore is the basis for the optimal development and utilization of deposits. In this study, a method combining Prompt Gamma Neutron Activation Analysis (PGNAA) technology and machine learning is proposed for lead-zinc mine borehole logging, which can identify lead-zinc ores of different grades and gangue in the formation, providing real-time grade information qualitatively and semi-quantitatively. Firstly, Monte Carlo simulation is used to obtain a gamma-ray spectrum data set for training and testing machine learning classification algorithms. These spectra are broadened, normalized and separated into inelastic scattering and capture spectra, and then used to fit different classifier models. When the comprehensive grade boundary of high- and low-grade ores is set to 5%, the evaluation metrics calculated by the 5-fold cross-validation show that the SVM (Support Vector Machine), KNN (K-Nearest Neighbor), GNB (Gaussian Naive Bayes) and RF (Random Forest) models can effectively distinguish lead-zinc ore from gangue. At the same time, the GNB model has achieved the optimal accuracy of 91.45% when identifying high- and low-grade ores, and the F1 score for both types of ores is greater than 0.9.

Ensemble Machine Learning Model Based YouTube Spam Comment Detection (앙상블 머신러닝 모델 기반 유튜브 스팸 댓글 탐지)

  • Jeong, Min Chul;Lee, Jihyeon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.5
    • /
    • pp.576-583
    • /
    • 2020
  • This paper proposes a technique to determine the spam comments on YouTube, which have recently seen tremendous growth. On YouTube, the spammers appeared to promote their channels or videos in popular videos or leave comments unrelated to the video, as it is possible to monetize through advertising. YouTube is running and operating its own spam blocking system, but still has failed to block them properly and efficiently. Therefore, we examined related studies on YouTube spam comment screening and conducted classification experiments with six different machine learning techniques (Decision tree, Logistic regression, Bernoulli Naive Bayes, Random Forest, Support vector machine with linear kernel, Support vector machine with Gaussian kernel) and ensemble model combining these techniques in the comment data from popular music videos - Psy, Katy Perry, LMFAO, Eminem and Shakira.