• Title/Summary/Keyword: kNN classifier

Search Result 101, Processing Time 0.029 seconds

A Comparison of Artificial Neural Networks and Statistical Pattern Recognition Methods for Rotation Machine Condition Classification (회전기계 고장 진단에 적용한 인공 신경회로망과 통계적 패턴 인식 기법의 비교 연구)

  • Kim, Chang-Gu;Park, Kwang-Ho;Kee, Chang-Doo
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.16 no.12
    • /
    • pp.119-125
    • /
    • 1999
  • This paper gives an overview of the various approaches to designing statistical pattern recognition scheme based on Bayes discrimination rule and the artificial neural networks for rotating machine condition classification. Concerning to Bayes discrimination rule, this paper contains the linear discrimination rule applied to classification into several multivariate normal distributions with common covariance matrices, the quadratic discrimination rule under different covariance matrices. Also we discribes k-nearest neighbor method to directly estimate a posterior probability of each class. Five features are extracted in time domain vibration signals. Employing these five features, statistical pattern classifier and neural networks have been established to detect defects on rotating machine. Four different cases of rotation machine were observed. The effects of k number and neural networks structures on monitoring performance have also been investigated. For the comparison of diagnosis performance of these two method, their recognition success rates are calculated form the test data. The result of experiment which classifies the rotating machine conditions using each method presents that the neural networks shows the highest recognition rate.

  • PDF

Hybrid Learning Architectures for Advanced Data Mining:An Application to Binary Classification for Fraud Management (개선된 데이터마이닝을 위한 혼합 학습구조의 제시)

  • Kim, Steven H.;Shin, Sung-Woo
    • Journal of Information Technology Application
    • /
    • v.1
    • /
    • pp.173-211
    • /
    • 1999
  • The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm effects feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second0order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple, but efficient, postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management which exemplifies a binary classification problem.

  • PDF

Malware Detection Method using Opcode and windows API Calls (Opcode와 Windows API를 사용한 멀웨어 탐지)

  • Ahn, Tae-Hyun;Oh, Sang-Jin;Kwon, Young-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.6
    • /
    • pp.11-17
    • /
    • 2017
  • We proposed malware detection method, which use the feature vector that consist of Opcode(operation code) and Windows API Calls extracted from executable files. And, we implemented our feature vector and measured the performance of it by using Bernoulli Naïve Bayes and K-Nearest Neighbor classifier. In experimental result, when using the K-NN classifier with the proposed method, we obtain 95.21% malware detection accuracy. It was better than existing methods using only either Opcode or Windows API Calls.

A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences (영화 관객 수 예측을 위한 기계학습 기법의 성능 평가 연구)

  • Jeong, Chan-Mi;Min, Daiki
    • The Journal of Society for e-Business Studies
    • /
    • v.25 no.2
    • /
    • pp.49-63
    • /
    • 2020
  • The accurate prediction of box office in the early stage is crucial for film industry to make better managerial decision. With aims to improve the prediction performance, the purpose of this paper is to evaluate the use of machine learning methods. We tested both classification and regression based methods including k-NN, SVM and Random Forest. We first evaluate input variables, which show that reputation-related information generated during the first two-week period after release is significant. Prediction test results show that regression based methods provides lower prediction error, and Random Forest particularly outperforms other machine learning methods. Regression based method has better prediction power when films have small box office earnings. On the other hand, classification based method works better for predicting large box office earnings.

Deep Learning based Emotion Classification using Multi Modal Bio-signals (다중 모달 생체신호를 이용한 딥러닝 기반 감정 분류)

  • Lee, JeeEun;Yoo, Sun Kook
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.2
    • /
    • pp.146-154
    • /
    • 2020
  • Negative emotion causes stress and lack of attention concentration. The classification of negative emotion is important to recognize risk factors. To classify emotion status, various methods such as questionnaires and interview are used and it could be changed by personal thinking. To solve the problem, we acquire multi modal bio-signals such as electrocardiogram (ECG), skin temperature (ST), galvanic skin response (GSR) and extract features. The neural network (NN), the deep neural network (DNN), and the deep belief network (DBN) is designed using the multi modal bio-signals to analyze emotion status. As a result, the DBN based on features extracted from ECG, ST and GSR shows the highest accuracy (93.8%). It is 5.7% higher than compared to the NN and 1.4% higher than compared to the DNN. It shows 12.2% higher accuracy than using only single bio-signal (GSR). The multi modal bio-signal acquisition and the deep learning classifier play an important role to classify emotion.

Induction Motor Diagnosis System by Effective Frequency Selection and Linear Discriminant Analysis (유효 주파수 선택과 선형판별분석기법을 이용한 유도전동기 고장진단 시스템)

  • Lee, Dae-Jong;Cho, Jae-Hoon;Yun, Jong-Hwan;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.20 no.3
    • /
    • pp.380-387
    • /
    • 2010
  • For the fault diagnosis of three-phase induction motors, we propose a diagnosis algorithm based on mutual information and linear discriminant analysis (LDA). The experimental unit consists of machinery module for induction motor drive and data acquisition module to obtain the fault signal. As the first step for diagnosis procedure, DFT is performed to transform the acquired current signal into frequency domain. And then, frequency components are selected according to discriminate order calculated by mutual information As the next step, feature extraction is performed by LDA, and then diagnosis is evaluated by k-NN classifier. The results to verify the usability of the proposed algorithm showed better performance than various conventional methods.

A Study on the Signal Processing for Content-Based Audio Genre Classification (내용기반 오디오 장르 분류를 위한 신호 처리 연구)

  • 윤원중;이강규;박규식
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.6
    • /
    • pp.271-278
    • /
    • 2004
  • In this paper, we propose a content-based audio genre classification algorithm that automatically classifies the query audio into five genres such as Classic, Hiphop, Jazz, Rock, Speech using digital sign processing approach. From the 20 seconds query audio file, the audio signal is segmented into 23ms frame with non-overlapped hamming window and 54 dimensional feature vectors, including Spectral Centroid, Rolloff, Flux, LPC, MFCC, is extracted from each query audio. For the classification algorithm, k-NN, Gaussian, GMM classifier is used. In order to choose optimum features from the 54 dimension feature vectors, SFS(Sequential Forward Selection) method is applied to draw 10 dimension optimum features and these are used for the genre classification algorithm. From the experimental result, we can verify the superior performance of the proposed method that provides near 90% success rate for the genre classification which means 10%∼20% improvements over the previous methods. For the case of actual user system environment, feature vector is extracted from the random interval of the query audio and it shows overall 80% success rate except extreme cases of beginning and ending portion of the query audio file.

An Optimizing Hyperrectangle method for Nearest Hyperrectangle Learning (초월평면 최적화를 이용한 최근접 초월평면 학습법의 성능 향상 방법)

  • Lee, Hyeong-Il
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.3
    • /
    • pp.328-333
    • /
    • 2003
  • NGE (Nested Generalized Exemplars) proposed by Salzberg improved the storage requirement and classification rate of the Memory Based Reasoning. It constructs hyperrectangles during training and performs classification tasks. It worked not bad in many area, however, the major drawback of NGE is constructing hyperrectangles because its hyperrectangle is extended so as to cover the error data and the way of maintaining the feature weight vector. We proposed the OH (Optimizing Hyperrectangle) algorithm which use the feature weight vectors and the ED(Exemplar Densimeter) to optimize resulting Hyperrectangles. The proposed algorithm, as well as the EACH, required only approximately 40% of memory space that is needed in k-NN classifier, and showed a superior classification performance to the EACH. Also, by reducing the number of stored patterns, it showed excellent results in terms of classification when we compare it to the k-NN and the EACH.

A Comparative Study on Similarity Measure Techniques for Cross-Project Defect Prediction (교차 프로젝트 결함 예측을 위한 유사도 측정 기법 비교 연구)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.6
    • /
    • pp.205-220
    • /
    • 2018
  • Software defect prediction is helpful for allocating valuable project resources effectively for software quality assurance activities thanks to focusing on the identified fault-prone modules. If historical data collected within a company is sufficient, a Within-Project Defect Prediction (WPDP) can be utilized for accurate fault-prone module prediction. In case a company does not maintain historical data, it may be helpful to build a classifier towards predicting comprehensible fault prediction based on Cross-Project Defect Prediction (CPDP). Since CPDP employs different project data collected from other organization to build a classifier, the main obstacle to build an accurate classifier is that distributions between source and target projects are not similar. To address the problem, because it is crucial to identify effective similarity measure techniques to obtain high performance for CPDP, In this paper, we aim to identify them. We compare various similarity measure techniques. The effectiveness of similarity weights calculated by those similarity measure techniques are evaluated. The results are verified using the statistical significance test and the effect size test. The results show k-Nearest Neighbor (k-NN), LOcal Correlation Integral (LOCI), and Range methods are the top three performers. The experimental results show that predictive performances using the three methods are comparable to those of WPDP.

Cyberbullying Detection in Twitter Using Sentiment Analysis

  • Theng, Chong Poh;Othman, Nur Fadzilah;Abdullah, Raihana Syahirah;Anawar, Syarulnaziah;Ayop, Zakiah;Ramli, Sofia Najwa
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.11
    • /
    • pp.1-10
    • /
    • 2021
  • Cyberbullying has become a severe issue and brought a powerful impact on the cyber world. Due to the low cost and fast spreading of news, social media has become a tool that helps spread insult, offensive, and hate messages or opinions in a community. Detecting cyberbullying from social media is an intriguing research topic because it is vital for law enforcement agencies to witness how social media broadcast hate messages. Twitter is one of the famous social media and a platform for users to tell stories, give views, express feelings, and even spread news, whether true or false. Hence, it becomes an excellent resource for sentiment analysis. This paper aims to detect cyberbully threats based on Naïve Bayes, support vector machine (SVM), and k-nearest neighbour (k-NN) classifier model. Sentiment analysis will be applied based on people's opinions on social media and distribute polarity to them as positive, neutral, or negative. The accuracy for each classifier will be evaluated.