• Title/Summary/Keyword: k-NN classification


Hybrid Approach to SVM Error Reduction in Document Classification (문서 분류에서의 SVM 오류 감소를 위한 하이브리드 방법)

  • Lee Jun-Seok;Kim Sang-Soo;Park Seong-Bae;Lee Sang-jo
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.544-546 / 2005
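  • This paper proposes the following method to improve document classification performance. Documents are first classified with an SVM (Support Vector Machine), which performs well on pattern classification problems, and the data that satisfy the margin are then classified again with k-NN. Using k-NN together with the SVM yielded higher performance than using the SVM alone.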
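The two-stage idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that "data that satisfy the margin" means samples whose SVM decision value falls inside the margin (|f(x)| < 1), which the abstract does not spell out. The linear weights `W`, `B`, the toy training set `TRAIN`, and all function names are hypothetical and hand-picked, not trained.

```python
import math
from collections import Counter

def decision_value(w, b, x):
    """Signed score of a linear SVM: f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def hybrid_predict(w, b, train, x, k=3, margin=1.0):
    """Use the SVM label when |f(x)| >= margin, otherwise defer to k-NN."""
    score = decision_value(w, b, x)
    if abs(score) >= margin:
        return 1 if score > 0 else -1
    return knn_predict(train, x, k)

# Hypothetical, hand-picked parameters (not trained): boundary x0 + x1 = 4.
W, B = (1.0, 1.0), -4.0
TRAIN = [((1.0, 1.0), -1), ((1.5, 1.0), -1), ((2.0, 2.5), -1), ((2.2, 2.1), -1),
         ((3.0, 3.0), 1), ((3.5, 2.5), 1), ((4.0, 4.0), 1)]

print(hybrid_predict(W, B, TRAIN, (5.0, 5.0)))  # far from the boundary: SVM decides
print(hybrid_predict(W, B, TRAIN, (2.1, 2.3)))  # inside the margin: k-NN decides
```

In this toy setup the second query has a small positive SVM score, so the SVM alone would label it +1, while its nearest neighbors are mostly labeled -1. The k-NN stage can therefore change low-confidence decisions.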


An Optimizing Hyperrectangle method for Nearest Hyperrectangle Learning (초월평면 최적화를 이용한 최근접 초월평면 학습법의 성능 향상 방법)

  • Lee, Hyeong-Il
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.3 / pp.328-333 / 2003
  • NGE (Nested Generalized Exemplars), proposed by Salzberg, improved the storage requirement and classification rate of Memory-Based Reasoning. It constructs hyperrectangles during training and uses them to perform classification. It has worked reasonably well in many areas; however, its major drawback lies in how hyperrectangles are constructed, because a hyperrectangle is extended to cover erroneous data, and in how the feature weight vector is maintained. We propose the OH (Optimizing Hyperrectangle) algorithm, which uses feature weight vectors and the ED (Exemplar Densimeter) to optimize the resulting hyperrectangles. The proposed algorithm, like EACH, required only approximately 40% of the memory needed by a k-NN classifier, and showed classification performance superior to EACH. In addition, by reducing the number of stored patterns, it showed excellent classification results compared with k-NN and EACH.

Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management (개선된 데이터마이닝을 위한 혼합 학습구조의 제시)

  • Kim, Steven H.;Shin, Sung-Woo
    • Journal of Information Technology Application / v.1 / pp.173-211 / 1999
  • The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm effects feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple, but efficient, postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management which exemplifies a binary classification problem.


Off-line PD Model Classification of Traction Motor Stator Coil Using BP

  • Park Seong-Hee;Jang Dong-Uk;Kang Seong-Hwa;Lim Kee-Joe
    • KIEE International Transactions on Electrophysics and Applications / v.5C no.6 / pp.223-227 / 2005
  • Insulation failure of a traction motor stator coil depends on the continuous stress imposed on it, and knowing its insulation condition is important for safe operation. In this paper, the application of an NN (neural network) to the off-line diagnosis of PD (partial discharge) occurring at the stator coil of a traction motor was studied. For PD data acquisition, three defect models were made: an internal void discharge model, a slot discharge model, and a surface discharge model. PD data for recognition were acquired from a PD detector. Statistical distributions and parameters were calculated to distinguish between the model discharge sources. These statistical distribution parameters were applied to classify PD sources by the NN, with a good recognition rate on the discharge sources.

Research on Fault Diagnosis of Wind Power Generator Blade Based on SC-SMOTE and kNN

  • Peng, Cheng;Chen, Qing;Zhang, Longxin;Wan, Lanjun;Yuan, Xinpan
    • Journal of Information Processing Systems / v.16 no.4 / pp.870-881 / 2020
  • Because SCADA monitoring data of wind turbines are large and fast changing, the unbalanced proportion of data across working conditions makes it difficult to process fault feature data. Existing methods mainly introduce new, non-repeating instances by interpolating between adjacent minority samples. To overcome the shortcoming of these methods, which do not consider boundary conditions when balancing data, an improved over-sampling balancing algorithm, SC-SMOTE (safe circle synthetic minority oversampling technology), is proposed to optimize data sets. Then, for the balanced data sets, a fault diagnosis method based on improved k-nearest neighbors (kNN) classification is adopted for wind turbine blade icing. Experimental results show that, compared with the SMOTE algorithm, the method is effective in diagnosing fan blade icing faults and improves diagnostic accuracy.
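The interpolation step used by the existing methods mentioned in this abstract can be sketched as below. This shows plain SMOTE-style oversampling only. It does not implement the safe-circle boundary handling of SC-SMOTE, whose details are not given in the abstract. The function name `smote`, its parameters, and the toy `MINORITY` points are assumptions made for illustration.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Toy minority-class samples (made up for illustration).
MINORITY = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.3), (1.1, 1.4)]
NEW_POINTS = smote(MINORITY, n_new=5)
print(len(NEW_POINTS))
```

Each synthetic point lies on a line segment between two existing minority samples, so the new instances stay inside the region the minority class already occupies.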

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

  • Kim, Jin-Suk;Choe, Ho-Seop;You, Beom-Jong;Seo, Jeong-Hyun;Lee, Suk-Hoon;Ra, Dong-Yul
    • Journal of Computing Science and Engineering / v.3 no.3 / pp.165-180 / 2009
  • The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. At first, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories using a semi-hierarchical, balanced three-level classification scheme, where each news story has only one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, to rearrange the classification scheme to be fully hierarchical but unbalanced, and to assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.

RECOGNIZING SIX EMOTIONAL STATES USING SPEECH SIGNALS

  • Kang, Bong-Seok;Han, Chul-Hee;Youn, Dae-Hee;Lee, Chungyong
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2000.04a / pp.366-369 / 2000
  • This paper examines three algorithms for recognizing a speaker's emotion from speech signals. The target emotions are happiness, sadness, anger, fear, boredom, and the neutral state. MLB (Maximum-Likelihood Bayes), NN (Nearest Neighbor), and HMM (Hidden Markov Model) algorithms are used as the pattern matching techniques. In all cases, pitch and energy are used as the features. The feature vectors for MLB and NN are composed of pitch mean, pitch standard deviation, energy mean, energy standard deviation, etc. For HMM, vectors of delta pitch with delta-delta pitch and delta energy with delta-delta energy are used. We recorded a corpus of emotional speech data and performed a subjective evaluation of the data. The subjective recognition result was 56% and was compared with the classifiers' recognition rates. The MLB, NN, and HMM classifiers achieved recognition rates of 68.9%, 69.3%, and 89.1%, respectively, for speaker-dependent and context-independent classification.
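The NN feature vector described in this abstract can be sketched roughly as below. The sketch covers only the four statistics the abstract names (the "etc." is left unfilled). It uses the population standard deviation and unscaled Euclidean distance, both of which are assumptions. The contours, labels, and function names are made up for illustration and are not the paper's data.

```python
import math
from statistics import mean, pstdev

def features(pitch, energy):
    """Summarize an utterance as (pitch mean, pitch std, energy mean, energy std)."""
    return (mean(pitch), pstdev(pitch), mean(energy), pstdev(energy))

def nearest_neighbor(train, query):
    """Return the label of the training vector closest to query (Euclidean)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

# Made-up contours for illustration only (pitch in Hz, energy in arbitrary units).
TRAIN = [
    (features([220, 260, 300, 280], [0.8, 0.9, 1.0, 0.9]), "anger"),
    (features([120, 118, 115, 117], [0.3, 0.3, 0.2, 0.3]), "sadness"),
]
QUERY = features([210, 250, 290, 270], [0.7, 0.9, 0.9, 0.8])
print(nearest_neighbor(TRAIN, QUERY))
```

Because pitch in Hz is numerically much larger than the energy values here, pitch dominates the unscaled distance. A practical system would likely normalize each feature first.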


Improving Weighted k Nearest Neighbor Classification Through The Analytic Hierarchy Process Aiding

  • Park, Cheol-Soo;Ingoo Han
    • Proceedings of the Korea Database Society Conference / 1999.06a / pp.187-194 / 1999
  • Case-Based Reasoning (CBR) systems support ill-structured decision making. The success of a CBR system depends on its ability to retrieve the most relevant previous cases in support of the solution of a new case. One of the methodologies widely used in existing CBR systems to retrieve previous cases is the Nearest Neighbor (NN) matching function. The NN matching function is based on assumptions of the independence of attributes in previous cases and the availability of rules and procedures for matching. (omitted)


A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences (영화 관객 수 예측을 위한 기계학습 기법의 성능 평가 연구)

  • Jeong, Chan-Mi;Min, Daiki
    • The Journal of Society for e-Business Studies / v.25 no.2 / pp.49-63 / 2020
  • Accurate early-stage prediction of box office performance is crucial for the film industry to make better managerial decisions. With the aim of improving prediction performance, this paper evaluates the use of machine learning methods. We tested both classification-based and regression-based methods, including k-NN, SVM, and Random Forest. We first evaluate the input variables, which shows that reputation-related information generated during the first two weeks after release is significant. Prediction test results show that regression-based methods provide lower prediction error, and that Random Forest in particular outperforms the other machine learning methods. Regression-based methods have better predictive power when films have small box office earnings. On the other hand, classification-based methods work better for predicting large box office earnings.

NN Saturation and FL Deadzone Compensation of Robot Systems (로봇 시스템의 신경망 포화 및 퍼지 데드존 보상)

  • Jang, Jun-Oh
    • Proceedings of the KIEE Conference / 2008.10b / pp.187-192 / 2008
  • A saturation and deadzone compensator is designed for robot systems using fuzzy logic (FL) and a neural network (NN). The classification property of the FL system and the function approximation ability of the NN make them natural candidates for rejecting errors induced by saturation and deadzone. Tuning algorithms are given for the fuzzy logic parameters and the NN weights, so that the saturation and deadzone compensation scheme becomes adaptive, guaranteeing small tracking errors and bounded parameter estimates. Formal nonlinear stability proofs are given to show that the tracking error is small. The NN saturation and FL deadzone compensator is simulated on a robot system to show its efficacy.
