• Title/Summary/Keyword: nearest neighbor classifier (최근접 이웃 분류기)


Cancer Diagnosis System using Genetic Algorithm and Multi-boosting Classifier (Genetic Algorithm과 다중부스팅 Classifier를 이용한 암진단 시스템)

  • Ohn, Syng-Yup;Chi, Seung-Do
    • Journal of the Korea Society for Simulation / v.20 no.2 / pp.77-85 / 2011
  • It is believed that anomalies or diseases of human organs can be identified by analyzing patterns. This paper proposes a new classification technique for identifying cancer using proteome patterns obtained from two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). In the new method, three classification methods, support vector machine (SVM), multi-layer perceptron (MLP), and k-nearest neighbor (k-NN), are extended by a multi-boosting method into an array of subclassifiers, and the results of the subclassifiers are merged by an ensemble method. A genetic algorithm was applied to obtain the optimal feature set for each subclassifier. We applied our method to an empirical data set from cancer research, and it showed better accuracy and more stable performance than a single classifier.
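
The merging step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which ensemble rule is used, so plurality voting is an assumption here, and the names `majority_vote`, `ensemble_predict`, and `threshold_rule`, as well as the feature subsets and cutoffs, are hypothetical stand-ins for the trained SVM, MLP, and k-NN subclassifiers and their GA-selected features.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one prediction")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def ensemble_predict(subclassifiers, sample):
    """Each subclassifier is (feature_indices, predict_fn).

    The sample is projected onto the subclassifier's own feature subset
    (standing in for a GA-selected feature set) before prediction.
    """
    votes = []
    for feature_idx, predict in subclassifiers:
        projected = [sample[i] for i in feature_idx]
        votes.append(predict(projected))
    return majority_vote(votes)


def threshold_rule(cutoff):
    """Toy stand-in for a trained subclassifier (assumption, not from the paper)."""
    return lambda feats: "cancer" if sum(feats) / len(feats) > cutoff else "normal"


# Hypothetical subclassifiers, each looking at a different feature subset.
subclassifiers = [
    ((0, 1), threshold_rule(0.5)),
    ((2,), threshold_rule(0.3)),
    ((1, 3), threshold_rule(0.6)),
]
sample = [0.9, 0.4, 0.1, 0.7]
result = ensemble_predict(subclassifiers, sample)
```

With these toy rules the first subclassifier votes "cancer" and the other two vote "normal", so the merged label is "normal".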

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

  • Bang Sun-Iee;Yang Jae-Dong;Yang Hyung-Jeong
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1204-1217 / 2004
  • Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train classifiers using only the feature vectors of documents, ambiguity between two possible categories significantly degrades classification precision. To remedy this drawback, we propose a new method that incorporates relationship information about categories into existing classifiers. In this paper, we first perform document classification using the k-NN classifier, which is generally known for relatively good performance in spite of its simplicity. We then use relationship information from an object-based thesaurus to reduce the ambiguity. By referencing the various relationships in the thesaurus that correspond to the structured categories, the precision of k-NN classification is drastically improved and the ambiguity is removed. Experimental results show that this method improves precision by up to 13.86% over k-NN classification while preserving its recall.
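
The baseline k-NN document classifier mentioned in this abstract can be sketched as follows. This is only an illustration of plain k-NN over term-frequency vectors. It does not include the thesaurus-based disambiguation the paper proposes. Cosine similarity, similarity-weighted voting, and the toy training documents are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(train, query, k=3):
    """train is a list of (term_vector, label); returns the predicted label.

    The k most similar training documents vote, each with a weight equal
    to its similarity to the query.
    """
    if not train:
        raise ValueError("training set is empty")
    scored = [(cosine(vec, query), label) for vec, label in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Toy training documents (hypothetical data).
train = [
    ({"goal": 2, "match": 1}, "sports"),
    ({"team": 1, "goal": 1}, "sports"),
    ({"stock": 2, "market": 1}, "finance"),
    ({"market": 1, "bank": 2}, "finance"),
]
sports_label = knn_classify(train, {"goal": 1, "team": 1}, k=3)
finance_label = knn_classify(train, {"market": 1, "bank": 1}, k=3)
```

A query that shares terms with two categories would receive votes for both, which is the kind of ambiguity the abstract says the thesaurus relationships are meant to resolve.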

Acoustic Emission Source Characterization and Fracture Behavior of Finite-width Plate with a Circular Hole Defect using Artificial Neural Network (인공신경회로망을 이용한 원공결함을 갖는 유한 폭 판재의 음향방출 음원특성과 파괴거동에 관한 연구)

  • Rhee, Zhang-Kyu;Woo, Chang-Ki
    • Transactions of the Korean Society of Machine Tool Engineers / v.18 no.2 / pp.170-177 / 2009
  • The objective of this study is to evaluate acoustic emission (AE) source characterization and the fracture behavior of SM45C steel by using a back-propagation neural network (BPN). Continuing our previous research on the k-nearest neighbor classifier (k-NNC) [8], we used the K-means clustering method as an unsupervised learning method for obtaining multivariate AE main data sets, such as AE counts, energy, amplitude, rise time, duration, and counts to peak. Similarly, we applied k-NNC and BPN as supervised learning methods for obtaining multivariate AE working data sets. The heuristic criteria D&B ($R_{ij}$) and Tou values are discussed according to the convergence error for the determinant criterion Wilks' $\lambda$. The results showed that, both before and when a fracture signal is detected in k-NNC, some empty classes are produced in BPN. We also confirmed that giving BPN a suitable convergence error or an acceptable encoding error can save effort in AE signal processing.

Classifying Cancer Using Partially Correlated Genes Selected by Forward Selection Method (전진선택법에 의해 선택된 부분 상관관계의 유전자들을 이용한 암 분류)

  • 유시호;조성배
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.83-92 / 2004
  • A gene expression profile is numerical data on the gene expression levels of an organism, measured on a microarray. In general, each specific tissue shows different expression levels in related genes, so cancer can be classified using gene expression profiles. Because not all genes are related to classification, the related genes must be selected, a step called feature selection. This paper proposes a new gene selection method that uses the forward selection method from regression analysis. The method reduces redundant information among the selected genes for more efficient classification. We used k-nearest neighbor as the classifier and tested it on a colon cancer dataset. Compared with the Pearson's coefficient and Spearman's coefficient methods, the proposed method performed better, achieving 90.3% classification accuracy. The method was also successfully applied to a lymphoma dataset.

Variational Bayesian multinomial probit model with Gaussian process classification on mice protein expression level data (가우시안 과정 분류에 대한 변분 베이지안 다항 프로빗 모형: 쥐 단백질 발현 데이터에의 적용)

  • Donghyun Son;Beom Seuk Hwang
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.115-127 / 2023
  • The multinomial probit model is a popular model for multiclass classification and choice modeling. Markov chain Monte Carlo (MCMC) methods are widely used to estimate the multinomial probit model, but their computational cost is high. Variational Bayesian approximation, by contrast, is well known to be more computationally efficient than MCMC because it uses subsets of samples. In this study, we describe the multinomial probit model with Gaussian process classification and show how to apply variational Bayesian approximation to the model. We also compare the results of the variational Bayesian multinomial probit model with those of naive Bayes, K-nearest neighbors, and support vector machine on the UCI mice protein expression level data.

Bayesian Network-Based Analysis on Clinical Data of Infertility Patients (베이지안 망에 기초한 불임환자 임상데이터의 분석)

  • Jung, Yong-Gyu;Kim, In-Cheol
    • The KIPS Transactions:PartB / v.9B no.5 / pp.625-634 / 2002
  • In this paper, we conducted various experiments with Bayesian networks in order to analyze clinical data of infertility patients. Through these experiments, we tried to find the inter-dependencies among the important factors that play a key role in clinical pregnancy, and to compare three kinds of Bayesian network classifiers (NBN, BAN, and GBN) in terms of classification performance. The experiments showed that the most important features for clinical pregnancy (Clin) are indication (IND), stimulation, age of female partner (FA), number of ova (ICT), and use of Wallace (ETM), and revealed inter-dependencies among these features. We also confirmed that BAN and GBN, which are more general Bayesian network classifiers that permit inter-dependencies among features, show higher performance than NBN. By comparing Bayesian classifiers, which are based on probabilistic representation and reasoning, with other classifiers such as decision trees and k-nearest neighbor methods, we found that the former perform better than the latter owing to inherent characteristics of the clinical domain. Finally, we suggested a feature reduction method that removes all features except those within the Markov blanket of the class node, and investigated experimentally whether such feature reduction can increase the performance of Bayesian classifiers.
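
The feature reduction idea in this abstract rests on the Markov blanket of the class node, which in a directed acyclic graph consists of the node's parents, its children, and the other parents of its children. The sketch below computes that set from a parent map. The graph, the node names (`Class`, `f1` to `f5`), and the function name `markov_blanket` are hypothetical and do not reproduce the network learned in the paper.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parent nodes.
    """
    blanket = set(parents.get(node, ()))
    # Children are the nodes that list `node` among their parents.
    children = {child for child, ps in parents.items() if node in ps}
    blanket |= children
    # Co-parents: the other parents of each child.
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket


# Hypothetical network structure, for illustration only.
parents = {
    "Class": {"f1", "f2"},
    "f3": {"Class", "f4"},
    "f5": {"f1"},
    "f1": set(),
    "f2": set(),
    "f4": set(),
}
kept = markov_blanket(parents, "Class")
removed = set(parents) - kept - {"Class"}
```

In this toy graph the blanket of `Class` is `f1`, `f2`, `f3`, and `f4`, so only `f5` would be dropped by the reduction step.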

Performance comparison of machine learning classification methods for decision of disc cutter replacement of shield TBM (쉴드 TBM 디스크 커터 교체 유무 판단을 위한 머신러닝 분류기법 성능 비교)

  • Kim, Yunhee;Hong, Jiyeon;Kim, Bumjoo
    • Journal of Korean Tunnelling and Underground Space Association / v.22 no.5 / pp.575-589 / 2020
  • In recent years, shield TBM construction has been increasing steadily in domestic tunnels. The main excavation tool in shield TBM construction is the disc cutter, which naturally wears during excavation, and this wear significantly degrades excavation efficiency. It is therefore important to know the appropriate time to replace a disc cutter. This study proposes a predictive model that uses machine learning algorithms to determine whether or not a disc cutter should be replaced. The input data for the model are shield TBM machine data that are highly correlated with disc cutter wear, together with disc cutter replacement records, from a shield TBM site where construction has already been completed. The algorithms used in the study, the support vector machine, k-nearest neighbor, and decision tree algorithms, are all classification methods used in machine learning. To construct an optimal predictive model and evaluate its performance, classification performance evaluation indices were compared and analyzed.
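
The abstract does not name the evaluation indices it compares. As an assumption, the sketch below computes four commonly used ones (accuracy, precision, recall, and F1) for a binary replace / keep decision. The label strings, the function name `binary_metrics`, and the toy predictions are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="replace"):
    """Accuracy, precision, recall and F1 for a two-class problem."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy ground truth and predictions (hypothetical data).
y_true = ["replace", "replace", "keep", "keep", "replace", "keep"]
y_pred = ["replace", "keep", "keep", "replace", "replace", "keep"]
metrics = binary_metrics(y_true, y_pred)
```

Running the same function on the predictions of each classifier gives a like-for-like comparison across the support vector machine, k-nearest neighbor, and decision tree models.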

Facial Local Region Based Deep Convolutional Neural Networks for Automated Face Recognition (자동 얼굴인식을 위한 얼굴 지역 영역 기반 다중 심층 합성곱 신경망 시스템)

  • Kim, Kyeong-Tae;Choi, Jae-Young
    • Journal of the Korea Convergence Society / v.9 no.4 / pp.47-55 / 2018
  • In this paper, we propose a novel face recognition (FR) method that combines weighted deep local features extracted from multiple deep convolutional neural networks (DCNNs) trained on a set of facial local regions. In the proposed method, the weighted deep local features are generated from multiple DCNNs, each trained on a particular facial local region, and the corresponding weight represents the importance of that region for improving FR performance. The weighted deep local features are applied to Joint Bayesian metric learning in conjunction with a nearest neighbor (NN) classifier for FR. Systematic and comparative experiments show that the proposed method is robust to variations in pose, illumination, and expression. The experimental results also demonstrate that the method is feasible for improving face recognition performance.

Optimal Band Selection Techniques for Hyperspectral Image Pixel Classification using Pooling Operations & PSNR (초분광 이미지 픽셀 분류를 위한 풀링 연산과 PSNR을 이용한 최적 밴드 선택 기법)

  • Chang, Duhyeuk;Jung, Byeonghyeon;Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.5 / pp.141-147 / 2021
  • In this paper, a band selection algorithm is applied to each subset in order to improve the utilization of feature information in large-capacity hyperspectral data by reducing the dimension of neural network inputs, and with it the computational complexity, in embedded systems. Of the feature extraction and feature selection techniques, feature selection is used here, with the aim of finding the optimal number of bands for a dataset regardless of wavelength range and of improving time and performance over other algorithms. In the experiment, the time required was reduced by a factor of 1/3 to 1/9 compared with other band selection techniques, while performance measured with the K-neighbor classifier improved by more than 4%. Although real-time hyperspectral data analysis is still difficult in practice, the results confirm the possibility of improvement.

Performance Comparison of Automatic Classification Using Word Embeddings of Book Titles (단행본 서명의 단어 임베딩에 따른 자동분류의 성능 비교)

  • Yong-Gu Lee
    • Journal of the Korean Society for Information Management / v.40 no.4 / pp.307-327 / 2023
  • To analyze the impact of word embedding on book titles, this study used word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as features for automatic classification. The classifier was the k-nearest neighbors (kNN) algorithm, and the categories for automatic classification were based on the DDC (Dewey Decimal Classification) main class 300 assigned to books by libraries. In the automatic classification experiment applying word embeddings to book titles, the Skip-gram architectures of Word2vec and fastText gave better kNN classification performance than TF-IDF features. In the optimization of various hyperparameters across the three models, the Skip-gram architecture of the fastText model demonstrated good overall performance. Specifically, this model performed better with hierarchical softmax and larger embedding dimensions as hyperparameters. From a performance perspective, fastText can generate embeddings for substrings or subwords using the n-gram method, which has been shown to increase recall. The Skip-gram architecture of the Word2vec model generally performed well at low dimensions (size 300) and with small negative sampling sizes (3 or 5).