• Title/Summary/Keyword: nearest data selection (최근접 데이터 선택)

Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis (최근접 이웃 규칙 기반 프로토타입 선택과 편의-분산을 이용한 성능 평가)

  • Shim, Se-Yong;Hwang, Doo-Sung
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.10 / pp.73-81 / 2015
  • The paper proposes a prototype selection method and evaluates the generalization performance of standard algorithms and prototype-based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii within class areas and generates a small training set from them. The nearest-neighbor classifier then uses this new training set to predict the class of test data. By decomposing the mean expected error into bias and variance, we compare the generalization errors of k-nearest neighbor, the Bayesian classifier, prototype selection using a fixed radius, and the proposed prototype selection method. In experiments, the bias-variance trends of the proposed prototype classifier are similar to those of nearest-neighbor classifiers that use all training data, and the prototype selection rates are under 27.0% on average.

Performance Improvement of Nearest-neighbor Classification Learning through Prototype Selections (프로토타입 선택을 이용한 최근접 분류 학습의 성능 개선)

  • Hwang, Doo-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.2 / pp.53-60 / 2012
  • Nearest-neighbor classification predicts the class of an input as the most frequent class among the training data nearest to it. Although nearest-neighbor classification has no training stage, all of the training data are needed at prediction time, and the generalization performance depends on the quality of the training data. Therefore, as the training set grows, nearest-neighbor classification requires a large amount of memory and a long computation time for prediction. In this paper, we propose a prototype selection algorithm that predicts the class of test data with a new set of prototypes consisting of near-boundary training data. Based on Tomek links and a distance metric, the proposed algorithm selects boundary data and decides whether each selected point is added to the set of prototypes by considering class and distance relationships. In the experiments, the number of prototypes is much smaller than the size of the original training data, which yields storage reduction and faster prediction in nearest-neighbor classification.
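
The abstract names Tomek links as the basis for finding boundary data but does not give the full selection rule. As a minimal sketch of only the Tomek-link step, assuming Euclidean distance and the usual definition (two points of different classes that are each other's nearest neighbor), the following Python finds such pairs. The function names `nearest_index` and `tomek_links` are illustrative and not taken from the paper.

```python
import math

def nearest_index(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    best_j, best_d = -1, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_d:
            best_j, best_d = j, d
    return best_j

def tomek_links(points, labels):
    """Sorted pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_index(points, i) for i in range(len(points))]
    links = set()
    for i, j in enumerate(nn):
        if nn[j] == i and labels[i] != labels[j]:
            links.add((min(i, j), max(i, j)))
    return sorted(links)

# Toy 2-D data (made up for illustration): the class boundary lies between index 2 and 3.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.2, 0.0), (3.0, 0.0)]
labels = ["a", "a", "a", "b", "b"]
links = tomek_links(points, labels)
print(links)
```

The points that appear in a link are candidates for boundary prototypes; how the paper filters them further by class and distance relationships is not specified in the abstract, so that step is left out here.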

Prototype-Based Classification Using Class Hyperspheres (클래스 초월구를 이용한 프로토타입 기반 분류)

  • Lee, Hyun-Jong;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.5 no.10 / pp.483-488 / 2016
  • In this paper, we propose prototype-based classification learning that uses the nearest-neighbor rule. The nearest-neighbor rule is applied to segment the class areas of the training data into hyperspheres, each of which must cover data from a single class. The radius of a hypersphere is the midpoint of two distances: the distance to the farthest same-class point and the distance to the nearest point of another class. We then transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that covers all the training data. The proposed prototype selection method uses a greedy algorithm and can process a large-scale training set in parallel. Prediction uses the nearest-neighbor rule, with the set of prototypes as the new training data. In experiments, the generalization performance of the proposed method is superior to that of existing methods.
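
The hypersphere construction and greedy cover described above can be sketched roughly as follows. This is one possible reading of the abstract, not the authors' implementation: it assumes Euclidean distance, treats every training point as a candidate center, and takes "farthest same-class point" to mean the farthest same-class point that is still closer than the nearest other-class point. The names `build_spheres`, `select_prototypes`, and `predict` are hypothetical.

```python
import math

def build_spheres(points, labels):
    """For each point as a center, return (radius, set of same-class indices covered)."""
    spheres = []
    for i, p in enumerate(points):
        # Distance to the nearest point of another class (inf if there is none).
        enemy = min((math.dist(p, q) for q, l in zip(points, labels) if l != labels[i]),
                    default=math.inf)
        # Same-class points strictly closer than that; the center always covers itself.
        inside = [(math.dist(p, q), j) for j, (q, l) in enumerate(zip(points, labels))
                  if l == labels[i] and (j == i or math.dist(p, q) < enemy)]
        farthest = max(d for d, _ in inside)
        # Radius is the midpoint of the two distances (assumed reading of the abstract).
        radius = farthest if math.isinf(enemy) else (farthest + enemy) / 2
        spheres.append((radius, {j for _, j in inside}))
    return spheres

def select_prototypes(points, labels):
    """Greedy set cover: repeatedly pick the sphere covering the most uncovered points."""
    spheres = build_spheres(points, labels)
    uncovered = set(range(len(points)))
    prototypes = []
    while uncovered:
        # Ties are broken in favor of the smallest index.
        best = max(range(len(points)),
                   key=lambda i: (len(spheres[i][1] & uncovered), -i))
        prototypes.append(best)
        uncovered -= spheres[best][1]
    return prototypes, spheres

def predict(x, prototypes, points, labels):
    """Nearest-neighbor rule over the selected prototypes only."""
    nearest = min(prototypes, key=lambda i: math.dist(x, points[i]))
    return labels[nearest]

# Toy 1-D data (made up for illustration): two well-separated classes.
points = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,), (7.0,)]
labels = ["a", "a", "a", "b", "b", "b"]
prototypes, spheres = select_prototypes(points, labels)
print(prototypes, predict((1.5,), prototypes, points, labels))
```

Because covering is computed within each class, the greedy loop could be run per class independently, which is consistent with the abstract's remark about parallel processing.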

Prototype based Classification by Generating Multidimensional Spheres per Class Area (클래스 영역의 다차원 구 생성에 의한 프로토타입 기반 분류)

  • Shim, Seyong;Hwang, Doosung
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.21-28 / 2015
  • In this paper, we propose prototype-based classification learning that uses the nearest-neighbor rule. The nearest-neighbor rule is applied to segment the class areas of the training data into spheres, each of which contains data from a single class. Prototypes are the centers of the spheres, and each radius is the midpoint of two distances: the distance to the farthest same-class point and the distance to the nearest point of another class. We then transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that covers all the training data. The proposed prototype selection method is based on a greedy algorithm that is applied to the training data of each class. The method is computationally simple and well suited to parallel implementation. The prototype-based classifier takes the set of prototypes and predicts the class of test data by the nearest-neighbor rule. In experiments, the generalization performance of our prototype classifier is superior to that of the nearest-neighbor classifier, the Bayes classifier, and another prototype classifier.

The Nearest Neighbor Query for Trajectory of Moving Objects (이동 객체 궤적에 대한 최근접 질의)

  • Choi, Bo-Yoon;Chi, Jeong-Hee;Kim, Sang-Ho;Ryu, Keun-Ho
    • Proceedings of the Korea Spatial Information System Society Conference (한국공간정보시스템학회 학술대회논문집) / 2003.11a / pp.169-174 / 2003
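  • Existing nearest neighbor (NN) query processing techniques for moving objects lack sufficient criteria for selecting, continuously and accurately along a query trajectory, the nearest object that moves while staying closest to the query. This paper proposes a new NN query processing technique, the continuous trajectory nearest neighbor (CTNN) query, which obtains accurate results through continuous query processing over object trajectories and is best suited to cases where both the query object and the data objects are moving objects. We propose two techniques, Approximate CTNN and Exact CTNN, both of which can be used widely in location-based services (LBS) such as navigation systems, traffic control systems, and logistics information systems. They are most suitable when the trajectories of the moving objects are known in advance and when both the query and the data objects are moving objects.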

The Method of Nearest Neighbor Search for Trajectory of Moving Objects (이동 객체의 궤적에 대한 최근접 탐색 기법)

  • Choi, Bo-Yoon;Shin, Hyun-Ho;Chi, Jeong-Hee;Kim, Sang-Ho;Ryu, Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2003.05c / pp.1595-1598 / 2003
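  • This paper proposes a technique for continuous nearest neighbor query processing between objects whose paths are 3-dimensional polylines, that is, for the case where both the query and the objects being searched are moving objects. Continuous nearest neighbor query processing, whose goal is to find the points at which the nearest neighbor of the query changes while searching objects along the query path, provides a correct list of nearest neighbor information for the entire query path. However, when the objects being searched are dynamic, existing methods cannot handle the positional changes of objects that move over time, and they have difficulty relating the time of the query to the time of the target objects. This paper therefore maintains the trajectory information of the data objects in an STR tree and, for both the query path segments and the data object segments that fall within the query's time interval, applies steps that include sampling time selection, sweep line application, and use of a position estimation function. This solves the problem and provides an accurate list of nearest neighbor objects over the entire query path. The proposed technique can be applied to systems that handle queries on spatiotemporal moving objects, such as logistics information systems, defense information systems, meteorology, and traffic.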

k-Nearest Neighbor-Based Approach for the Estimation of Mutual Information (상호정보 추정을 위한 k-최근접이웃 기반방법)

  • Cha, Woon-Ock;Huh, Moon-Yul
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.977-991 / 2008
  • This study examines a k-nearest neighbor-based approach for estimating mutual information when the type of target variable is categorical and continuous. The results of Monte Carlo simulation and experiments with real-world data show that k=1 is preferable. In practical applications with real-world data, our study shows that jittering and bootstrapping are needed.

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose predicting natural gas (NG) leakage levels through feature selection based on a factor analysis (FA) of integrated Korean Meteorological Agency data and natural gas leakage data, in order to account for complex factors. The method is divided into three modules. First, we fill missing data in the integrated data set using linear interpolation and select essential features using FA with OrdinalEncoder (OE)-based normalization. Next, the dataset is labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-Factor analysis (OE-F)-based classification method improves performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
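
The first module fills missing values by linear interpolation. A minimal sketch of that step, assuming evenly spaced samples and `None` as the missing-value marker (both assumptions, since the abstract does not describe the data layout), could look like this. The function name `interpolate_missing` is illustrative.

```python
def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation between the nearest known values.

    Leading and trailing gaps have only one known neighbor, so they are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant step between two consecutive known samples.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

# Toy series (made up for illustration) with an interior gap and a trailing gap.
series = [1.0, None, None, 4.0, None]
print(interpolate_missing(series))
```

How edge gaps are treated is a design choice; this sketch leaves them untouched rather than extrapolating.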

Classifying Cancer Using Partially Correlated Genes Selected by Forward Selection Method (전진선택법에 의해 선택된 부분 상관관계의 유전자들을 이용한 암 분류)

  • Yoo, Si-Ho;Cho, Sung-Bae
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.83-92 / 2004
  • A gene expression profile is numerical data on the gene expression levels of an organism, measured on a microarray. In general, each specific tissue shows different expression levels in related genes, so cancer can be classified from a gene expression profile. Because not all genes are relevant to classification, the relevant genes must be selected, a step called feature selection. This paper proposes a new gene selection method based on the forward selection method used in regression analysis. The method reduces redundant information among the selected genes for more efficient classification. We used k-nearest neighbor as the classifier and tested on a colon cancer dataset. Compared with methods based on Pearson's and Spearman's coefficients, the proposed method performed better, achieving 90.3% classification accuracy. The method was also applied successfully to a lymphoma dataset.

A Hashing Method Using PCA-based Clustering (PCA 기반 군집화를 이용한 해슁 기법)

  • Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering / v.3 no.6 / pp.215-218 / 2014
  • Hashing-based methods for approximate nearest neighbor (ANN) search map data points to k-bit binary codes and search for nearest neighbors in the resulting binary embedding space. In this paper, we present a hashing method that uses a PCA-based clustering method, Principal Direction Divisive Partitioning (PDDP). PDDP is a clustering method that repeatedly partitions the cluster with the largest variance into two clusters using the first principal direction. The proposed hashing method uses the first principal direction as the projection direction for binary coding. Experimental results demonstrate that the proposed method is competitive with other hashing methods.
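
To make the PDDP step concrete, here is a rough pure-Python sketch under stated assumptions: the first principal direction is approximated by power iteration, "variance" of a cluster is taken as its mean squared distance to the centroid, and a point's bit is 1 when its centered projection onto that direction is non-negative. How the paper assembles the full k-bit code from successive splits is not stated in the abstract, so only a single bit and the divisive clustering loop are shown. All function names are hypothetical.

```python
import math

def principal_direction(rows, iters=200):
    """Mean and approximate first principal direction via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Arbitrary start vector; this fails only if it is orthogonal to the answer.
    v = [1.0 + k for k in range(d)]
    for _ in range(iters):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        proj = [sum(c[k] * v[k] for k in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][k] for i in range(n)) for k in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def side_bits(rows):
    """One bit per row: 1 if the centered projection on the principal direction is >= 0."""
    mean, v = principal_direction(rows)
    return [1 if sum((r[k] - mean[k]) * v[k] for k in range(len(v))) >= 0 else 0
            for r in rows]

def scatter(rows):
    """Total variance of a cluster: mean squared distance to its centroid."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    return sum((r[k] - mean[k]) ** 2 for r in rows for k in range(d)) / n

def pddp(points, n_clusters):
    """Repeatedly split the cluster with the largest variance into two."""
    clusters = [list(range(len(points)))]
    while len(clusters) < n_clusters:
        target = max(clusters, key=lambda c: scatter([points[i] for i in c]))
        bits = side_bits([points[i] for i in target])
        left = [i for i, b in zip(target, bits) if b == 0]
        right = [i for i, b in zip(target, bits) if b == 1]
        if not left or not right:
            break  # the cluster cannot be split further
        clusters.remove(target)
        clusters += [left, right]
    return clusters

# Toy 2-D data (made up for illustration), spread mainly along the x-axis.
points = [(-2.0, 0.1), (-1.0, -0.1), (1.0, 0.1), (2.0, -0.1)]
bits = side_bits(points)
clusters = pddp(points, 2)
print(bits, sorted(clusters))
```

In practice a linear algebra library would replace the hand-written power iteration; it is written out here only to keep the sketch dependency-free.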