• Title/Summary/Keyword: Nearest neighbor (NN) algorithm

Search Result 106, Processing Time 0.03 seconds

Fuzzy Kernel K-Nearest Neighbor Algorithm for Image Segmentation (영상 분할을 위한 퍼지 커널 K-nearest neighbor 알고리즘)

  • Choi Byung-In;Rhee Chung-Hoon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.7
    • /
    • pp.828-833
    • /
    • 2005
  • Kernel methods have shown to improve the performance of conventional linear classification algorithms for complex distributed data sets, as mapping the data in input space into a higher dimensional feature space(7). In this paper, we propose a fuzzy kernel K-nearest neighbor(fuzzy kernel K-NN) algorithm, which applies the distance measure in feature space based on kernel functions to the fuzzy K-nearest neighbor(fuzzy K-NN) algorithm. In doing so, the proposed algorithm can enhance the Performance of the conventional algorithm, by choosing an appropriate kernel function. Results on several data sets and segmentation results for real images are given to show the validity of our proposed algorithm.

k-Nearest Neighbor Classifier using Local Values of k (지역적 k값을 사용한 k-Nearest Neighbor Classifier)

  • 이상훈;오경환
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.193-195
    • /
    • 2003
  • 본 논문에서는 k-Nearest Neighbor(k-NN) 알고리즘을 최적화하기 위해 지역적으로 다른 k(고려할 neighbor의 개수)를 사용하는 새로운 방법을 제안한다. 인스턴스 공간(instance space)에서 노이즈(noise)의 분포가 지역적(local)으로 다를 경우, 각 지점에서 고려해야 할 최적의 이웃 인스턴스(neighbor)의 수는 해당 지점에서의 국부적인 노이즈 분포에 따라 다르다. 그러나 기존의 방법은 전체 인스턴스 공간에 대해 동일한 k를 사용하기 때문에 이러한 인스턴스 공간의 지역적인 특성을 고려하지 못한다. 따라서 본 논문에서는 지역적으로 분포가 다른 노이즈 문제를 해결하기 위해 인스턴스 공간을 여러 개의 부분으로 나누고, 각 부분에 최적화된 k의 값을 사용하여 kNN을 수행하는 새로운 방법인 Local-k Nearest Neighbor 알고리즘(LkNN Algorithm)을 제안한다. LkNN을 통해 생성된 k의 집합은 인스턴스 공간의 각 부분을 대표하는 값으로, 해당 지역의 인스턴스가 고려해야 할 이웃(neighbor)의 수를 결정지어준다. 제안한 알고리즘에 적합한 데이터의 도메인(domain)과 그것의 향상된 성능은 UCI ML Data Repository 데이터를 사용한 실험을 통해 검증하였다.

  • PDF

Feature Selection for Multiple K-Nearest Neighbor classifiers using GAVaPS (GAVaPS를 이용한 다수 K-Nearest Neighbor classifier들의 Feature 선택)

  • Lee, Hee-Sung;Lee, Jae-Hun;Kim, Eun-Tai
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.6
    • /
    • pp.871-875
    • /
    • 2008
  • This paper deals with the feature selection for multiple k-nearest neighbor (k-NN) classifiers using Genetic Algorithm with Varying reputation Size (GAVaPS). Because we use multiple k-NN classifiers, the feature selection problem for them is vary hard and has large search region. To solve this problem, we employ the GAVaPS which outperforms comparison with simple genetic algorithm (SGA). Further, we propose the efficient combining method for multiple k-NN classifiers using GAVaPS. Experiments are performed to demonstrate the efficiency of the proposed method.

Calculating Attribute Weights in K-Nearest Neighbor Algorithms using Information Theory (정보이론을 이용한 K-최근접 이웃 알고리즘에서의 속성 가중치 계산)

  • Lee Chang-Hwan
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.9
    • /
    • pp.920-926
    • /
    • 2005
  • Nearest neighbor algorithms classify an unseen input instance by selecting similar cases and use the discovered membership to make predictions about the unknown features of the input instance. The usefulness of the nearest neighbor algorithms have been demonstrated sufficiently in many real-world domains. In nearest neighbor algorithms, it is an important issue to assign proper weights to the attributes. Therefore, in this paper, we propose a new method which can automatically assigns to each attribute a weight of its importance with respect to the target attribute. The method has been implemented as a computer program and its effectiveness has been tested on a number of machine learning databases publicly available.

Enhancement of Text Classification Method (텍스트 분류 기법의 발전)

  • Shin, Kwang-Seong;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.155-156
    • /
    • 2019
  • Traditional machine learning based emotion analysis methods such as Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor classification (kNN) are less accurate. In this paper, we propose an improved kNN classification method. Improved methods and data normalization achieve the goal of improving accuracy. Then, three classification algorithms and an improved algorithm were compared based on experimental data.

  • PDF

A Method for k Nearest Neighbor Query of Line Segment in Obstructed Spaces

  • Zhang, Liping;Li, Song;Guo, Yingying;Hao, Xiaohong
    • Journal of Information Processing Systems
    • /
    • v.16 no.2
    • /
    • pp.406-420
    • /
    • 2020
  • In order to make up the deficiencies of the existing research results which cannot effectively deal with the nearest neighbor query based on the line segments in obstacle space, the k nearest neighbor query method of line segment in obstacle space is proposed and the STA_OLkNN algorithm under the circumstance of static obstacle data set is put forward. The query process is divided into two stages, including the filtering process and refining process. In the filtration process, according to the properties of the line segment Voronoi diagram, the corresponding pruning rules are proposed and the filtering algorithm is presented. In the refining process, according to the relationship of the position between the line segments, the corresponding distance expression method is put forward and the final result is obtained by comparing the distance. Theoretical research and experimental results show that the proposed algorithm can effectively deal with the problem of k nearest neighbor query of the line segment in the obstacle environment.

k-NN Join Based on LSH in Big Data Environment

  • Ji, Jiaqi;Chung, Yeongjee
    • Journal of information and communication convergence engineering
    • /
    • v.16 no.2
    • /
    • pp.99-105
    • /
    • 2018
  • k-Nearest neighbor join (k-NN Join) is a computationally intensive algorithm that is designed to find k-nearest neighbors from a dataset S for every object in another dataset R. Most related studies on k-NN Join are based on single-computer operations. As the data dimensions and data volume increase, running the k-NN Join algorithm on a single computer cannot generate results quickly. To solve this scalability problem, we introduce the locality-sensitive hashing (LSH) k-NN Join algorithm implemented in Spark, an approach for high-dimensional big data. LSH is used to map similar data onto the same bucket, which can reduce the data search scope. In order to achieve parallel implementation of the algorithm on multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that our proposed approach is fast and accurate for high-dimensional and big data.

Data Classification Using the Robbins-Monro Stochastic Approximation Algorithm (로빈스-몬로 확률 근사 알고리즘을 이용한 데이터 분류)

  • Lee, Jae-Kook;Ko, Chun-Taek;Choi, Won-Ho
    • Proceedings of the KIPE Conference
    • /
    • 2005.07a
    • /
    • pp.624-627
    • /
    • 2005
  • This paper presents a new data classification method using the Robbins Monro stochastic approximation algorithm k-nearest neighbor and distribution analysis. To cluster the data set, we decide the centroid of the test data set using k-nearest neighbor algorithm and the local area of data set. To decide each class of the data, the Robbins Monro stochastic approximation algorithm is applied to the decided local area of the data set. To evaluate the performance, the proposed classification method is compared to the conventional fuzzy c-mean method and k-nn algorithm. The simulation results show that the proposed method is more accurate than fuzzy c-mean method, k-nn algorithm and discriminant analysis algorithm.

  • PDF

A K-Nearest Neighbor Algorithm for Categorical Sequence Data (범주형 시퀀스 데이터의 K-Nearest Neighbor알고리즘)

  • Oh Seung-Joon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.2 s.34
    • /
    • pp.215-221
    • /
    • 2005
  • TRecently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. In this Paper, we study how to classify these sequence datasets. There are several kinds techniques for data classification such as decision tree induction, Bayesian classification and K-NN etc. In our approach, we use a K-NN algorithm for classifying sequences. In addition, we propose a new similarity measure to compute the similarity between two sequences and an efficient method for measuring similarity.

  • PDF

Density Adaptive Grid-based k-Nearest Neighbor Regression Model for Large Dataset (대용량 자료에 대한 밀도 적응 격자 기반의 k-NN 회귀 모형)

  • Liu, Yiqi;Uk, Jung
    • Journal of Korean Society for Quality Management
    • /
    • v.49 no.2
    • /
    • pp.201-211
    • /
    • 2021
  • Purpose: This paper proposes a density adaptive grid algorithm for the k-NN regression model to reduce the computation time for large datasets without significant prediction accuracy loss. Methods: The proposed method utilizes the concept of the grid with centroid to reduce the number of reference data points so that the required computation time is much reduced. Since the grid generation process in this paper is based on quantiles of original variables, the proposed method can fully reflect the density information of the original reference data set. Results: Using five real-life datasets, the proposed k-NN regression model is compared with the original k-NN regression model. The results show that the proposed density adaptive grid-based k-NN regression model is superior to the original k-NN regression in terms of data reduction ratio and time efficiency ratio, and provides a similar prediction error if the appropriate number of grids is selected. Conclusion: The proposed density adaptive grid algorithm for the k-NN regression model is a simple and effective model which can help avoid a large loss of prediction accuracy with faster execution speed and fewer memory requirements during the testing phase.