• Title/Summary/Keyword: k-nearest neighbor method (kNN)

Density Adaptive Grid-based k-Nearest Neighbor Regression Model for Large Dataset (대용량 자료에 대한 밀도 적응 격자 기반의 k-NN 회귀 모형)

  • Liu, Yiqi; Uk, Jung
    • Journal of Korean Society for Quality Management / v.49 no.2 / pp.201-211 / 2021
  • Purpose: This paper proposes a density adaptive grid algorithm for the k-NN regression model that reduces computation time on large datasets without a significant loss of prediction accuracy. Methods: The proposed method represents each grid cell by its centroid, which reduces the number of reference data points and therefore the required computation time. Because the grid is generated from quantiles of the original variables, it reflects the density of the original reference data set. Results: The proposed k-NN regression model is compared with the original k-NN regression model on five real-life datasets. The proposed model is superior in terms of data reduction ratio and time efficiency ratio, and gives a similar prediction error when an appropriate number of grids is selected. Conclusion: The proposed density adaptive grid algorithm is a simple and effective way to speed up execution and reduce memory requirements during the testing phase while avoiding a large loss of prediction accuracy.
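
The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal-frequency (quantile) cut points per feature, one centroid per occupied cell, and the mean target of the cell as the centroid's response. All function names (`quantile_edges`, `build_grid_reference`, `knn_predict`) and the toy data are hypothetical.

```python
import bisect
import math
import statistics
from collections import defaultdict


def quantile_edges(values, bins):
    """Interior cut points at equally spaced quantiles of one feature."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def build_grid_reference(X, y, bins):
    """Replace the reference set by one (centroid, mean target) pair per occupied cell."""
    dims = len(X[0])
    edges = [quantile_edges([row[d] for row in X], bins) for d in range(dims)]
    cells = defaultdict(list)
    for row, target in zip(X, y):
        # Cell index along each axis = number of cut points at or below the value.
        key = tuple(bisect.bisect_right(edges[d], row[d]) for d in range(dims))
        cells[key].append((row, target))
    centroids, targets = [], []
    for members in cells.values():
        centroids.append([statistics.fmean(r[d] for r, _ in members) for d in range(dims)])
        targets.append(statistics.fmean(t for _, t in members))
    return centroids, targets


def knn_predict(centroids, targets, query, k):
    """Plain k-NN regression on the reduced set: average the k closest targets."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(centroids[i], query))
    return statistics.fmean(targets[i] for i in order[:k])


# Toy data (assumed): 400 points on a regular lattice with y = x0 + x1.
X = [[i / 10, j / 10] for i in range(20) for j in range(20)]
y = [a + b for a, b in X]
centroids, targets = build_grid_reference(X, y, bins=5)
prediction = knn_predict(centroids, targets, [1.0, 1.0], k=3)
print(len(X), "reference points reduced to", len(centroids))
print("prediction at (1.0, 1.0):", round(prediction, 3))
```

Because the cut points are quantiles, dense regions get narrow cells and sparse regions get wide ones, so the centroids follow the density of the original data. The number of bins controls the trade-off between data reduction and prediction error.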

Data Classification Using the Robbins-Monro Stochastic Approximation Algorithm (로빈스-몬로 확률 근사 알고리즘을 이용한 데이터 분류)

  • Lee, Jae-Kook; Ko, Chun-Taek; Choi, Won-Ho
    • Proceedings of the KIPE Conference / 2005.07a / pp.624-627 / 2005
  • This paper presents a new data classification method that combines the Robbins-Monro stochastic approximation algorithm, the k-nearest neighbor algorithm, and distribution analysis. To cluster the data set, we determine the centroid of the test data set using the k-nearest neighbor algorithm and the local area of the data set. The Robbins-Monro stochastic approximation algorithm is then applied to that local area to decide the class of each data point. To evaluate its performance, the proposed method is compared with the conventional fuzzy c-means method and the k-NN algorithm. The simulation results show that the proposed method is more accurate than the fuzzy c-means method, the k-NN algorithm, and the discriminant analysis algorithm.

An Automatic Calculation Method of Feature Weights in k Nearest Neighbor Algorithms (kNN 알고리즘에서의 속성 가중치 자동계산 방법)

  • Lee, Kang-Il; Lee, Chang-Hwan
    • Annual Conference of KIPS / 2005.05a / pp.423-426 / 2005
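  • The k nearest neighbor algorithm, a form of memory-based learning, predicts the target value of a new instance from the past data that are most similar to it. How the feature weights are calculated is therefore an important factor in the performance of kNN. Unlike existing approaches, this paper presents a new method that uses the concept of entropy from information theory to calculate feature weights in a principled and effective way. The proposed method automatically calculates each weight according to the amount of information the feature provides about the target attribute, which improves the performance of kNN. Finally, the performance of the method is compared through a number of experiments.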
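
As a rough illustration of entropy-based feature weighting, the sketch below uses information gain, H(class) - H(class | feature), as the weight of each categorical feature. The abstract does not give the paper's formula, so information gain is a stand-in chosen here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(X, y):
    """Normalise per-feature information gains so that they sum to 1."""
    gains = [information_gain([row[d] for row in X], y) for d in range(len(X[0]))]
    total = sum(gains)
    if total <= 0:  # no feature is informative: fall back to equal weights
        return [1 / len(gains)] * len(gains)
    return [g / total for g in gains]


def weighted_knn_classify(X, y, weights, query, k):
    """Majority vote among the k rows with the smallest weighted mismatch distance."""
    def distance(row):
        return sum(w for w, a, b in zip(weights, row, query) if a != b)
    nearest = sorted(range(len(X)), key=lambda i: distance(X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


# Toy data (assumed): feature 0 determines the class, feature 1 is noise.
X = [["sunny", "a"], ["sunny", "b"], ["rainy", "a"], ["rainy", "b"]]
y = ["out", "out", "in", "in"]
weights = feature_weights(X, y)
label = weighted_knn_classify(X, y, weights, ["rainy", "a"], k=2)
print(weights, label)
```

In this toy case the uninformative feature receives zero weight, so it no longer influences which neighbors are selected.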

A K-Nearest Neighbor Algorithm for Categorical Sequence Data (범주형 시퀀스 데이터의 K-Nearest Neighbor알고리즘)

  • Oh, Seung-Joon
    • Journal of the Korea Society of Computer and Information / v.10 no.2 s.34 / pp.215-221 / 2005
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data that have an inherent sequential nature. In this paper, we study how to classify these sequence datasets. There are several kinds of techniques for data classification, such as decision tree induction, Bayesian classification, and K-NN. In our approach, we use a K-NN algorithm to classify sequences. In addition, we propose a new similarity measure for comparing two sequences and an efficient method for computing it.
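
A K-NN classifier over sequences needs only a similarity function between two sequences. The abstract does not define the proposed measure, so the sketch below swaps in a well-known substitute, the longest common subsequence (LCS) length normalised by the longer sequence. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """LCS length scaled to [0, 1] by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def classify_sequence(train, query, k):
    """train: list of (sequence, label). Vote among the k most similar sequences."""
    ranked = sorted(train, key=lambda item: similarity(item[0], query), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Toy data (assumed): two classes of short categorical sequences.
train = [("ABCD", "x"), ("ABCE", "x"), ("WXYZ", "y"), ("WXYY", "y")]
predicted = classify_sequence(train, "ABCF", k=3)
print(predicted)
```

The LCS computation costs O(len(a) * len(b)) per pair, which is why an efficient similarity computation matters when many training sequences must be compared against each query.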

An Efficient Method of Extracting Split Points for Continuous k Nearest Neighbor Search Without Order (무순위 연속 k 최근접 객체 탐색을 위한 효율적인 분할점 추출기법)

  • Kim, Jin-Deog
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.927-930 / 2010
  • Recently, the continuous k-nearest neighbor (CkNN) query, which finds the nearest points of interest to all the points on a given path, has been widely used in LBS (Location Based Service) and ITS (Intelligent Transportation System) applications. These applications need results quickly, and the query must be applicable to spatial network databases. This paper proposes a new method to search for the nearest POIs (Points Of Interest) for query objects moving on spatial networks. The method produces a set of split points and their corresponding k POIs as results; the POIs are not ordered. The analysis shows that the proposed method outperforms the existing methods.

Pattern Classification Methods for Keystroke Identification (키스트로크 인식을 위한 패턴분류 방법)

  • Cho, Tai-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.5 / pp.956-961 / 2006
  • Keystroke time intervals can be a discriminating feature in the verification and identification of computer users. This paper compares several classification methods for keystroke identification, including k-NN (k-Nearest Neighbor), back-propagation neural networks, and Bayesian classification. k-NN classification performed best when only a few data samples were available per user, while Bayesian classification performed best with large numbers of samples per user. Thus, for web-based on-line identification of users, it seems appropriate to use either k-NN or the Bayesian method selectively, according to the number of keystroke samples accumulated for each user.

Estimation of Aboveground Forest Biomass Carbon Stock by Satellite Remote Sensing - A Comparison between k-Nearest Neighbor and Regression Tree Analysis - (위성영상을 활용한 지상부 산림바이오매스 탄소량 추정 - k-Nearest Neighbor 및 Regression Tree Analysis 방법의 비교 분석 -)

  • Jung, Jaehoon; Nguyen, Hieu Cong; Heo, Joon; Kim, Kyoungmin; Im, Jungho
    • Korean Journal of Remote Sensing / v.30 no.5 / pp.651-664 / 2014
  • Demand for accurate forest carbon stock estimation and mapping is increasing in Korea. This study investigates the feasibility of two methods, k-Nearest Neighbor (kNN) and Regression Tree Analysis (RTA), for estimating the carbon stock of two pilot areas, Gongju and Sejong cities. The 3rd and 5th-6th NFI data were collected together with Landsat TM images acquired in 1992 and 2010 and an ASTER image acquired in 2009. In addition, various vegetation indices and the tasseled cap transformation were derived to improve estimation. The two methods were compared by evaluating carbon statistics and visualizing carbon distributions on maps. The comparison revealed clear strengths and weaknesses: the kNN method produced more consistent estimates regardless of the type of satellite image, but its carbon maps were somewhat too smooth to represent the dense carbon areas, particularly in the ASTER 2009 case. The RTA method performed better on mean bias and on the representation of dense carbon areas, but its results depended more on the type of satellite image, showing high variability in the spatial patterns of the carbon maps. Finally, to identify increases in the carbon stock of the study area, difference maps were created by subtracting the 1992 carbon map from the 2009 and 2010 carbon maps. The total carbon stock in Gongju and Sejong cities was found to have increased drastically during that period.

Client-Side Caching for Nearest Neighbor Queries

  • Park, Kwangjin; Hwang, Chong-Sun
    • Journal of Communications and Networks / v.7 no.4 / pp.417-428 / 2005
  • The Voronoi diagram (VD) is the most suitable mechanism to find the nearest neighbor (NN) for mobile clients. In NN query processing, it is important to reduce the query response time, since a late query response may contain out-of-date information. In this paper, we study the issue of location dependent information services (LDISs) using a VD. To begin our study, we first introduce a broadcast-based spatial query processing method designed to support NN query processing. In further sections, we introduce a generic method for location-dependent sequential prefetching and caching. The performance of this scheme is studied in different simulated environments. The core contribution of this research resides in our analytical proof and experimental results.

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe; Oh, Hyesung; Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • Constructing the k nearest neighbor (k-NN) graph is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. Thus, MapReduce, a (Key, Value)-based distributed framework, is increasingly used to run Locality Sensitive Hashing, which is efficient for high-dimensional and sparse data. Following a two-stage strategy, we use locality sensitive hashing to divide users into small subsets and then calculate the similarity between pairs within each subset using a brute-force method on MapReduce. The candidate group generation stage is important because brute-force calculation is performed in the following step; however, existing methods do not prevent large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
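
The two-stage strategy described above can be sketched on a single machine, without MapReduce. The sketch makes several assumptions that the abstract does not confirm: MinHash as the locality sensitive hash, Jaccard similarity between users' item sets, and a simple regrouping rule that cuts any oversized candidate group into fixed-size chunks. All names and parameters are hypothetical, and every user is assumed to have a non-empty set of integer item ids.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # modulus of the (a * x + b) % PRIME hash family


def minhash_signature(items, hash_params):
    """One MinHash value per (a, b) pair over a non-empty set of integer item ids."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hash_params)


def jaccard(s, t):
    """Jaccard similarity of two non-empty sets."""
    return len(s & t) / len(s | t)


def approximate_knn_graph(users, k, num_hashes=2, num_tables=8, max_group=50, seed=0):
    """users: dict user_id -> set of item ids. Returns user_id -> up to k neighbours."""
    rng = random.Random(seed)
    candidates = defaultdict(set)
    # Stage 1: hash users into candidate groups, once per hash table.
    for _ in range(num_tables):
        params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]
        buckets = defaultdict(list)
        for uid, items in users.items():
            buckets[minhash_signature(items, params)].append(uid)
        for group in buckets.values():
            # Assumed regrouping rule: cut oversized groups into chunks of max_group.
            for start in range(0, len(group), max_group):
                chunk = group[start:start + max_group]
                for u in chunk:
                    candidates[u].update(v for v in chunk if v != u)
    # Stage 2: brute-force similarity, but only inside each user's candidate set.
    graph = {}
    for uid in users:
        ranked = sorted(candidates[uid], key=lambda v: jaccard(users[uid], users[v]), reverse=True)
        graph[uid] = ranked[:k]
    return graph


# Toy data (assumed): users "a" and "b" have identical item sets.
users = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {100, 200, 300}, "d": {7, 8, 9, 10}}
graph = approximate_knn_graph(users, k=1)
print(graph["a"], graph["b"])
```

Capping the group size bounds the cost of the brute-force stage at the risk of separating true neighbors that land in different chunks; using several hash tables gives such pairs more than one chance to meet.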

Nearest Neighbor Query Processing using the Direction of Mobile Object (모바일 객체의 방향성을 고려한 최근접 질의 처리)

  • Lee, Eung-Jae; Jung, Young-Jin; Choi, Hyon-Mi; Ryu, Keun-Ho; Lee, Seong-Ho
    • Journal of Korea Spatial Information System Society / v.6 no.1 s.11 / pp.59-71 / 2004
  • A nearest neighbor query retrieves the nearest target objects and is very frequently used in mobile environments. In this paper we propose a novel nearest neighbor query processing technique that retrieves the nearest target object for a user who is continuously moving in some direction. The proposed method retrieves objects using the direction of the moving object as well as the Euclidean distance to the target object. It is applicable to traffic information systems, travel information systems, and location-based recommendation systems, which require retrieving the nearest object.
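
One simple way to combine direction with Euclidean distance is to consider only the objects that lie within some angle of the user's heading and return the closest of those. The abstract does not say how the paper combines the two, so the angular filter below is an assumption, as are the function name, the `max_angle_deg` parameter, and the toy coordinates. The heading is assumed to be a non-zero 2-D vector.

```python
import math


def directional_nearest(position, heading, objects, max_angle_deg=90.0):
    """Nearest object whose bearing lies within max_angle_deg of the heading vector.

    objects: dict name -> (x, y). Returns None if no object is inside the cone.
    """
    best, best_dist = None, math.inf
    heading_norm = math.hypot(*heading)
    for name, (x, y) in objects.items():
        dx, dy = x - position[0], y - position[1]
        dist = math.hypot(dx, dy)
        if dist == 0:
            return name  # the user is standing on the object
        cos_angle = (dx * heading[0] + dy * heading[1]) / (dist * heading_norm)
        # Clamp to [-1, 1] to guard acos against floating-point overshoot.
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle <= max_angle_deg and dist < best_dist:
            best, best_dist = name, dist
    return best


# Toy data (assumed): the user at the origin is moving along the positive x axis.
objects = {"behind": (-1.0, 0.0), "ahead": (3.0, 0.0)}
chosen = directional_nearest((0.0, 0.0), (1.0, 0.0), objects)
print(chosen)
```

A plain Euclidean query would return the object behind the user because it is closer; the angular filter prefers the object the user is moving toward.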
