• Title/Summary/Keyword: K-Nearest Neighbor algorithm


Grid-based Index Generation and k-nearest-neighbor Join Query-processing Algorithm using MapReduce (맵리듀스를 이용한 그리드 기반 인덱스 생성 및 k-NN 조인 질의 처리 알고리즘)

  • Jang, Miyoung; Chang, Jae Woo
    • Journal of KIISE / v.42 no.11 / pp.1303-1313 / 2015
  • MapReduce provides high system scalability and fault tolerance for large-scale data processing. A MapReduce-based k-nearest-neighbor (k-NN) join algorithm finds, for each point in one dataset, its k nearest neighbors in another dataset, and it is considered important in big data analysis. However, the existing k-NN join query-processing algorithm incurs a high index-construction cost, which makes it unsuitable for big data processing. To address this problem, we propose a new grid-based k-NN join query-processing algorithm. Our algorithm retrieves only the data neighboring each query cell and sends them to the corresponding MapReduce task, reducing data-transmission and computation overhead. Our performance analysis shows that our algorithm is up to seven times faster than the existing scheme in query-processing time while maintaining high query-result accuracy.
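
The abstract's key step is that each task only needs the data in cells neighboring a query cell. A minimal single-machine sketch of that idea, assuming 2-D points and a uniform square grid; the names `build_grid` and `knn_join` and the `cell` size parameter are hypothetical, there is no MapReduce partitioning, and this is not the authors' index. Each query point scans its own cell and then successively wider rings of cells, stopping once the k-th candidate is provably closer than anything in an unvisited cell.

```python
import math
from collections import defaultdict

def build_grid(points, cell):
    """Bucket each 2-D point of S into a square grid cell of side `cell`."""
    grid = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    return grid

def knn_join(R, S, k, cell):
    """For every r in R, return its k nearest neighbors in S (exact result)."""
    grid = build_grid(S, cell)
    result = {}
    for r in R:
        cx, cy = math.floor(r[0] / cell), math.floor(r[1] / cell)
        candidates, ring = [], 0
        while True:
            # Visit only the cells on the square ring at Chebyshev distance `ring`.
            for dx in range(-ring, ring + 1):
                for dy in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy)) == ring:
                        for s in grid.get((cx + dx, cy + dy), []):
                            candidates.append((math.dist(r, s), s))
            candidates.sort()
            # Any point in an unvisited cell is farther than ring * cell from r.
            if len(candidates) >= k and candidates[k - 1][0] <= ring * cell:
                break
            if len(candidates) == len(S):  # every point of S has been examined
                break
            ring += 1
        result[r] = [s for _, s in candidates[:k]]
    return result
```

The stopping rule is what keeps the set of neighboring cells small; in a distributed version, that neighborhood is roughly the data a task would need to receive.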

Fast Nearest Neighbor Search on General Size Images (일반적인 그림 데이터에서의 빠른 최인접 검색)

  • Hwang, Yoon-Ho; Ahn, Hee-Kap
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.417-418 / 2012
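  • We present an algorithm that accelerates nearest neighbor search on image data using a nonlinear transformation based on the mean and variance of image data in Euclidean space. Existing mean-and-variance-based nearest neighbor search algorithms transform high-dimensional image data into data in a lower-dimensional Euclidean space and, by comparing in the lower dimension, quickly exclude images that cannot be the nearest neighbor. However, these methods work only for images whose size can be divided uniformly. In this paper, we present a solution to this limitation, making mean-and-variance-based nearest neighbor search possible on general image data.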
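
The pruning idea (excluding candidates by comparing cheap low-dimensional summaries) can be illustrated with the standard mean/standard-deviation lower bound: for two length-$n$ vectors with population standard deviations, $\|x-y\|^2 \ge n[(\mu_x-\mu_y)^2 + (\sigma_x-\sigma_y)^2]$. A minimal sketch that treats each image as a flat pixel vector; it does not include the authors' nonlinear transform or their handling of non-uniform image sizes, and `nn_search` is a hypothetical name.

```python
import math

def mean_std(v):
    """Mean and population standard deviation of a flat pixel vector."""
    n = len(v)
    mu = sum(v) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in v) / n)

def nn_search(query, database):
    """Exact nearest neighbor (squared Euclidean) with mean/std pruning.

    Returns (best_index, best_squared_distance, full_distance_computations).
    """
    n = len(query)
    qm, qs = mean_std(query)
    stats = [mean_std(v) for v in database]  # would normally be precomputed offline
    best_i, best_d2, full = -1, math.inf, 0
    for i, (v, (m, s)) in enumerate(zip(database, stats)):
        # Lower bound: ||q - v||^2 >= n * ((qm - m)^2 + (qs - s)^2)
        if n * ((qm - m) ** 2 + (qs - s) ** 2) >= best_d2:
            continue  # v cannot beat the current best; skip the full distance
        full += 1
        d2 = sum((a - b) ** 2 for a, b in zip(query, v))
        if d2 < best_d2:
            best_i, best_d2 = i, d2
    return best_i, best_d2, full
```

The result is identical to brute force; only the count of full distance computations (`full`) shrinks when the summaries separate well.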

Acoustic Emission Source Classification of Finite-width Plate with a Circular Hole Defect using k-Nearest Neighbor Algorithm (k-최근접 이웃 알고리즘을 이용한 원공결함을 갖는 유한 폭 판재의 음향방출 음원분류에 대한 연구)

  • Rhee, Zhang-Kyu; Oh, Jin-Soo
    • Journal of the Korea Safety Management & Science / v.11 no.1 / pp.27-33 / 2009
  • Studies of material fracture are attracting interest in the nuclear and aerospace industries from a safety standpoint. Acoustic emission (AE) is a new non-destructive testing technology for evaluating the safety of structures. Continuing previous research, all tensile tests on the pre-defected coupons were performed on a universal testing machine whose crosshead moved at a constant speed of 5 mm/min. This study evaluates the AE source characterization of SM45C steel using a k-nearest neighbor classifier (k-NNC). To this end, we applied K-means clustering as an unsupervised learning method to the obtained multivariate AE main data sets, and k-NNC as a supervised pattern-recognition algorithm to the obtained multivariate AE working data sets. Finally, the Wilks' $\lambda$, Davies-Bouldin (D&B, $R_{ij}$), and Tou criteria are discussed.
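
The two-stage pipeline in the abstract (unsupervised K-means to group AE events, then k-NN to classify further events) can be sketched as follows. The 2-D feature tuples stand in for AE parameters and are hypothetical, as are the function names; feeding the K-means labels to k-NN as training labels is an assumed way of chaining the stages, not the authors' stated configuration.

```python
import math
import random
from collections import Counter

def kmeans(data, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns a cluster label per point and the centers."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(data, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in data]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def knn_classify(train, labels, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(query, train[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```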

Distributed Grid Scheme using S-GRID for Location Information Management of a Large Number of Moving Objects (대용량 이동객체의 위치정보 관리를 위한 S-GRID를 이용한 분산 그리드 기법)

  • Kim, Young-Chang; Kim, Young-Jin; Chang, Jae-Woo
    • Journal of Korea Spatial Information System Society / v.10 no.4 / pp.11-19 / 2008
  • Recently, advances in mobile devices and wireless communication technologies have called for research on various location-based services. As a result, many studies have addressed k-nearest neighbor (k-NN) query processing, one of the most important query types in location-based services. Most existing studies use pre-computation to improve retrieval performance, computing network distances between POIs and nodes in spatial networks in advance. However, they cannot efficiently handle updates to the POIs being searched. In this paper, we propose a distributed grid scheme using S-GRID that overcomes this drawback and efficiently manages the location information of a large number of moving objects. We also describe a k-NN query processing algorithm for the proposed scheme. Finally, we demonstrate the efficiency of our scheme by comparing the performance of its k-NN query processing algorithm with that of S-GRID.
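
The drawback attributed to pre-computation is that POI updates invalidate stored distances. For contrast, a minimal sketch of a k-NN query that needs no pre-computation: incremental network expansion, i.e. Dijkstra's algorithm stopped after k POIs, assuming POIs sit on graph nodes. This is neither S-GRID nor the proposed distributed grid; `network_knn`, `undirected`, and the graph format are hypothetical.

```python
import heapq

def network_knn(graph, pois, source, k):
    """Incremental network expansion: Dijkstra from `source`, stopped after k POIs.

    graph: {node: [(neighbor, edge_length), ...]}
    pois:  {node: poi_id} (assumption: at most one POI per node)
    Returns [(poi_id, network_distance), ...] in ascending distance.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled, found = set(), []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u in pois:
            found.append((pois[u], d))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found

def undirected(edges):
    """Build an adjacency dict from (a, b, length) road segments."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    return graph
```

Moving or adding a POI only changes the `pois` dictionary; the trade-off is that every query pays for its own expansion.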


Location Estimation Method Employing Fingerprinting Scheme based on K-Nearest Neighbor Algorithm under WLAN Environment of Ship (선박의 WLAN 환경에서 K-최근접 이웃 알고리즘 기반 Fingerprinting 방식을 적용한 위치 추정 방법)

  • Kim, Beom-Mu; Jeong, Min A; Lee, Seong Ro
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.10 / pp.2530-2536 / 2014
  • Many studies have examined location estimation in indoor environments that GPS signals do not reach, and a variety of estimation methods have been proposed as a result. In this paper, we examine in depth the problem of location estimation in a ship with a multi-story structure and investigate a location estimation method using a fingerprinting scheme based on the K-Nearest Neighbor algorithm. To apply the fingerprinting scheme, a reliable DB is constructed by measuring 100 received signals at each of 39 reference points (RPs), and, based on this DB, a simulation is performed to estimate the location of a randomly positioned terminal. The simulation results confirm that the fingerprinting scheme achieves satisfactory location estimation performance.
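
K-NN fingerprinting has an offline phase (averaging the signal samples recorded at each reference point into a DB) and an online phase (matching a new measurement against that DB). A minimal sketch, assuming RSSI vectors in dBm with one entry per access point and 2-D RP coordinates; the function names and numbers are hypothetical, and the estimate is the unweighted mean position of the k nearest RPs in signal space.

```python
import math

def build_fingerprint_db(samples):
    """Offline phase: average the RSSI vectors recorded at each reference point.

    samples: {(x, y): [rssi_vector, ...]} with one RSSI value (dBm) per access point.
    """
    return {
        pos: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for pos, vecs in samples.items()
    }

def estimate_location(db, observed, k=3):
    """Online phase: unweighted mean position of the k RPs closest in signal space."""
    nearest = sorted(db, key=lambda pos: math.dist(observed, db[pos]))[:k]
    return tuple(sum(coord) / len(nearest) for coord in zip(*nearest))
```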

Estimation of Aboveground Forest Biomass Carbon Stock by Satellite Remote Sensing - A Comparison between k-Nearest Neighbor and Regression Tree Analysis - (위성영상을 활용한 지상부 산림바이오매스 탄소량 추정 - k-Nearest Neighbor 및 Regression Tree Analysis 방법의 비교 분석 -)

  • Jung, Jaehoon; Nguyen, Hieu Cong; Heo, Joon; Kim, Kyoungmin; Im, Jungho
    • Korean Journal of Remote Sensing / v.30 no.5 / pp.651-664 / 2014
  • Recently, demand for accurate forest carbon stock estimation and mapping has been increasing in Korea. This study investigates the feasibility of two methods, k-Nearest Neighbor (kNN) and Regression Tree Analysis (RTA), for estimating carbon stock in two pilot areas, Gongju and Sejong cities. Data from the 3rd, 5th, and 6th National Forest Inventory (NFI) were collected together with Landsat TM images acquired in 1992 and 2010 and an ASTER image acquired in 2009. In addition, various vegetation indices and tasseled cap transformations were derived to improve estimation. The two methods were compared by evaluating carbon statistics and by visualizing carbon distributions on maps. The comparisons revealed clear strengths and weaknesses of each method. The kNN method produced more consistent estimates regardless of satellite image type, but its carbon maps were too smooth to represent dense carbon areas, particularly in the ASTER 2009 case. The RTA method, in contrast, performed better in mean bias and in representing dense carbon areas, but its results depended more on the image type and showed high variability in the spatial patterns of the carbon maps. Finally, to identify increases in carbon stock in the study area, we created difference maps by subtracting the 1992 carbon map from the 2009 and 2010 carbon maps. The total carbon stock in Gongju and Sejong cities was found to have increased drastically during that period.
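
kNN estimation of carbon stock predicts each pixel from the field plots whose spectral features (band values, vegetation indices, tasseled cap components) are closest. A minimal sketch using inverse-distance weighting, which is an assumed choice rather than the study's stated configuration; `knn_regress` and the toy values are hypothetical. Because every prediction is a weighted average of several plots, extreme values are pulled toward their neighbors, which is one plausible reason kNN maps look smooth in dense carbon areas.

```python
import math

def knn_regress(plots, pixel, k=5):
    """Inverse-distance-weighted kNN estimate for one pixel.

    plots: [(feature_vector, carbon_value), ...] from field plots (hypothetical).
    pixel: the pixel's feature vector (e.g. band values, vegetation indices).
    """
    nearest = sorted(plots, key=lambda p: math.dist(pixel, p[0]))[:k]
    weights = []
    for features, value in nearest:
        d = math.dist(pixel, features)
        if d == 0:
            return value  # exact spectral match: use that plot's value
        weights.append(1.0 / d)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```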

Malware Detection Method using Opcode and windows API Calls (Opcode와 Windows API를 사용한 멀웨어 탐지)

  • Ahn, Tae-Hyun; Oh, Sang-Jin; Kwon, Young-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.6 / pp.11-17 / 2017
  • We propose a malware detection method that uses a feature vector consisting of opcodes (operation codes) and Windows API calls extracted from executable files. We implemented the feature vector and evaluated its performance with Bernoulli Naïve Bayes and k-nearest neighbor classifiers. In our experiments, the proposed method with the k-NN classifier achieved 95.21% malware detection accuracy, outperforming existing methods that use only opcodes or only Windows API calls.
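
A minimal sketch of binary presence features built from opcode and API-call tokens (the kind of input Bernoulli Naïve Bayes expects, also usable by k-NN), classified with k-NN under Hamming distance. The vocabulary, samples, labels, and function names are hypothetical; the abstract does not state the paper's feature selection or distance metric.

```python
from collections import Counter

def to_binary(tokens, vocab):
    """1 if the opcode / API name occurs in the executable, else 0."""
    present = set(tokens)
    return tuple(1 if t in present else 0 for t in vocab)

def hamming(a, b):
    """Number of positions where two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """train: [(binary_vector, label), ...]; majority vote of the k nearest."""
    nearest = sorted(train, key=lambda s: hamming(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```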

Prototype based Classification by Generating Multidimensional Spheres per Class Area (클래스 영역의 다차원 구 생성에 의한 프로토타입 기반 분류)

  • Shim, Seyong; Hwang, Doosung
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.21-28 / 2015
  • In this paper, we propose a prototype-based classification learning method using the nearest-neighbor rule. The nearest-neighbor rule is used to partition the class regions of the training data into spheres, each containing data from a single class. Prototypes are the centers of the spheres, and each radius is computed as the midpoint of two distances: the distance to the farthest same-class point and the distance to the nearest point of another class. The prototype selection problem is then cast as a set-covering problem to find the smallest set of prototypes that covers all the training data. The proposed selection method uses a greedy algorithm applied to the training data of each class separately. The method has low complexity and is well suited to parallel implementation. The resulting classifier uses the selected prototypes to predict the class of test data by the nearest-neighbor rule. In experiments, our prototype classifier achieved better generalization performance than the nearest-neighbor classifier, a Bayes classifier, and another prototype classifier.
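
A minimal sketch of the sphere construction, per-class greedy set cover, and nearest-prototype classification described above. Two interpretations are assumptions: "farthest same-class point" is read as the farthest same-class point still closer than the nearest other-class point (so no sphere contains another class), and classification uses plain Euclidean distance to sphere centers. Function names and data are hypothetical; the sketch needs at least two classes and no identical points carrying different labels.

```python
import math

def sphere_radius(i, X, y):
    """Midpoint of (farthest same-class point inside the nearest-enemy distance)
    and (nearest other-class point)."""
    d = [math.dist(X[i], X[j]) for j in range(len(X))]
    enemy = min(d[j] for j in range(len(X)) if y[j] != y[i])
    friend = max(d[j] for j in range(len(X)) if y[j] == y[i] and d[j] < enemy)
    return (friend + enemy) / 2

def select_prototypes(X, y):
    """Greedy set cover, run separately for each class."""
    prototypes = []
    for c in sorted(set(y)):
        idx = [i for i in range(len(X)) if y[i] == c]
        covers = {}
        for i in idx:
            r = sphere_radius(i, X, y)
            covers[i] = (r, {j for j in idx if math.dist(X[i], X[j]) <= r})
        uncovered = set(idx)
        while uncovered:
            # Pick the sphere covering the most still-uncovered points.
            best = max(idx, key=lambda i: len(covers[i][1] & uncovered))
            prototypes.append((X[best], covers[best][0], c))
            uncovered -= covers[best][1]
    return prototypes

def classify(prototypes, x):
    """Nearest-neighbor rule over prototype centers."""
    return min(prototypes, key=lambda p: math.dist(x, p[0]))[2]
```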

A Batch Processing Algorithm for Moving k-Nearest Neighbor Queries in Dynamic Spatial Networks

  • Cho, Hyung-Ju
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.63-74 / 2021
  • Location-based services (LBSs) are expected to process a large number of spatial queries, such as shortest-path and k-nearest neighbor queries, that arrive simultaneously during peak periods. Deploying more LBS servers to handle these simultaneous queries is one possible solution, but it significantly increases operating costs. Recently, batch processing solutions have been proposed that process a set of queries using shareable computation. In this study, we investigate batch processing of moving k-nearest neighbor (MkNN) queries in dynamic spatial networks, where the travel time of each road segment changes frequently with traffic conditions. LBS servers based on one-query-at-a-time processing often fail to handle simultaneous MkNN queries because of the large number of redundant computations. We aim to improve efficiency algorithmically by processing MkNN queries in batches and reusing shareable computations. Extensive evaluation on real-world road maps shows that our solution outperforms state-of-the-art methods.
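
One simple form of shareable computation is to let all queries issued from the same network node share a single expansion that runs until the largest requested k. A minimal sketch, assuming a static snapshot of travel times, POIs located on nodes, and hypothetical names (`batch_knn`, `expand`); it does not handle moving queries or changing edge weights as the paper does.

```python
import heapq
from collections import defaultdict

def expand(graph, pois, source, k):
    """Dijkstra from `source`, stopped once k POIs are settled."""
    dist, heap, settled, found = {source: 0.0}, [(0.0, source)], set(), []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u in pois:
            found.append((pois[u], d))
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return found

def batch_knn(graph, pois, queries):
    """queries: [(query_id, node, k), ...]. One shared expansion per distinct node."""
    by_node = defaultdict(list)
    for qid, node, k in queries:
        by_node[node].append((qid, k))
    answers, expansions = {}, 0
    for node, group in by_node.items():
        ranked = expand(graph, pois, node, max(k for _, k in group))
        expansions += 1
        for qid, k in group:
            answers[qid] = ranked[:k]  # smaller k reuses a prefix of the shared result
    return answers, expansions
```

The returned `expansions` count makes the saving visible: it equals the number of distinct query nodes rather than the number of queries.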

Data Classification Using the Robbins-Monro Stochastic Approximation Algorithm (로빈스-몬로 확률 근사 알고리즘을 이용한 데이터 분류)

  • Lee, Jae-Kook; Ko, Chun-Taek; Choi, Won-Ho
    • Proceedings of the KIPE Conference / 2005.07a / pp.624-627 / 2005
  • This paper presents a new data classification method that combines the Robbins-Monro stochastic approximation algorithm, the k-nearest neighbor algorithm, and distribution analysis. To cluster the data set, we determine the centroid of the test data set using the k-nearest neighbor algorithm and the local area of the data set. To determine the class of each data point, the Robbins-Monro stochastic approximation algorithm is applied to the resulting local area of the data set. To evaluate performance, the proposed method is compared with the conventional fuzzy c-means method and the k-NN algorithm. The simulation results show that the proposed method is more accurate than fuzzy c-means, the k-NN algorithm, and discriminant analysis.
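
The two building blocks named in the abstract can be sketched separately: a k-NN "local area" around a query and a Robbins-Monro iteration $\theta_n = \theta_{n-1} + a_n(x_n - \theta_{n-1})$ that estimates that area's centroid. With step size $a_n = 1/n$, which satisfies $\sum a_n = \infty$ and $\sum a_n^2 < \infty$, the iteration reproduces the running sample mean. Function names are hypothetical, and how the authors turn these pieces into class decisions is not specified here.

```python
import math

def local_area(data, query, k):
    """The k points of `data` nearest to `query` (Euclidean)."""
    return sorted(data, key=lambda p: math.dist(query, p))[:k]

def robbins_monro_centroid(points):
    """Robbins-Monro iteration theta_n = theta_{n-1} + a_n (x_n - theta_{n-1}),
    which seeks the root of E[x - theta] = 0, i.e. the mean of the samples."""
    theta = [0.0] * len(points[0])
    for n, x in enumerate(points, start=1):
        a_n = 1.0 / n  # sum(a_n) diverges, sum(a_n ** 2) converges
        theta = [t + a_n * (xi - t) for t, xi in zip(theta, x)]
    return theta
```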
