• Title/Summary/Keyword: k-NN search algorithm


A Density-Based K-Nearest Neighbors Search Method

  • Jang, I.S.;Min, K.W.;Choi, W.S.
    • Proceedings of the KSRS Conference / 2004.10a / pp.260-262 / 2004
  • Spatial database systems provide many query types, most of which require frequent disk I/O and considerable CPU time. A k-NN search finds the k objects closest to a query point, and several k-NN search methods have been proposed to date. Among these, the MINMAX distance method aims to avoid visiting unnecessary nodes by applying a pruning technique. However, this method performs more disk accesses than necessary while pruning unnecessary nodes. In this paper, we propose a new k-NN search algorithm based on object density. The method uses the density of the data set to predict a radius expected to contain the k nearest objects, searches for objects within this radius, and adjusts the radius if the search fails. Experimental results show that this method outperforms the MINMAX distance method: it reduces disk accesses by up to 22%, and by 6% on average.
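
The radius-prediction idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the points are spread uniformly over a known 2-D area, so a circle of radius r is expected to hold πr²·density points, and it uses a brute-force scan (`range_query`, a hypothetical stand-in for an index range search) plus simple radius doubling (`grow`, an assumed adjustment rule) when the prediction falls short.

```python
import math

def estimate_radius(k, n_points, area):
    # Assumes the n_points are spread uniformly over `area`, so a circle of
    # radius r is expected to contain pi * r**2 * density points.
    density = n_points / area
    return math.sqrt(k / (math.pi * density))

def range_query(points, q, r):
    # Hypothetical stand-in for an index range search: all points within r of q.
    return [p for p in points if math.dist(p, q) <= r]

def density_knn(points, q, k, area, grow=2.0):
    """Return (k nearest points, final radius, number of range searches)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    r = estimate_radius(k, len(points), area)
    rounds = 1
    hits = range_query(points, q, r)
    while len(hits) < k:
        r *= grow  # prediction failed: enlarge the radius and search again
        rounds += 1
        hits = range_query(points, q, r)
    # At least k points lie within r, so the true k nearest are among them.
    hits.sort(key=lambda p: math.dist(p, q))
    return hits[:k], r, rounds

# Usage: 100 points on a unit grid over a 10 x 10 area.
pts = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
nearest, radius, rounds = density_knn(pts, (5.0, 5.0), 4, area=100.0)
```

Once a range search returns at least k points, the k closest among them are the true k nearest neighbors, because the k-th nearest distance cannot exceed r. A disk-based version would count node or page accesses per round instead of scanning a list.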

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.6145-6158 / 2019
  • Responding both to the large volume of indiscriminately distributed malicious code and to intelligent APT attacks is a challenge for today's security industry. As a result, machine learning algorithms are being studied as a means of proactive prevention rather than after-the-fact processing. The k-NN algorithm is widely used because it is intuitive and well suited to handling malicious code as unstructured data. In malware analysis in particular, k-NN makes it easy to classify new samples based on previously analyzed ones; for example, malware families can be classified, or malware variants analyzed, through similarity with existing samples. However, the main disadvantage of k-NN is that search time grows with the size of the training data. We propose a fast k-NN algorithm that addresses this computation-speed problem while retaining the benefits of k-NN. In the test environment, the proposed algorithm required on average only 19.71 similarity comparisons for a set of 6.25 million malicious code samples. Given how it works, the Fast k-NN algorithm can be applied not only to malware and SSDEEP hashes but to any data that can be vectorized. In the future, wherever a k-NN approach is needed and central nodes can be selected effectively for clustering large volumes of data in various environments, we expect it will be possible to design sophisticated machine-learning-based systems.
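
The abstract does not spell out the algorithm, but its closing remark about selecting central nodes for clustering suggests a cluster-pruned search. The sketch below is one hypothetical way to realize that idea, not the authors' method: it picks central nodes at random (an assumption), assigns every sample to its nearest center, and answers a query by comparing against the centers and then only the members of the `probe` closest clusters. Euclidean distance on feature vectors stands in for SSDEEP similarity, and the returned comparison count mirrors the kind of metric the abstract reports.

```python
import math
import random

class ClusterIndex:
    """Hypothetical cluster-pruned k-NN index (not the paper's exact method)."""

    def __init__(self, data, n_centers, seed=0):
        rng = random.Random(seed)
        # Naive choice of central nodes: a random sample of the data (assumption).
        self.centers = rng.sample(data, n_centers)
        self.buckets = [[] for _ in self.centers]
        for x in data:
            nearest = min(range(n_centers), key=lambda c: math.dist(x, self.centers[c]))
            self.buckets[nearest].append(x)

    def query(self, q, k, probe=1):
        """Return (up to k approximate nearest neighbors, number of distance comparisons)."""
        order = sorted(range(len(self.centers)), key=lambda c: math.dist(q, self.centers[c]))
        comparisons = len(self.centers)  # q is compared with every center first
        candidates = [x for c in order[:probe] for x in self.buckets[c]]
        comparisons += len(candidates)   # then only with members of the probed clusters
        candidates.sort(key=lambda x: math.dist(q, x))
        return candidates[:k], comparisons
```

With `probe=1` the result is approximate; raising `probe` trades extra comparisons for recall, and setting it to the number of clusters degenerates into an exhaustive scan.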

k-NN Join Based on LSH in Big Data Environment

  • Ji, Jiaqi;Chung, Yeongjee
    • Journal of Information and Communication Convergence Engineering / v.16 no.2 / pp.99-105 / 2018
  • k-Nearest neighbor join (k-NN Join) is a computationally intensive algorithm that finds the k nearest neighbors in a dataset S for every object in another dataset R. Most related studies on k-NN Join assume a single computer. As data dimensionality and data volume increase, a single computer can no longer produce k-NN Join results quickly. To solve this scalability problem, we introduce a locality-sensitive hashing (LSH) k-NN Join algorithm, implemented in Spark, for high-dimensional big data. LSH maps similar data to the same bucket, which narrows the search scope. To run the algorithm in parallel on multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that the proposed approach is fast and accurate for high-dimensional big data.
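
The bucketing step can be illustrated on a single machine. The sketch below is a minimal, hypothetical version: it uses p-stable (Gaussian projection) LSH for Euclidean distance with assumed parameters (`n_tables`, `n_proj`, bucket width `w`), and plain Python dictionaries where the paper uses Spark; in a distributed version the bucket key would typically serve as the key for grouping R and S objects across the cluster. Because only objects that share a bucket are compared, the join is approximate.

```python
import math
import random
from collections import defaultdict

def make_tables(dim, n_tables, n_proj, w, rng):
    # Each table holds n_proj random projections (a, b): a ~ N(0, 1)^dim, b ~ U[0, w).
    return [[([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(n_proj)]
            for _ in range(n_tables)]

def bucket_key(projs, v, w):
    # p-stable LSH for Euclidean distance: floor((a . v + b) / w) per projection.
    return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w) for a, b in projs)

def lsh_knn_join(R, S, k, n_tables=4, n_proj=2, w=4.0, seed=0):
    """Approximate k-NN join: for each r in R, the k closest S-objects sharing a bucket."""
    tables = make_tables(len(S[0]), n_tables, n_proj, w, random.Random(seed))
    # Build phase: place every object of S into one bucket per table.
    buckets = [defaultdict(list) for _ in tables]
    for j, s in enumerate(S):
        for t, projs in enumerate(tables):
            buckets[t][bucket_key(projs, s, w)].append(j)
    # Probe phase: compare each r only with the S-objects in its own buckets.
    result = {}
    for i, r in enumerate(R):
        candidates = set()
        for t, projs in enumerate(tables):
            candidates.update(buckets[t].get(bucket_key(projs, r, w), []))
        ranked = sorted(candidates, key=lambda j: math.dist(r, S[j]))
        result[i] = ranked[:k]  # indices into S, nearest first
    return result
```

More tables raise the chance that true neighbors share at least one bucket; more projections per table make buckets smaller and the probe phase cheaper.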

An Improved Genetic Algorithm for Fast Face Detection Using Neural Network as Classifier

  • Sugisaka, Masanori;Fan, Xinjian
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference / 2005.06a / pp.1034-1038 / 2005
  • This paper presents a novel method to speed up neural network (NN) based face detection systems. NN-based face detection can be viewed as both a classification problem and a search problem. The proposed method formulates the search problem as an integer nonlinear optimization problem (INLP) and develops an improved genetic algorithm (IGA) to solve it. Each individual in the IGA represents a subwindow of the input image, and subwindows are evaluated by how well they match an NN-based face filter. A face is detected when the filter response of the best individual exceeds a given threshold. Experimental results show that the proposed method achieves an 83-fold speedup on 320×240 images compared with the traditional exhaustive search method.
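
A generic genetic search over subwindow positions can be sketched as below. This is not the paper's IGA: the NN face filter is replaced by a hypothetical smooth response function (`face_filter`, peaking at an arbitrary location), the search space is reduced to integer top-left corners of a fixed 20×20 window (scale is ignored), and the operators (truncation selection with elitism, coordinate crossover, Gaussian integer mutation) and all parameter values are assumptions. It illustrates where the speedup comes from: a few thousand subwindows are scored instead of every position.

```python
import math
import random

W, H, WIN = 320, 240, 20  # image size from the abstract; the 20x20 window is an assumption

def face_filter(x, y):
    # Hypothetical stand-in for the NN face filter: response peaks at (200, 90).
    return math.exp(-((x - 200) ** 2 + (y - 90) ** 2) / (2 * 30.0 ** 2))

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def ga_search(fitness, pop_size=30, generations=40, p_mut=0.3, seed=0):
    """Return (best subwindow corner, its response, subwindows scored, best-per-generation)."""
    rng = random.Random(seed)
    xmax, ymax = W - WIN, H - WIN  # top-left corners that keep the window inside the image
    pop = [(rng.randint(0, xmax), rng.randint(0, ymax)) for _ in range(pop_size)]
    scored_count, history = 0, []
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ind: fitness(*ind), reverse=True)
        scored_count += len(pop)
        history.append(fitness(*ranked[0]))
        elite = ranked[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            (x1, y1), (x2, y2) = rng.sample(elite, 2)
            child = (x1, y2) if rng.random() < 0.5 else (x2, y1)  # swap coordinates
            if rng.random() < p_mut:  # Gaussian integer mutation, clamped to the image
                child = (clamp(child[0] + round(rng.gauss(0, 10)), 0, xmax),
                         clamp(child[1] + round(rng.gauss(0, 10)), 0, ymax))
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda ind: fitness(*ind))
    return best, fitness(*best), scored_count, history
```

With the defaults, 30 × 40 = 1,200 subwindows are scored, versus 301 × 221 = 66,521 positions for an exhaustive scan of this window size. A face would be reported when the returned response exceeds a chosen threshold; the threshold value is left to the application.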

k-NN Query Processing Algorithm based on the Matrix of Shortest Distances between Border-point of Voronoi Diagram (보로노이 다이어그램의 경계지점 최소거리 행렬 기반 k-최근접점 탐색 알고리즘)

  • Um, Jung-Ho;Chang, Jae-Woo
    • Journal of Korea Spatial Information System Society / v.11 no.1 / pp.105-114 / 2009
  • Location-based services that return the k nearest points of interest (POIs), such as gas stations, restaurants, and banks, have become essential in applications such as telematics, intelligent transport systems (ITS), and kiosks. For this purpose, the Voronoi diagram k-NN (k-nearest-neighbor) search algorithm has been proposed. It retrieves the k nearest neighbors using a file that stores precomputed network distances between POIs in the Voronoi diagram. However, this algorithm incurs a high cost when expanding the Voronoi diagram. Therefore, in this paper, we propose an algorithm that generates a matrix of the shortest distances between the border points of a Voronoi diagram, measured from each border point to every other border point. To retrieve the desired k nearest POIs, we also propose a k-NN search algorithm that uses this matrix. By accessing the precomputed matrix of shortest distances between border points, the proposed algorithms can minimize the cost of expanding the Voronoi diagram. In addition, we show that the proposed algorithm outperforms existing work in terms of retrieval time.
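
Precomputing the border-point matrix amounts to running a single-source shortest-path search from every border point. The sketch below uses Dijkstra's algorithm on a small hypothetical road network (nodes A to E, with B and D taken as border points). The `via_borders` helper, which combines distances to border points with matrix entries, is an illustrative assumption about how such a matrix can be used at query time, not the paper's exact k-NN procedure.

```python
import heapq
import math

def dijkstra(graph, src):
    """Shortest network distances from src; graph maps node -> [(neighbor, weight)]."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def border_matrix(graph, borders):
    """M[i][j] = shortest network distance from border point i to border point j."""
    M = []
    for b in borders:
        d = dijkstra(graph, b)
        M.append([d.get(c, math.inf) for c in borders])
    return M

def via_borders(q_to_border, M, border_to_p):
    # Hypothetical query-time use: best route q -> exit border i -> entry border j -> p,
    # where the two dicts map border indices to precomputed distances.
    return min(dq + M[i][j] + dp
               for i, dq in q_to_border.items()
               for j, dp in border_to_p.items())

# Toy road network (undirected); B and D are treated as border points.
edges = [("A", "B", 1), ("B", "C", 2), ("C", "D", 3), ("D", "E", 4), ("A", "E", 20)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))
M = border_matrix(graph, ["B", "D"])
```

Here `via_borders({0: 1.0}, M, {1: 4.0})` routes A to E through B and D for a total of 10, which matches the direct shortest path and beats the 20-unit edge A-E. The matrix is computed once, so a query only adds table lookups.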

SOMk-NN Search Algorithm for Content-Based Retrieval (내용기반 검색을 위한 SOMk-NN탐색 알고리즘)

  • O, Gun-Seok;Kim, Pan-Gu
    • Journal of KIISE:Databases / v.29 no.5 / pp.358-366 / 2002
  • Feature-based similarity retrieval has become an important research issue in image database systems. Image features are useful for discriminating between images. In this paper, we propose a high-speed k-nearest-neighbor search algorithm based on self-organizing maps. A self-organizing map (SOM) provides a mapping from high-dimensional feature vectors onto a two-dimensional space and generates a topological feature map. A topological feature map preserves the mutual relations (similarities) among input feature vectors and clusters mutually similar feature vectors in neighboring nodes. Therefore, each node of the topological feature map holds a node vector together with the images closest to that node vector. We implemented k-NN search for similar-image classification by (1) accessing the topological feature map and (2) applying a pruning strategy for high-speed search. We evaluated the algorithm's performance using color feature vectors extracted from images and obtained promising results.
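
The two-stage search (consult the map, then prune) can be sketched as follows. The training schedule, grid size, and especially the pruning rule are assumptions rather than the paper's exact strategy: each node records the largest distance from its node vector to any image filed under it, and the triangle inequality then gives a lower bound on how close any of those images can be to the query, so nodes whose bound exceeds the current k-th best distance are skipped. With this bound the search still returns the exact k nearest neighbors while often skipping many nodes.

```python
import math
import random

def train_som(data, rows, cols, iters=500, seed=0):
    """Minimal online SOM; returns {(row, col): node_vector}. Schedule is an assumption."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    for t in range(iters):
        frac = t / iters
        lr = 0.5 * (1 - frac)                            # decaying learning rate
        sigma = max(rows, cols) / 2 * (1 - frac) + 0.5   # shrinking neighborhood
        x = rng.choice(data)
        bmu = min(nodes, key=lambda n: math.dist(nodes[n], x))
        for n in list(nodes):
            g = math.exp(-((n[0] - bmu[0]) ** 2 + (n[1] - bmu[1]) ** 2) / (2 * sigma ** 2))
            nodes[n] = [wi + lr * g * (xi - wi) for wi, xi in zip(nodes[n], x)]
    return nodes

def build_index(nodes, data):
    # File each item under its best-matching node and record that node's covering radius.
    members = {n: [] for n in nodes}
    for x in data:
        members[min(nodes, key=lambda n: math.dist(nodes[n], x))].append(x)
    radius = {n: max((math.dist(nodes[n], x) for x in members[n]), default=0.0) for n in nodes}
    return members, radius

def som_knn(q, k, nodes, members, radius):
    """Exact k-NN: visit nodes by lower bound and stop once no node can improve the result."""
    bound = {n: max(0.0, math.dist(q, nodes[n]) - radius[n]) for n in nodes}
    best = []  # sorted (distance, item) pairs, at most k long
    for n in sorted(nodes, key=bound.get):
        if len(best) == k and bound[n] > best[-1][0]:
            break  # prune: every remaining node has an even larger lower bound
        best.extend((math.dist(q, x), x) for x in members[n])
        best.sort()
        del best[k:]
    return best
```

Because nodes are visited in order of increasing lower bound, the first node whose bound exceeds the current k-th distance ends the search; all later nodes are pruned without touching their images.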

A Density-based k-Nearest Neighbors Query Method (밀도 기반의 k-최근접 질의 처리)

  • Jang, In-Sung;Han, Eun-Young;Cho, Dae-Soo
    • Journal of the Korean Association of Geographic Information Studies / v.6 no.4 / pp.59-70 / 2003
  • Spatial database systems provide many query types, most of which require frequent disk I/O and considerable CPU time. A k-NN search finds the k objects closest to a query point, and several k-NN search methods have been proposed to date. Among these, the MINMAX distance method aims to avoid accessing unnecessary nodes by applying a pruning technique. However, this method performs more disk accesses than necessary while pruning unnecessary nodes. In this paper, we propose a new k-NN search algorithm based on object density. The method uses the density of the data set to predict a radius expected to contain the k nearest objects, searches for objects within this radius, and adjusts the radius if the search fails. Experimental results show that this method outperforms the MINMAX distance method: it reduces disk accesses by up to 22%, and by 7% on average.

Feature Selection for Multiple K-Nearest Neighbor classifiers using GAVaPS (GAVaPS를 이용한 다수 K-Nearest Neighbor classifier들의 Feature 선택)

  • Lee, Hee-Sung;Lee, Jae-Hun;Kim, Eun-Tai
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.871-875 / 2008
  • This paper deals with feature selection for multiple k-nearest neighbor (k-NN) classifiers using a Genetic Algorithm with Varying Population Size (GAVaPS). Because multiple k-NN classifiers are used, their feature selection problem is very hard and has a large search space. To solve this problem, we employ GAVaPS, which outperforms the simple genetic algorithm (SGA). Further, we propose an efficient method for combining multiple k-NN classifiers using GAVaPS. Experiments are performed to demonstrate the efficiency of the proposed method.
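
The approach can be sketched with a compact GAVaPS loop. In GAVaPS there is no selection operator; instead each individual receives a lifetime based on its fitness and is removed once its age exceeds that lifetime, so the population size varies. In this sketch a chromosome concatenates one boolean feature mask per k-NN classifier, and fitness is the leave-one-out accuracy of a majority vote over the classifiers. The linear lifetime rule, the operators, all parameter values, the extinction guard, and the voting combiner are assumptions for illustration; the paper's own combining method may differ.

```python
import math
import random
from collections import Counter

def knn_predict(X, y, i, feats, k):
    # Majority label among the k nearest other samples, using only the features in feats.
    xi = [X[i][f] for f in feats]
    dists = sorted((math.dist(xi, [X[j][f] for f in feats]), y[j])
                   for j in range(len(X)) if j != i)
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def ensemble_loo(X, y, chrom, n_clf, k=3):
    """Leave-one-out accuracy of n_clf k-NN classifiers combined by majority vote.
    chrom concatenates one boolean feature mask per classifier."""
    n = len(X[0])
    featsets = [[f for f in range(n) if chrom[c * n + f]] for c in range(n_clf)]
    if any(not fs for fs in featsets):
        return 0.0  # treat a classifier with no features as invalid
    correct = 0
    for i in range(len(X)):
        votes = Counter(knn_predict(X, y, i, fs, k) for fs in featsets)
        correct += votes.most_common(1)[0][0] == y[i]
    return correct / len(X)

def gavaps(fitness, n_bits, init_size=20, generations=30, rho=0.4,
           min_lt=1.0, max_lt=7.0, p_mut=0.05, seed=0):
    """GA with varying population size: no selection operator; each individual
    lives for a fitness-dependent number of generations."""
    rng = random.Random(seed)
    seen = []  # all fitness values so far (absolute min/max for lifetimes)

    def make(chrom):
        f = fitness(chrom)
        seen.append(f)
        lo, hi = min(seen), max(seen)
        # Linear lifetime allocation between min_lt and max_lt.
        lt = (min_lt + max_lt) / 2 if hi == lo else min_lt + (max_lt - min_lt) * (f - lo) / (hi - lo)
        return {"chrom": chrom, "fit": f, "age": 0, "lt": lt}

    def random_chrom():
        return [rng.random() < 0.5 for _ in range(n_bits)]

    pop = [make(random_chrom()) for _ in range(max(2, init_size))]
    best = max(pop, key=lambda ind: ind["fit"])
    for _ in range(generations):
        children = []
        for _ in range(max(1, round(rho * len(pop)))):
            a, b = rng.sample(pop, 2)  # parents drawn uniformly at random
            cut = rng.randrange(1, n_bits)
            # One-point crossover, then flip each bit with probability p_mut.
            child = [bit != (rng.random() < p_mut) for bit in a["chrom"][:cut] + b["chrom"][cut:]]
            children.append(make(child))
        pop += children
        for ind in pop:
            ind["age"] += 1
        pop = [ind for ind in pop if ind["age"] <= ind["lt"]]  # expired individuals die
        while len(pop) < 2:  # extinction guard (an addition, not part of GAVaPS)
            pop.append(make(random_chrom()))
        best = max([best] + pop, key=lambda ind: ind["fit"])
    return best["chrom"], best["fit"]
```

For example, with 4 features and 2 classifiers, `n_bits` is 8 and the fitness function is `lambda c: ensemble_loo(X, y, c, n_clf=2)`. Fitter individuals live longer and therefore take part in more random matings, which is how GAVaPS applies selection pressure without an explicit selection step.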

kNN Query Processing Algorithm based on the Encrypted Index for Hiding Data Access Patterns (데이터 접근 패턴 은닉을 지원하는 암호화 인덱스 기반 kNN 질의처리 알고리즘)

  • Kim, Hyeong-Il;Kim, Hyeong-Jin;Shin, Youngsung;Chang, Jae-woo
    • Journal of KIISE / v.43 no.12 / pp.1437-1457 / 2016
  • In outsourced databases, the cloud provides authorized users with query services over the outsourced database. However, sensitive data, such as financial or medical records, should be encrypted before being outsourced to the cloud. Meanwhile, the k-nearest-neighbor (kNN) query is a typical query type that is widely used in many fields, and its result is closely related to the interests and preferences of the user. Therefore, secure kNN query processing algorithms that preserve both data privacy and query privacy have been proposed. However, existing algorithms either incur high computation cost or leak data access patterns because the retrieved index nodes and query results are disclosed. To solve these problems, we propose a new kNN query processing algorithm on the encrypted database. Our algorithm preserves both data privacy and query privacy, and it hides data access patterns while supporting efficient query processing. To achieve this, we devise an encrypted index search scheme that can perform data filtering without revealing data access patterns. A performance analysis shows that the proposed algorithm outperforms existing algorithms in terms of query processing time.

A Method of Highspeed Similarity Retrieval based on Self-Organizing Maps (자기 조직화 맵 기반 유사화상 검색의 고속화 수법)

  • Oh, Kun-Seok;Yang, Sung-Ki;Bae, Sang-Hyun;Kim, Pan-Koo
    • The KIPS Transactions:PartB / v.8B no.5 / pp.515-522 / 2001
  • Feature-based similarity retrieval has become an important research issue in image database systems. Image features are useful for discriminating between images. In this paper, we propose a high-speed k-nearest-neighbor search algorithm based on self-organizing maps. A self-organizing map (SOM) provides a mapping from high-dimensional feature vectors onto a two-dimensional space. The resulting topological feature map preserves the mutual relations (similarities) among input feature vectors and clusters mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a node vector together with the images closest to that node vector. We implemented k-NN search for similar-image classification by (1) accessing the topological feature map and (2) applying a pruning strategy for high-speed search. We evaluated the algorithm's performance using color feature vectors extracted from images and obtained promising results.
