• Title/Summary/Keyword: NN search

Search Result 56, Processing Time 0.026 seconds

A Distributed High Dimensional Indexing Structure for Content-based Retrieval of Large Scale Data (대용량 데이터의 내용 기반 검색을 위한 분산 고차원 색인 구조)

  • Cho, Hyun-Hwa;Lee, Mi-Young;Kim, Young-Chang;Chang, Jae-Woo;Lee, Kyu-Chul
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.228-237
    • /
    • 2010
  • Although conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, there are additional requirements to increase search performances as well as to support index scalability for large scale data. To support these requirements, we propose a distributed high-dimensional indexing structure based on cluster systems, called a Distributed Vector Approximation-tree (DVA-tree), which is a two-level structure consisting of a hybrid spill-tree and VA-files. We also describe the algorithms used for constructing the DVA-tree over multiple machines and performing distributed k-nearest neighbors (NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that our proposed method contributes to significant performance advantages over existing index structures on difference kinds of datasets.

Spatial Locality Preservation Metric for Constructing Histogram Sequences (히스토그램 시퀀스 구성을 위한 공간 지역성 보존 척도)

  • Lee, Jeonggon;Kim, Bum-Soo;Moon, Yang-Sae;Choi, Mi-Jung
    • Journal of Information Technology and Architecture
    • /
    • v.10 no.1
    • /
    • pp.79-91
    • /
    • 2013
  • This paper proposes a systematic methodology that could be used to decide which one shows the best performance among space filling curves (SFCs) in applying lower-dimensional transformations to histogram sequences. A histogram sequence represents a time-series converted from an image by the given SFC. Due to the high-dimensionality nature, histogram sequences are very difficult to be stored and searched in their original form. To solve this problem, we generally use lower-dimensional transformations, which produce lower bounds among high dimensional sequences, but the tightness of those lower-bounds is highly affected by the types of SFC. In this paper, we attack a challenging problem of evaluating which SFC shows the better performance when we apply the lower-dimensional transformation to histogram sequences. For this, we first present a concept of spatial locality, which comes from an intuition of "if the entries are adjacent in a histogram sequence, their corresponding cells should also be adjacent in its original image." We also propose spatial locality preservation metric (slpm in short) that quantitatively evaluates spatial locality and present its formal computation method. We then evaluate five SFCs from the perspective of slpm and verify that this evaluation result concurs with the performance evaluation of lower-dimensional transformations in real image matching. Finally, we perform k-NN (k-nearest neighbors) search based on lower-dimensional transformations and validate accuracy of the proposed slpm by providing that the Hilbert-order with the highest slpm also shows the best performance in k-NN search.

A Method for Continuous k Nearest Neighbor Search With Partial Order (부분순위 연속 k 최근접 객체 탐색 기법)

  • Kim, Jin-Deog
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.1
    • /
    • pp.126-132
    • /
    • 2011
  • In the application areas of LBS(Location Based Service) and ITS(Intelligent Transportation System), continuous k-nearest neighbor query(CkNN) which is defined as a query to find the nearest points of interest to all the points on a given path is widely used. It is necessary to acquire results quickly in the above applications and be applicable to spatial network databases. It is also able to cope successfully with frequent updates of POI objects. This paper proposes a new method to search nearest POIs for moving query objects on the spatial networks. The method produces a set of split points and their corresponding k-POIs as results with partial order among k-POIs. The results obtained from experiments with real dataset show that the proposed method outperforms the existing methods. The proposed method achieves very short processing time(15%) compared with the existing method.

An Advanced Scheme for Searching Spatial Objects and Identifying Hidden Objects (숨은 객체 식별을 위한 향상된 공간객체 탐색기법)

  • Kim, Jongwan;Cho, Yang-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.7
    • /
    • pp.1518-1524
    • /
    • 2014
  • In this paper, a new method of spatial query, which is called Surround Search (SuSe) is suggested. This method makes it possible to search for the closest spatial object of interest to the user from a query point. SuSe is differentiated from the existing spatial object query schemes, because it locates the closest spatial object of interest around the query point. While SuSe searches the surroundings, the spatial object is saved on an R-tree, and MINDIST, the distance between the query location and objects, is measured by considering an angle that the existing spatial object query methods have not previously considered. The angle between targeted-search objects is found from a query point that is hidden behind another object in order to distinguish hidden objects from them. The distinct feature of this proposed scheme is that it can search the faraway or hidden objects, in contrast to the existing method. SuSe is able to search for spatial objects more precisely, and users can be confident that this scheme will have superior performance to its predecessor.

k-NN Query Optimization Scheme Based on Machine Learning Using a DNN Model (DNN 모델을 이용한 기계 학습 기반 k-최근접 질의 처리 최적화 기법)

  • We, Ji-Won;Choi, Do-Jin;Lee, Hyeon-Byeong;Lim, Jong-Tae;Lim, Hun-Jin;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.10
    • /
    • pp.715-725
    • /
    • 2020
  • In this paper, we propose an optimization scheme for a k-Nearest Neighbor(k-NN) query, which finds k objects closest to the query in the high dimensional feature vectors. The k-NN query is converted and processed into a range query based on the range that is likely to contain k data. In this paper, we propose an optimization scheme using DNN model to derive an optimal range that can reduce processing cost and accelerate search speed. The entire system of the proposed scheme is composed of online and offline modules. In the online module, a query is actually processed when it is issued from a client. In the offline module, an optimal range is derived for the query by using the DNN model and is delivered to the online module. It is shown through various performance evaluations that the proposed scheme outperforms the existing schemes.

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.239-251
    • /
    • 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performances. In recent years, machine learning techniques have been widely used in stock market predictions, including artificial neural network, SVM, and genetic algorithm. In particular, a case-based reasoning method, known as k-nearest neighbor is also widely used for stock price prediction. Case based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of similar cases to create a classification for the new problem. However, case based reasoning has some problems. First, case based reasoning has a tendency to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case. So, case based reasoning may have to take into account more cases even when there are fewer cases applicable depending on the subject. Second, case based reasoning may select neighbors that are far away from the target case. Thus, case based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and the predictability can be degraded due to a deviation from the desired similar neighbor. This paper examines how the size of learning data affects stock price predictability through k-nearest neighbor and compares the predictability of k-nearest neighbor with the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of next day's closing price, we used four variables: opening value, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data is from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning dataset. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for the k-NN for the first experiment when the learning data was small. However, the mean absolute percentage error (MAPE) for the random walk model was 1.3497 and the k-NN was 1.2928 for the second experiment when the learning data was large. These results show that the prediction power when more learning data are used is higher than when less learning data are used. Also, this paper shows that k-NN generally produces a better predictive power than random walk model for larger learning datasets and does not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting including opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that the k-nearest neighbor needs to find nearest neighbors using the second step filtering method considering fundamental economic variables as well as a sufficient amount of learning data.

Slab Region Localization for Text Extraction using SIFT Features (문자열 검출을 위한 슬라브 영역 추정)

  • Choi, Jong-Hyun;Choi, Sung-Hoo;Yun, Jong-Pil;Koo, Keun-Hwi;Kim, Sang-Woo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.5
    • /
    • pp.1025-1034
    • /
    • 2009
  • In steel making production line, steel slabs are given a unique identification number. This identification number, Slab management number(SMN), gives information about the use of the slab. Identification of SMN has been done by humans for several years, but this is expensive and not accurate and it has been a heavy burden on the workers. Consequently, to improve efficiency, automatic recognition system is desirable. Generally, a recognition system consists of text localization, text extraction, character segmentation, and character recognition. For exact SMN identification, all the stage of the recognition system must be successful. In particular, the text localization is great important stage and difficult to process. However, because of many text-like patterns in a complex background and high fuzziness between the slab and background, directly extracting text region is difficult to process. If the slab region including SMN can be detected precisely, text localization algorithm will be able to be developed on the more simple method and the processing time of the overall recognition system will be reduced. This paper describes about the slab region localization using SIFT(Scale Invariant Feature Transform) features in the image. First, SIFT algorithm is applied the captured background and slab image, then features of two images are matched by Nearest Neighbor(NN) algorithm. However, correct matching rate can be low when two images are matched. Thus, to remove incorrect match between the features of two images, geometric locations of the matched two feature points are used. Finally, search rectangle method is performed in correct matching features, and then the top boundary and side boundaries of the slab region are determined. For this processes, we can reduce search region for extraction of SMN from the slab image. Most cases, to extract text region, search region is heuristically fixed [1][2]. However, the proposed algorithm is more analytic than other algorithms, because the search region is not fixed and the slab region is searched in the whole image. Experimental results show that the proposed algorithm has a good performance.

Continuous Nearest Neighbor Query Processing on Trajectory of Moving Objects (이동객체의 궤적에 대한 연속 최근접 질의 처리)

  • 지정희;최보윤;김상호;류근호
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.492-504
    • /
    • 2004
  • Recently, as growing of interest for LBS(location-based services) techniques, lots of works on moving objects that continuously change their information over time, have been performed briskly. Also, researches for NN(nearest neighbor) query which has often been used in LBS, are progressed variously However, the results of conventional NN Query processing techniques may be invalidated as the query and data objects move. Therefore, they are usually meaningless in moving object management system such as LBS. To solve these problems, in this paper we propose a new nearest neighbor query processing technique, called CTNN, which is possible to meet accurate and continuous query processing for moving objects. Our techniques include an Approximate CTNN(ACTNN) technique, which has quick response time, and an Exact CTNN(ECTNN) technique, which makes it possible to search nearest neighbor objects accurately. In order to evaluate the proposed techniques, we experimented with various datasets. Experimental results showed that the ECTNN technique has high accuracy, but has a little low performance for response time. Also the ACTNN technique has low accuracy comparing with the ECTNN, but has quick response time The proposed techniques can be applied to navigation system, traffic control system, distribution information system, etc., and specially are most suitable when both data and query are moving objects and when we already know their trajectory.

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD
    • /
    • v.19D no.2
    • /
    • pp.151-160
    • /
    • 2012
  • Most of the distributed high-dimensional indexing structures provide a reasonable search performance especially when the dataset is uniformly distributed. However, in case when the dataset is clustered or skewed, the search performances gradually degrade as compared with the uniformly distributed dataset. We propose a method of improving the k-nearest neighbor search performance for the distributed vector approximation-tree based on the strongly clustered or skewed dataset. The basic idea is to compute volumes of the leaf nodes on the top-tree of a distributed vector approximation-tree and to assign different number of bits to them in order to assure an identification performance of vector approximation. In other words, it can be done by assigning more bits to the high-density clusters. We conducted experiments to compare the search performance with the distributed hybrid spill-tree and distributed vector approximation-tree by using the synthetic and real data sets. The experimental results show that our proposed scheme provides consistent results with significant performance improvements of the distributed vector approximation-tree for strongly clustered or skewed datasets.

Analysis of Morton Code Conversion for 32 Bit IEEE 754 Floating Point Variables (IEEE 754 부동 소수점 32비트 float 변수의 Morton Code 변환 분석)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.17 no.3
    • /
    • pp.165-172
    • /
    • 2016
  • Morton codes play important roles in many parallel GPU applications for the nearest neighbor (NN) search in huge data and queries with its applications growing. This paper discusses and analyzes the meaning of Tero Karras's 32-bit 'unsigned int' Morton code algorithm for three-dimensional spatial information in $[0,1]^3$ and its geometric implications. Based on this, this paper proposes 64-bit 'unsigned long long' version of Morton code and compares the results in both CPU vs. GPU and 32-bit vs. 64-bit versions. The proposed GPU algorithm runs around 1000 times faster than the CPU version.