http://dx.doi.org/10.5626/JOK.2015.42.2.242

Efficient Multi-Step k-NN Search Methods Using Multidimensional Indexes in Large Databases  

Lee, Sanghun (Kangwon National Univ.)
Kim, Bum-Soo (Kangwon National Univ.)
Choi, Mi-Jung (Kangwon National Univ.)
Moon, Yang-Sae (Kangwon National Univ.)
Publication Information
Journal of KIISE, Vol. 42, No. 2, 2015, pp. 242-254
Abstract
In this paper, we address the problem of improving the performance of multi-step k-NN search using multidimensional indexes. Because lower-dimensional transformations lose information, existing multi-step k-NN search solutions produce a large tolerance (i.e., a large search range) and thus incur a large number of candidates, which are retrieved by a range query. These many candidates cause heavy I/O and CPU overhead in the postprocessing step. To overcome this problem, we propose two efficient solutions that improve search performance by reducing the tolerance of the range query and, accordingly, the number of candidates. First, we propose a tolerance reduction-based (approximate) solution that forcibly decreases the tolerance, which is determined by a k-NN query on the index, by the average ratio between high- and low-dimensional distances. Second, we propose a coefficient control-based (exact) solution that uses c·k (for a coefficient c) instead of k in the k-NN query to obtain a tighter tolerance, and then performs the range query with this tighter tolerance. Experimental results show that the proposed solutions significantly reduce the number of candidates and, accordingly, improve search performance compared with the existing multi-step k-NN solution.
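The abstract describes the pipeline only in prose, so here is a minimal Python sketch of one way to read it; it is not the authors' implementation. Everything below is an assumption for illustration: a linear scan over transformed points stands in for a real multidimensional index; the hypothetical `paa` transform (sqrt(w)-scaled segment means) is chosen only because it lower-bounds Euclidean distance; the coefficient control variant takes the tolerance as the k-th smallest exact distance among the c·k low-dimensional neighbors; and the tolerance reduction variant estimates the average high/low distance ratio from those same first-step objects, since the abstract does not say how the ratio is obtained.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa(x, m):
    """Hypothetical lower-bounding transform: sqrt(w)-scaled means of m equal segments.

    By Cauchy-Schwarz, euclid(paa(x, m), paa(y, m)) <= euclid(x, y) when len(x) % m == 0.
    """
    w = len(x) // m
    return [math.sqrt(w) * sum(x[i * w:(i + 1) * w]) / w for i in range(m)]


def multi_step_knn(data, q, k, m, c=1, approx=False):
    """Multi-step k-NN: low-dim k-NN -> tolerance -> low-dim range query -> refinement.

    c=1, approx=False : baseline, tolerance = max exact distance of the first k objects.
    c>1               : coefficient control (assumed reading), still exact.
    approx=True       : tolerance divided by an assumed average high/low distance ratio.
    Returns (ids of the answers sorted by exact distance, number of range-query candidates).
    """
    low = [paa(x, m) for x in data]  # linear scan stands in for a multidimensional index
    ql = paa(q, m)

    # Step 1: (c*k)-NN query in the low-dimensional space.
    by_low = sorted(range(len(data)), key=lambda i: euclid(low[i], ql))
    first = by_low[:min(c * k, len(data))]

    # Step 2: tolerance = k-th smallest exact (high-dimensional) distance among them.
    exact = {i: euclid(data[i], q) for i in first}
    eps = sorted(exact.values())[k - 1]

    if approx:
        # Hypothetical estimator: mean high/low ratio over the first-step objects.
        ratios = [exact[i] / euclid(low[i], ql) for i in first if euclid(low[i], ql) > 0]
        if ratios:
            eps /= max(1.0, sum(ratios) / len(ratios))

    # Step 3: range query with the tolerance in the low-dimensional space.
    cands = [i for i in range(len(data)) if euclid(low[i], ql) <= eps]

    # Step 4: postprocessing with exact distances.
    answer = sorted(cands, key=lambda i: euclid(data[i], q))[:k]
    return answer, len(cands)


if __name__ == "__main__":
    random.seed(7)
    data = [[random.gauss(0, 1) for _ in range(64)] for _ in range(500)]
    q = [random.gauss(0, 1) for _ in range(64)]
    for label, kwargs in [("baseline", {}), ("c=3", {"c": 3}), ("approx", {"approx": True})]:
        ids, n_cands = multi_step_knn(data, q, k=5, m=8, **kwargs)
        print(label, "candidates:", n_cands, "answers:", ids)
```

Under this reading, the c·k variant stays exact: the first k low-dimensional neighbors are a subset of the first c·k, so the k-th smallest exact distance can only shrink, yet it is still the distance of k real objects and therefore bounds the true k-th neighbor distance from above. Because the transform lower-bounds the exact distance, every true neighbor still falls inside the range query. Dividing the tolerance by a ratio of at least 1 gives no such guarantee, which is why that variant is approximate and may return fewer than k answers.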
Keywords
multi-step k-NN search; similarity search; high-dimensional objects; low-dimensional transformation