CS-Tree: Cell-based Signature Index Structure for Similarity Search in High-Dimensional Data

CS-트리 : 고차원 데이터의 유사성 검색을 위한 셀-기반 시그니쳐 색인 구조

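  • Kwang-Taek Song (Department of Computer Engineering, Graduate School, Chonbuk National University) ;
  • Jae-Woo Chang (Department of Computer Engineering and Research Institute of Electrical and Electronic Circuit Synthesis, Chonbuk National University)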
  • Published : 2001.08.01

Abstract

Database applications such as multimedia databases and data warehousing increasingly require high-dimensional index structures for similarity search. In this paper, we propose a new cell-based signature tree, called the CS-tree, which supports efficient storage and retrieval of high-dimensional feature vectors. The CS-tree partitions the high-dimensional feature space into a set of cells and represents each feature vector by the signature of the cell that contains it. Using cell signatures rather than the actual feature vectors reduces the height of the tree, which in turn improves retrieval performance. In addition, we present a similarity search algorithm that prunes the search space efficiently based on cells. Finally, we compare the CS-tree with the X-tree, which is regarded as an efficient high-dimensional index structure, in terms of insertion time, retrieval time for k-nearest neighbor queries, and storage overhead. The experimental results show that the CS-tree achieves better retrieval performance than the X-tree.

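Recently, high-dimensional index structures have come to be required for similarity search in database applications such as multimedia databases and data warehousing. This paper proposes a cell-based signature tree (CS-tree) that supports efficient storage and retrieval of high-dimensional feature vectors. The proposed CS-tree partitions the high-dimensional feature vector space into cells and represents each feature vector by the signature of its corresponding cell. Using cell signatures in place of feature vectors reduces the depth of the tree and, as a result, yields efficient retrieval performance. We also present a similarity search algorithm that efficiently reduces the search space on the basis of cells. Finally, we compare the CS-tree with the X-tree, which is known as an excellent high-dimensional indexing technique, in terms of insertion time, retrieval time for k-nearest neighbor queries, and additional storage space. The comparison shows that the CS-tree is superior in retrieval performance.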
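
The abstract does not give the signature encoding or the tree layout, so the following is only a minimal sketch of the general idea of cell signatures and cell-based pruning, not the CS-tree itself. It assumes feature vectors normalized to [0, 1] in every dimension, an equal-width grid with a hypothetical `BITS_PER_DIM` bits per dimension, and a flat filter-and-refine scan in place of a tree. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: every dimension is split into 2**BITS_PER_DIM equal-width intervals.
BITS_PER_DIM = 2


def cell_signature(vector: Sequence[float], bits: int = BITS_PER_DIM) -> Tuple[int, ...]:
    """Map a vector in [0, 1]^d to the per-dimension interval index of its cell."""
    cells = 1 << bits
    # Clamp so that a coordinate of exactly 1.0 falls into the last interval.
    return tuple(min(int(x * cells), cells - 1) for x in vector)


def min_dist_to_cell(query: Sequence[float], signature: Sequence[int],
                     bits: int = BITS_PER_DIM) -> float:
    """Lower bound on the Euclidean distance from query to any point in the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, idx in zip(query, signature):
        low, high = idx * width, (idx + 1) * width
        if q < low:
            total += (low - q) ** 2
        elif q > high:
            total += (q - high) ** 2
        # If q lies inside [low, high], this dimension contributes nothing.
    return math.sqrt(total)


def knn_filter_and_refine(query: Sequence[float], vectors: Sequence[Sequence[float]],
                          k: int, bits: int = BITS_PER_DIM) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, index) pairs, pruning by cell lower bounds."""
    # Filter step: order candidates by the lower bound derived from their signature.
    candidates = sorted(
        (min_dist_to_cell(query, cell_signature(v, bits), bits), i)
        for i, v in enumerate(vectors)
    )
    best: List[Tuple[float, int]] = []  # exact (distance, index), kept sorted
    for bound, i in candidates:
        if len(best) == k and bound > best[-1][0]:
            break  # no remaining cell can hold a vector closer than the k-th best
        # Refine step: compute the exact distance only for surviving candidates.
        best.append((math.dist(query, vectors[i]), i))
        best.sort()
        best = best[:k]
    return best


vectors = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.2), (0.5, 0.5)]
query = (0.12, 0.12)
nearest = knn_filter_and_refine(query, vectors, k=2)
print([index for _, index in nearest])
```

In this sketch a signature needs only `BITS_PER_DIM` bits per dimension instead of a full floating-point coordinate, which illustrates why an index built on signatures can pack more entries into each node. The lower bound lets the search skip exact distance computations for vectors whose cells are too far from the query.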
