CS-Tree: Cell-based Signature Index Structure for Similarity Search in High-Dimensional Data

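  • Kwang-Taek Song (Department of Computer Engineering, Graduate School, Chonbuk National University);
  • Jae-Woo Chang (Department of Computer Engineering and Research Institute for Electrical and Electronic Circuit Synthesis, Chonbuk National University)
  • Published: 2001.08.01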

Abstract

High-dimensional index structures are increasingly required for similarity search in database applications such as multimedia databases and data warehousing. In this paper, we propose a cell-based signature tree, called the CS-tree, that supports efficient storage and retrieval of high-dimensional feature vectors. The CS-tree partitions the high-dimensional feature vector space into cells and represents each feature vector by the signature of the cell that contains it. Using cell signatures instead of the feature vectors themselves reduces the height of the tree, which in turn improves retrieval performance. We also present a similarity search algorithm that efficiently prunes the search space based on cells. Finally, we compare the CS-tree with the X-tree, which is regarded as an efficient high-dimensional index structure, in terms of insertion time, retrieval time for k-nearest neighbor queries, and storage overhead. The experimental results show that the CS-tree outperforms the X-tree in retrieval performance.
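The abstract does not give the signature encoding or the tree layout, so the following is only a minimal Python sketch of the general idea of cell signatures and cell-based pruning, not the CS-tree itself. It assumes feature vectors lie in the unit hypercube [0, 1)^d, that each dimension is split into 2^BITS equal cells, and that a signature is simply the tuple of per-dimension cell indices. The names `cell_signature`, `min_dist`, `knn`, and `BITS` are hypothetical, and the search is a flat filter-and-refine scan rather than a tree traversal.

```python
import heapq
import math
from typing import List, Sequence, Tuple

BITS = 2  # assumed bits per dimension, i.e. 2**BITS cells along each axis


def cell_signature(vec: Sequence[float], bits: int = BITS) -> Tuple[int, ...]:
    """Map a vector in [0, 1)^d to the indices of the cell that contains it."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in vec)


def min_dist(query: Sequence[float], sig: Tuple[int, ...], bits: int = BITS) -> float:
    """Lower bound on the Euclidean distance from query to any point in the cell."""
    cells = 1 << bits
    total = 0.0
    for q, c in zip(query, sig):
        lo, hi = c / cells, (c + 1) / cells
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
    return math.sqrt(total)


def knn(query: Sequence[float], vectors: List[Sequence[float]], k: int,
        bits: int = BITS) -> List[Tuple[float, int]]:
    """Exact k-NN: visit vectors in order of their cell lower bound and stop
    once the bound exceeds the current k-th best true distance."""
    bounds = [min_dist(query, cell_signature(v, bits), bits) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda i: bounds[i])
    best: List[Tuple[float, int]] = []  # max-heap via negated distances
    for i in order:
        if len(best) == k and bounds[i] > -best[0][0]:
            break  # every remaining cell is farther than the k-th neighbor
        d = math.dist(query, vectors[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best)


data = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.2), (0.5, 0.5)]
neighbors = knn((0.12, 0.12), data, k=2)
```

Because the cell bound never exceeds the true distance to any vector in that cell, stopping at the first bound larger than the k-th best distance does not discard a true neighbor. In this sketch the full vectors of the skipped cells are never read, which is the kind of saving a signature-based index aims for.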
