An Index Structure based on Space Partitions and Adaptive Bit Allocations for Multi-Dimensional Data

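  • 복경수 (Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • 김은재 (Department of Information and Communication Engineering, Chungbuk National University) ;
  • 유재수 (School of Electrical, Electronic, and Computer Engineering, Chungbuk National University)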
  • Published : 2005.10.01

Abstract

In this paper, we propose an index structure based on vector approximation that efficiently supports similarity search over multi-dimensional data. The proposed index structure splits a region using a space partitioning method and dynamically allocates bits to each split region according to the distribution of the data. As a result, the split regions do not overlap, and the depth of the tree is reduced because a single internal node can store the region information of many child nodes. The index structure also represents the region of each child node relative to the region of its parent node, which makes the representation of child nodes more accurate and the search more efficient. We compare the proposed index structure with an existing index structure in a range of experiments. The experimental results show that the proposed index structure improves search performance by about $40\%$ over the existing method.

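This paper proposes a vector-approximation-based index structure for efficiently supporting similarity search over multi-dimensional data. The proposed index structure partitions regions with a space partitioning method and represents each region by dynamically allocating bits to the areas where data actually exist. Consequently, no overlap occurs among the partitioned regions, and a single internal node can store information on many regions, which reduces the depth of the tree. In addition, the structure provides an effective representation technique for data clustered in particular regions, and it ensures the accuracy of region representation by expressing the region information of a child node relative to the region information of its parent node. This improves search performance. To show the superiority of the proposed index structure, we carry out a variety of experiments against previously proposed multi-dimensional index structures. The performance evaluation shows that the proposed index structure improves search performance by about $40\%$ over the existing index structure.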
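
The abstract does not spell out how a child region is encoded, so the following is only a minimal Python sketch of the general idea of describing a child region relative to its parent with a small number of bits per dimension. The function names (`allocate_bits`, `encode_relative`, `decode_relative`), the greedy bit-budget heuristic, and the uniform grid quantization are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Tuple

# A region is an axis-aligned box: (lower corner, upper corner).
Region = Tuple[List[float], List[float]]


def allocate_bits(parent: Region, budget: int) -> List[int]:
    """Hypothetical heuristic: spend a bit budget one bit at a time,
    always on the dimension whose grid cells are currently widest."""
    p_lo, p_hi = parent
    bits = [0] * len(p_lo)
    for _ in range(budget):
        widest = max(
            range(len(bits)),
            key=lambda d: (p_hi[d] - p_lo[d]) / (1 << bits[d]),
        )
        bits[widest] += 1
    return bits


def encode_relative(parent: Region, child: Region,
                    bits: List[int]) -> List[Tuple[int, int]]:
    """Quantize the child's bounds to grid cells of the parent region.
    Dimension d of the parent is cut into 2**bits[d] equal cells."""
    p_lo, p_hi = parent
    c_lo, c_hi = child
    codes = []
    for d, b in enumerate(bits):
        cells = 1 << b
        width = p_hi[d] - p_lo[d]
        if width == 0:
            codes.append((0, 0))  # degenerate dimension: a single cell
            continue
        lo_cell = min(int((c_lo[d] - p_lo[d]) / width * cells), cells - 1)
        hi_cell = min(int((c_hi[d] - p_lo[d]) / width * cells), cells - 1)
        codes.append((lo_cell, hi_cell))
    return codes


def decode_relative(parent: Region, codes: List[Tuple[int, int]],
                    bits: List[int]) -> Region:
    """Rebuild a conservative box (it encloses the original child)
    from the parent region and the per-dimension cell codes."""
    p_lo, p_hi = parent
    lo, hi = [], []
    for d, (lo_cell, hi_cell) in enumerate(codes):
        cells = 1 << bits[d]
        width = p_hi[d] - p_lo[d]
        lo.append(p_lo[d] + lo_cell * width / cells)
        hi.append(p_lo[d] + (hi_cell + 1) * width / cells)
    return lo, hi


parent = ([0.0, 0.0], [8.0, 2.0])
child = ([1.2, 0.5], [2.7, 0.9])
bits = allocate_bits(parent, budget=4)
codes = encode_relative(parent, child, bits)
approx = decode_relative(parent, codes, bits)
```

In this sketch the child is stored as a few small integers instead of full floating-point coordinates, which is one way an internal node could hold more child entries. The decoded box is never smaller than the original child region, so a search that prunes on the decoded box does not miss candidates.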
