GB-Index: An Indexing Method for High Dimensional Complex Similarity Queries with Relevance Feedback

  • Guang-Ho Cha (Department of Computer Engineering, Seoul National University of Technology)
  • Published: 2005.08.01

Abstract

Similarity indexing and search are well known to be difficult in high-dimensional applications such as multimedia databases, and they become harder still when multiple features must be indexed together. In this paper, we propose a novel indexing method, the GB-index, designed to efficiently handle complex similarity queries and relevance feedback in high-dimensional image databases. To allow flexible control over multiple features and multiple query objects, the GB-index treats each feature dimension independently. Its efficiency comes from a specialized bitmap index that represents all objects in the database as a set of bitmaps. The main contributions of the GB-index are threefold: (1) it provides a novel way to index high-dimensional data; (2) it efficiently handles complex similarity queries; and (3) it efficiently processes the disjunctive queries that arise from relevance feedback. Experimental results show that the GB-index achieves large speedups over the sequential scan and the VA-file.
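
The abstract does not give the internal layout of the GB-index, so the following is only a minimal sketch of the general idea it describes: one set of bitmaps per dimension, combined with bitwise AND for a conjunctive query and bitwise OR across several query objects for a disjunctive (relevance feedback) query. The class name `PerDimensionBitmapIndex`, the equal-width binning of coordinates in [0, 1), the `radius` parameter, and the helper `ids` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Optional, Sequence


class PerDimensionBitmapIndex:
    """Toy per-dimension bitmap index (illustrative, not the paper's structure).

    Assumes every coordinate lies in [0, 1) and uses equal-width bins.
    A bitmap is a Python int: bit i is set when object i is in the bin.
    """

    def __init__(self, vectors: Sequence[Sequence[float]], bins: int = 4):
        self.bins = bins
        self.dims = len(vectors[0])
        self.size = len(vectors)
        # bitmaps[d][b] marks the objects falling into bin b of dimension d.
        self.bitmaps = [[0] * bins for _ in range(self.dims)]
        for i, vec in enumerate(vectors):
            for d, x in enumerate(vec):
                self.bitmaps[d][self._bin(x)] |= 1 << i

    def _bin(self, x: float) -> int:
        # Clamp so that values outside [0, 1) map to the first or last bin.
        return min(max(int(x * self.bins), 0), self.bins - 1)

    def candidates(self, query: Sequence[float], radius: float,
                   dims: Optional[Sequence[int]] = None) -> int:
        """Bitmap of objects whose bin overlaps [q - radius, q + radius]
        in every selected dimension (a filter; false positives are possible)."""
        result = (1 << self.size) - 1
        for d in (range(self.dims) if dims is None else dims):
            lo = self._bin(query[d] - radius)
            hi = self._bin(query[d] + radius)
            in_range = 0
            for b in range(lo, hi + 1):
                in_range |= self.bitmaps[d][b]  # OR the bins of one dimension
            result &= in_range                  # AND across dimensions
        return result

    def disjunctive(self, queries: Sequence[Sequence[float]], radius: float,
                    dims: Optional[Sequence[int]] = None) -> int:
        """OR of the candidate bitmaps of several query objects."""
        mask = 0
        for q in queries:
            mask |= self.candidates(q, radius, dims)
        return mask


def ids(mask: int) -> List[int]:
    """Decode a bitmap into a sorted list of object ids."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


vectors = [(0.10, 0.20), (0.15, 0.80), (0.90, 0.85), (0.55, 0.50)]
index = PerDimensionBitmapIndex(vectors, bins=4)

print(ids(index.candidates((0.12, 0.22), 0.05)))            # [0]
print(ids(index.candidates((0.12, 0.22), 0.05, dims=[0])))  # [0, 1]
print(ids(index.disjunctive([(0.12, 0.22), (0.88, 0.90)], 0.05)))  # [0, 2]
```

Because each dimension has its own bitmaps, a query can restrict itself to a subset of dimensions (the `dims` argument) without rebuilding anything. In this sketch the result is only a candidate set, so a real system would still compute exact distances on the surviving objects.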
