Design of an Efficient Parallel High-Dimensional Index Structure

  • Park, Chun-Seo (Electronics and Telecommunications Research Institute) ;
  • Song, Seok-Il (Dept. of Information Communication Engineering, Chungbuk National University) ;
  • Sin, Jae-Ryong (Dept. of Information Communication Engineering, Chungbuk National University) ;
  • Yu, Jae-Su (Dept. of Information Communication Engineering, Chungbuk National University)
  • Published : 2002.02.01

Abstract

Multi-dimensional data such as image and spatial data generally require large amounts of storage space, and a single workstation is limited in how much of this data it can store and search. Managing the data in a parallel computing environment, an area of active research, can improve performance substantially. In this paper, we propose a parallel high-dimensional index structure that exploits the parallelism of such an environment. The proposed index structure uses an nP (processor)-n$\times$mD (disk) architecture, a hybrid of the nP-nD and 1P-nD architectures. Its node structure is designed to increase fan-out and reduce the height of the index tree. We also devise a range search algorithm that maximizes I/O parallelism and apply it to k-nearest neighbor queries. Experiments under various settings show that the proposed method achieves better search performance than previously proposed parallel multi-dimensional index structures.
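
As an illustration only, the nP-n$\times$mD layout can be pictured as n processors that each own m disks, with a range query touching as many disks as possible at once. The sketch below is not the paper's algorithm: the abstract does not state how pages are declustered, so the round-robin placement and the names `place_page`, `plan_range_search`, and `parallel_rounds` are assumptions made for this example.

```python
from collections import defaultdict


def place_page(page_id, n_processors, m_disks):
    """Map a page to (processor, disk) by round-robin placement.

    Assumption: round-robin is used only for illustration; the source
    does not specify the declustering scheme.
    """
    slot = page_id % (n_processors * m_disks)
    return slot // m_disks, slot % m_disks


def intersects(lo, hi, qlo, qhi):
    """True if box [lo, hi] overlaps query box [qlo, qhi] in every dimension."""
    return all(l <= qh and ql <= h for l, h, ql, qh in zip(lo, hi, qlo, qhi))


def plan_range_search(pages, qlo, qhi, n_processors, m_disks):
    """Group the pages that overlap the query by the (processor, disk) holding them."""
    plan = defaultdict(list)
    for page_id, (lo, hi) in pages.items():
        if intersects(lo, hi, qlo, qhi):
            plan[place_page(page_id, n_processors, m_disks)].append(page_id)
    return dict(plan)


def parallel_rounds(plan):
    """Sequential read rounds needed if each disk serves one page per round."""
    return max((len(ids) for ids in plan.values()), default=0)
```

With 2 processors and 2 disks each, a query that overlaps four consecutively numbered pages is spread over all four disks and can be served in one read round under this placement, whereas a single disk would need four.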
