Hilbert Cube for Spatio-Temporal Data Warehouses

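  • 최원익 (School of Electrical Engineering and Computer Science, Seoul National University)
  • 이석호 (School of Electrical Engineering and Computer Science, Seoul National University)
  • Published: 2003.10.01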

Abstract

Recently, there have been various research efforts to develop strategies for accelerating OLAP operations on huge amounts of spatio-temporal data. Most of this work is based on multi-tree structures, which consist of a single R-tree variant for indexing the spatial dimension and numerous B-trees for indexing the temporal dimension. Multi-tree based frameworks, however, are hardly applicable to spatio-temporal OLAP in practice, mainly because of their high maintenance cost and low query efficiency. To overcome these limitations, we propose a new approach called the Hilbert Cube (H-Cube), which uses a fractal, the Hilbert space-filling curve, to impose a total order on cells. In addition, the H-Cube applies the traditional prefix-sum technique to significantly improve the efficiency of aggregation queries. The H-Cube partitions the embedding space into a grid of equal-sized cells, clusters the cells on disk in Hilbert order, and composes a cube by arranging the grid cells in chronological order. It also refines cells adaptively to handle regional data skew, whose location may change over time. The H-Cube is thus an adaptive, totally ordered, prefix-summed, cell-based index for spatio-temporal data warehouses, focused on indexing dynamic point objects in static spatial dimensions. In extensive performance studies, the H-Cube outperformed multi-tree structures in both maintenance cost and query performance, consuming at most 20% of the space required by multi-tree based frameworks.
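The abstract combines two mechanisms: a Hilbert-curve total order on grid cells and prefix sums for aggregation. The following is a minimal sketch of how the two can work together. It is our own simplified illustration, not the authors' implementation. Assumptions: the grid is a fixed 2^order x 2^order grid with no adaptive refinement and no disk layout, the measure is a point count, and the 2-D prefix sums are taken over (time slot, Hilbert index). `xy2d` follows the widely used iterative Hilbert mapping. `HilbertPrefixCube`, `interval_sum` and `range_sum` are hypothetical names.

```python
"""Simplified H-Cube-style sketch (illustrative assumptions, not the paper's code):
fixed 2^order x 2^order grid, no adaptive refinement, point counts as the measure,
and 2-D prefix sums over (time slot, Hilbert index)."""
from typing import List


def xy2d(order: int, x: int, y: int) -> int:
    """Hilbert-curve index of cell (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip so the sub-curve has the canonical orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


class HilbertPrefixCube:
    """Counts per (time slot, Hilbert-ordered cell), stored as inclusive prefix sums."""

    def __init__(self, order: int, counts: List[List[int]]):
        # counts[t][h]: number of points in the cell with Hilbert index h at time slot t.
        self.order = order
        n_cells = 1 << (2 * order)
        # P[t+1][h+1] = sum of counts[t'][h'] for t' <= t and h' <= h (row/col 0 stay 0).
        self.P = [[0] * (n_cells + 1) for _ in range(len(counts) + 1)]
        for t, row in enumerate(counts):
            run = 0
            for h in range(n_cells):
                run += row[h]
                self.P[t + 1][h + 1] = self.P[t][h + 1] + run

    def interval_sum(self, h1: int, h2: int, t1: int, t2: int) -> int:
        """Sum over Hilbert indexes h1..h2 and time slots t1..t2 (inclusive) in O(1)."""
        P = self.P
        return P[t2 + 1][h2 + 1] - P[t1][h2 + 1] - P[t2 + 1][h1] + P[t1][h1]

    def range_sum(self, x1: int, y1: int, x2: int, y2: int, t1: int, t2: int) -> int:
        """Count over the non-empty window x1..x2, y1..y2 and time slots t1..t2."""
        hs = sorted(xy2d(self.order, x, y)
                    for x in range(x1, x2 + 1) for y in range(y1, y2 + 1))
        total, start = 0, hs[0]
        # Merge the window's cells into runs of consecutive Hilbert indexes;
        # each run costs four prefix-sum lookups regardless of its length.
        for prev, cur in zip(hs, hs[1:] + [None]):
            if cur is None or cur != prev + 1:
                total += self.interval_sum(start, prev, t1, t2)
                start = cur
        return total


# Usage: 4x4 grid, 3 time slots, hypothetical points given as (t, x, y).
points = [(0, 0, 0), (0, 1, 1), (1, 3, 0), (2, 1, 0), (2, 2, 2)]
order, slots = 2, 3
counts = [[0] * 16 for _ in range(slots)]
for t, x, y in points:
    counts[t][xy2d(order, x, y)] += 1
cube = HilbertPrefixCube(order, counts)
print(cube.range_sum(0, 0, 1, 1, 0, 2))  # prints 3
```

In this sketch, a spatial window becomes a handful of Hilbert-contiguous runs. Query cost therefore depends on the number of runs rather than on the number of cells or time slots, and the Hilbert curve's clustering keeps that number low. The usual prefix-sum trade-off applies: changing one count can affect many stored sums.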
