Overlapped-Subcube: A Lossless Compression Method for Prefix-Sum Cubes

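  • Heum-Geun Kang (강흠근; Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • Jun-Ki Min (민준기; Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • Seok-Ju Chun (전석주; Department of Internet Information, Ansan College) ;
  • Chin-Wan Chung (정진완; Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Published: 2003.12.01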

Abstract

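Range queries are important queries that are used frequently in decision making. Because many cells must be retrieved to answer a range query, however, processing them efficiently has been difficult. To address this problem, the prefix-sum cube was proposed, which can answer a range query in constant time regardless of the size of the range. Although the prefix-sum cube processes range queries efficiently, it requires a very large amount of storage space. In this paper, we propose the overlapped-subcube compression method, which compresses the prefix-sum cube without loss. The method was designed specifically for compressing prefix-sum cubes and has the very useful property that stored values can be retrieved while the cube remains compressed. Because of this property, the compressed prefix-sum cube can be used as is during query processing. With a compressed prefix-sum cube, a larger portion of the cube fits in a buffer of the same size, which dramatically reduces the number of disk I/O operations during query processing.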

Range-sum queries are widely used and are important for finding trends and discovering relationships between attributes in diverse database applications. A range-sum query sums over the selected cells of an OLAP data cube, where the target cells are determined by the specified query ranges. Accessing the data cube directly forces too many cells to be read and therefore incurs severe overhead. The prefix-sum cube was proposed for the efficient processing of range-sum queries in OLAP environments. However, the prefix-sum cube has been criticized for its space requirement. In this paper, we propose a lossless compression method called the overlapped-subcube, developed for the purpose of compressing prefix-sum cubes. A distinguishing feature of the overlapped-subcube is that searches can be performed without decompression. The overlapped-subcube reduces the space required to store prefix-sum cubes and improves query performance.
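The constant-time property of the prefix-sum cube described in the abstract can be illustrated with a small two-dimensional sketch. It shows only the uncompressed prefix-sum idea, not the overlapped-subcube compression itself, whose details the abstract does not give. The function names `build_prefix_sum` and `range_sum` and the sample data are assumptions made for illustration.

```python
from itertools import accumulate


def build_prefix_sum(cube):
    """Return P where P[i][j] is the sum of cube[0..i][0..j] (2-D case)."""
    prefix = []
    for row in cube:
        running = list(accumulate(row))  # prefix sums along this row
        if prefix:
            # add the prefix sums accumulated over all rows above
            running = [a + b for a, b in zip(running, prefix[-1])]
        prefix.append(running)
    return prefix


def range_sum(prefix, lo_row, lo_col, hi_row, hi_col):
    """Sum over an inclusive rectangle using at most four lookups."""
    def p(i, j):
        # cells outside the cube contribute nothing
        return prefix[i][j] if i >= 0 and j >= 0 else 0

    # inclusion-exclusion over the four corners of the query range
    return (p(hi_row, hi_col)
            - p(lo_row - 1, hi_col)
            - p(hi_row, lo_col - 1)
            + p(lo_row - 1, lo_col - 1))


# Hypothetical 3x3 data cube, used only for illustration.
cube = [
    [3, 5, 1],
    [2, 4, 6],
    [7, 0, 8],
]
prefix = build_prefix_sum(cube)
total = range_sum(prefix, 1, 1, 2, 2)  # covers cells 4, 6, 0, 8
```

In this sketch the query cost does not depend on the size of the range, but the prefix array holds one value per cell of the original cube, which is the storage cost that motivates compression.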
