http://dx.doi.org/10.3837/tiis.2019.02.007

An Efficient Indexing Structure for Multidimensional Categorical Range Aggregation Query  

Yang, Jian (School of Computer and Communication Engineering, University of Science and Technology Beijing)
Zhao, Chongchong (School of Computer and Communication Engineering, University of Science and Technology Beijing)
Li, Chao (Research Institute of Information Technology, Tsinghua University)
Xing, Chunxiao (Research Institute of Information Technology, Tsinghua University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 13, no. 2, pp. 597-618, 2019.
Abstract
Categorical range aggregation (CRA), which is conceptually equivalent to running a range aggregation query separately on each of multiple datasets, returns the query result for every dataset. The challenge is that when the number of datasets reaches hundreds or thousands, answering such a query costs substantial computation time and I/O. Previous work has addressed only range restrictions on a single dimension, whereas in practice more and more applications need statistics computed under range restrictions on multiple dimensions. We propose MCRI-Tree, an index structure designed to answer multidimensional categorical range aggregation queries, which makes use of main memory to maximize the efficiency of CRA queries. Specifically, MCRI-Tree answers any query in $O(nk^{n-1})$ I/Os, where $n$ is the number of dimensions and $k$ is the maximum number of pages covered in any one of the $n$ dimensions during a query. Extensive experiments demonstrate the practical efficiency of the technique.
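To make the query semantics concrete, the following is a brute-force reference sketch in Python. It is not MCRI-Tree and not the authors' code: it simply scans every record, which is exactly the per-dataset cost an index is meant to avoid. The record layout `(category, point, value)`, the function name `categorical_range_sum`, and the choice of SUM as the aggregate are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Hypothetical record layout for illustration: (category, n-dimensional point, value).
# Each category plays the role of one "dataset" in a CRA query.
Record = Tuple[str, Tuple[float, ...], float]


def categorical_range_sum(records: Sequence[Record],
                          low: Sequence[float],
                          high: Sequence[float]) -> Dict[str, float]:
    """Brute-force multidimensional categorical range aggregation (SUM assumed).

    For every category, sum the values of records whose point lies inside the
    axis-aligned box [low[d], high[d]] on every dimension d. Categories with no
    matching record are omitted from the result.
    """
    if len(low) != len(high):
        raise ValueError("low and high must have the same number of dimensions")
    result: Dict[str, float] = defaultdict(float)
    for category, point, value in records:
        if len(point) != len(low):
            raise ValueError("point dimensionality does not match the query box")
        # The range restriction applies on every dimension simultaneously.
        if all(lo <= x <= hi for x, lo, hi in zip(point, low, high)):
            result[category] += value
    return dict(result)
```

For example, with records `("A", (1, 2), 5.0)`, `("A", (3, 4), 1.0)`, `("B", (2, 2), 7.0)` and `("B", (9, 9), 2.0)`, the box from `(0, 0)` to `(3, 3)` returns one aggregate per category: `{"A": 5.0, "B": 7.0}`. An index such as MCRI-Tree aims to produce the same per-category answers without touching every record.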
Keywords
Categorical; Multidimensional Indexing; Query; MCRI-Tree