
A Cyclic Sliced Partitioning Method for Packing High-dimensional Data  

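Tae-Wan Kim (김태완) (Research Institute of Computer, Information and Communication, Pusan National University)
Ki-Joune Li (이기준) (Department of Computer Science, Pusan National University)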
Abstract
Traditional indexing methods were designed for low-dimensional data in dynamic environments. Recent database applications, however, require efficient processing of huge volumes of high-dimensional data in static environments, and many existing indexing strategies, particularly partitioning strategies, do not adapt well to these new environments. In this study, we point out these facts and propose a new partitioning strategy that meets the requirements of the new applications and is derived from analysis. As a preliminary step, we apply a packing technique and exploit observations on the Minkowski-sum cost model under a uniform data distribution. These observations predict that an unbalanced partitioning strategy may be more query-efficient than a balanced one for high-dimensional data. We therefore propose a method called CSP (Cyclic Sliced Partitioning). Analysis of this method explicitly suggests metrics for how to partition high-dimensional data. Using the cost model, simulations, and experiments, we show that our method performs better than the balanced strategy. Experimental comparisons with other indices and packing methods also show the superiority of our method.
Keywords
High-dimensional data; Access method; Packing; Cost model