http://dx.doi.org/10.3745/KIPSTD.2004.11D.5.1011

Performance Analysis on Declustering High-Dimensional Data by GRID Partitioning  

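Kim, Hak-Cheol (Department of Computer Science, Graduate School, Pusan National University)
Kim, Tae-Wan (e-Government Strategy Development Office, Ministry of Government Administration and Home Affairs)
Li, Ki-Joune (School of Information and Computer Engineering, Pusan National University)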
Abstract
Much work has been done to improve the I/O performance of systems that store and manage massive amounts of data by distributing the data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell, which is determined by the interval number in each dimension, to a disk number, on the assumption that each dimension is split into disjoint intervals so that the entire data space is partitioned in a GRID-like manner. However, this work has ignored the effect of the GRID partitioning scheme on declustering performance. In this paper, we enhance the performance of mapping-function-based declustering algorithms by applying a good GRID partitioning method. To this end, we propose an estimation model that counts the number of grid cells intersected by a range query, and we apply the GRID partitioning scheme that minimizes the query result size among the possible schemes. While binary partitioning is common for high-dimensional data, we choose fewer dimensions than binary partitioning requires and split several times along those dimensions, which reduces the number of grid cells touched by a query. Experimental results show that the proposed estimation model is accurate to within a 0.5% error ratio regardless of query size and dimension. By applying an efficient GRID partitioning scheme, we also improve the performance of the Kronecker Sequence, a mapping-function-based declustering algorithm known to be the best among mapping functions for high-dimensional data, by up to 23 times.
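The effect of the partitioning choice on the number of grid cells a range query touches can be illustrated with a small sketch. This is not the estimation model proposed in the paper; it is an exact count for a single query, assuming a unit data space, equal-width intervals in each dimension, and a closed query box. The function name `cells_touched` and the example query are hypothetical.

```python
def cells_touched(splits, lo, hi):
    """Count grid cells intersected by the closed box [lo, hi].

    Assumptions (illustrative, not from the paper):
    - the data space is the unit cube [0, 1]^d,
    - splits[i] is the number of equal-width intervals along dimension i,
    - lo[i] <= hi[i] and both lie in [0, 1].
    """
    total = 1
    for k, a, b in zip(splits, lo, hi):
        # Index of the interval containing each endpoint, clamped so that
        # a coordinate of exactly 1.0 falls in the last interval.
        first = min(int(a * k), k - 1)
        last = min(int(b * k), k - 1)
        # Per-dimension counts multiply because the grid is a cross product.
        total *= last - first + 1
    return total


d = 8
query_lo = [0.3] * d
query_hi = [0.7] * d

# Binary partition: every one of the 8 dimensions is split once (2^8 cells).
binary = cells_touched([2] * d, query_lo, query_hi)

# Same number of cells (4^4 = 256), but only 4 dimensions are split,
# each into 4 intervals; the other 4 dimensions are left unsplit.
fewer_dims = cells_touched([4] * 4 + [1] * 4, query_lo, query_hi)

print(binary, fewer_dims)
```

For this particular query, both schemes produce 256 cells in total, yet the query touches every cell under binary partitioning and far fewer when only four dimensions are split. A single query is only an illustration; the paper's model estimates this count over a query workload.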
Keywords
Parallel I/O; High-Dimensional Data; GRID Partition; Estimation Model