
Spatial Partitioning for Query Result Size Estimation in Spatial Databases  

황환규 (Division of Electrical, Electronics and Communication Engineering, Kangwon National University)
Abstract
An important task of the query optimizer when a query is invoked is to estimate the fraction of records in the database that satisfy the given query condition. Query result size estimation in spatial databases, as in relational databases, partitions the whole input into a small number of subsets called “buckets” and then estimates the fraction of the input that falls in each bucket. The accuracy of the estimate is determined by the difference between the actual data counts and their approximations in the buckets, and therefore depends on how the buckets are partitioned. Existing techniques for spatial databases are the equi-area and equi-count techniques, which are analogous, respectively, to two histograms used in relational databases: the equi-width histogram, which divides the input value range into buckets of equal size, and the equi-depth histogram, which places an equal number of records in each bucket. In this paper we propose a new partitioning technique that determines buckets according to the maximal difference of area, where area is defined as the product of the data ranges and frequencies of the input. Because the new technique considers both the data values and the frequencies of the input data simultaneously, it achieves substantial improvements in accuracy over existing approaches. We present a detailed experimental study comparing the accuracy of query result size estimation under the proposed technique and the existing techniques, using synthetic as well as real-life datasets. The experiments confirm that the proposed technique offers better accuracy than the existing techniques across varying spatial query sizes, numbers of buckets, numbers of data objects, and data sizes.
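The idea of partitioning by maximal difference of area can be illustrated with a minimal one-dimensional sketch. It is not the paper's spatial algorithm, which the abstract does not spell out. It assumes that "area" for a distinct value is its frequency multiplied by the gap to the next distinct value, that bucket boundaries are placed where adjacent areas differ most, and that records are spread uniformly inside each bucket. The function names `maxdiff_area_buckets` and `estimate_count` are hypothetical.

```python
from collections import Counter


def maxdiff_area_buckets(values, num_buckets):
    """Group sorted distinct values into buckets, cutting at the largest area jumps.

    Assumption: area(v) = spread(v) * freq(v), where spread is the gap to the
    next distinct value (taken as 1 for the last value).
    Returns a list of (low, high, record_count) tuples.
    """
    freq = Counter(values)
    distinct = sorted(freq)
    spreads = [b - a for a, b in zip(distinct, distinct[1:])] + [1]
    areas = [s * freq[v] for s, v in zip(spreads, distinct)]
    # Difference in area between each pair of adjacent distinct values.
    diffs = [abs(b - a) for a, b in zip(areas, areas[1:])]
    # Indices of the (num_buckets - 1) largest differences become cut points.
    by_size = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    cuts = sorted(by_size[:num_buckets - 1])

    buckets, start = [], 0
    for cut in cuts + [len(distinct) - 1]:
        group = distinct[start:cut + 1]
        buckets.append((group[0], group[-1], sum(freq[v] for v in group)))
        start = cut + 1
    return buckets


def estimate_count(buckets, lo, hi):
    """Estimate the number of records in [lo, hi], assuming uniformity per bucket."""
    total = 0.0
    for b_lo, b_hi, count in buckets:
        width = b_hi - b_lo
        if width == 0:
            # Single-value bucket: either fully inside the query range or not.
            total += count if lo <= b_lo <= hi else 0
        else:
            overlap = max(0, min(hi, b_hi) - max(lo, b_lo))
            total += count * overlap / width
    return total


values = [1, 1, 1, 2, 3, 10, 10, 10, 10, 11, 20]
buckets = maxdiff_area_buckets(values, num_buckets=3)
print(buckets)
print(estimate_count(buckets, 2, 10))
```

Replacing the cut-point rule with equally wide ranges or equal record counts gives the equi-width and equi-depth baselines, so the same `estimate_count` function can be used to compare the estimation error of each partitioning.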
Keywords
Query Optimization; Query Result Size Estimation; Spatial Selectivity Estimation