An Enhanced Two Dimensional Histogram Method Utilizing Dense Regions

Roh, Yo-Han; Chung, Yon-Dohn; Ghim, Ho-Jin; Kim, Myoung-Ho

Journal of KIISE: Databases

Vol. 35, No. 6 / pp. 544-554 / 2008 / pISSN 1229-7739

Korean Institute of Information Scientists and Engineers (KIISE)

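Roh, Yo-Han (Department of Computer Science, KAIST);
Chung, Yon-Dohn (Division of Computer and Communication Engineering, Korea University);
Ghim, Ho-Jin (Department of Computer Science, KAIST);
Kim, Myoung-Ho (Department of Computer Science, KAIST)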

Published: 2008.12.15

Abstract

Histograms are widely used in database systems for selectivity estimation, that is, for estimating the size of a query result. Conventional histogram methods compute this estimate under the assumption that the objects in each bucket are uniformly distributed. However, the objects within a query region may not be uniformly distributed: a bucket can contain skew in the form of dense groups of objects (clusters), which can significantly degrade the accuracy of the histogram. The aim of this work is to improve the accuracy of histograms. To this end, we propose a new two-dimensional histogram method that takes clusters into account. The proposed method detects dense regions in the given data distribution and exploits them when organizing buckets. Because it effectively reduces the accuracy degradation caused by clusters, it provides improved, robust accuracy on skewed data distributions. Through experiments, we show that the proposed method improves the performance of the conventional histogram by up to 74%.
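The effect of the uniformity assumption can be illustrated with a small sketch. The code below is not the algorithm proposed in the paper: the dense region is supplied by hand rather than detected, and all names (`Rect`, `uniform_estimate`, `dense_aware_estimate`) and the synthetic data are assumptions made for illustration. It compares a plain single-bucket estimate with an estimate that keeps a separate count for one dense sub-region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle, half-open: [x1, x2) x [y1, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def intersect(self, other: "Rect") -> "Rect":
        # May return an "inverted" rectangle; area() then yields 0.
        return Rect(max(self.x1, other.x1), max(self.y1, other.y1),
                    min(self.x2, other.x2), min(self.y2, other.y2))

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x < self.x2 and self.y1 <= y < self.y2


def uniform_estimate(region: Rect, count: float, query: Rect) -> float:
    """Objects of `region` expected inside `query` if spread uniformly."""
    if region.area() == 0:
        return 0.0
    return count * region.intersect(query).area() / region.area()


def dense_aware_estimate(bucket: Rect, dense: Rect, dense_count: float,
                         rest_count: float, query: Rect) -> float:
    """Estimate with one dense sub-region counted separately.

    Assumes `dense` lies entirely inside `bucket` (illustrative only).
    """
    in_dense = uniform_estimate(dense, dense_count, query)
    q = bucket.intersect(query)
    rest_area = bucket.area() - dense.area()
    rest_overlap = q.area() - dense.intersect(q).area()
    in_rest = rest_count * rest_overlap / rest_area if rest_area > 0 else 0.0
    return in_dense + in_rest


# Synthetic data (an assumption): a 30x30 cluster packed into [0,2)x[0,2)
# plus a sparse 10x10 background grid over the whole bucket.
bucket = Rect(0, 0, 10, 10)
dense = Rect(0, 0, 2, 2)
points = [(i * 2 / 30, j * 2 / 30) for i in range(30) for j in range(30)]
points += [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]

query = Rect(0, 0, 5, 5)
actual = sum(query.contains(x, y) for x, y in points)

dense_count = sum(dense.contains(x, y) for x, y in points)
rest_count = len(points) - dense_count

plain = uniform_estimate(bucket, len(points), query)
aware = dense_aware_estimate(bucket, dense, dense_count, rest_count, query)

print(f"actual={actual} plain={plain} dense-aware={aware}")
```

On this hand-built data the plain estimate is far below the true count because the cluster's objects are averaged over the whole bucket, while the dense-aware estimate matches it. The match is exact only because the synthetic data is perfectly regular outside the cluster; real data would show a reduced error rather than none.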
