Estimating The Number of Hierarchical Distinct Values using Arrays of Attribute Value Intervals

Song, Ha-Joo; Kim, Hyoung-Joo

Journal of KIISE:Computing Practices and Letters (한국정보과학회논문지:컴퓨팅의 실제 및 레터)

Volume 6 Issue 2 / Pages 265-273 / 2000 / 1229-7712 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

속성값 구간 배열을 이용한 계층 상이값 갯수의 계산 기법

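Song, Ha-Joo (School of Computer Science and Engineering, Seoul National University);
Kim, Hyoung-Joo (School of Computer Science and Engineering, Seoul National University)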

Published: 2000.04.30

Abstract

In relational database management systems (RDBMSs), a table consists of a set of records, each composed of a set of attributes. The number of distinct values (NDV) of an attribute is the number of distinct values of that attribute that actually appear in the database records. NDV is widely used in query optimization and in supporting statistical queries. Object-relational database management systems (ORDBMSs), however, support inheritance between tables, so an attribute defined in a super-table is automatically inherited by its sub-tables. Hence, in ORDBMSs, we need not only the NDV of an attribute in a single table but also the NDV of an attribute across the multiple tables of a hierarchy, the hierarchical NDV (HNDV). In this paper, we propose a method that calculates HNDV using arrays of attribute value intervals. In this method, an array of attribute value intervals is created for the attribute of interest in each table of a table hierarchy, and HNDV is calculated or estimated by merging these arrays. The proposed method calculates HNDV accurately with little additional storage and is efficient in environments where only some of the tables in a hierarchy are frequently updated.
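The abstract does not describe how an array of attribute value intervals is laid out. The following minimal sketch assumes an integer-valued attribute whose distinct values in each table are recorded as sorted, disjoint runs of consecutive values. Under that assumption, merging the runs of a table and its sub-tables gives the HNDV exactly. The example also shows why adding the per-table NDVs overcounts values that occur in more than one table. The functions `build_intervals`, `ndv`, `merge`, and `hndv` and the tables `person`, `student`, and `employee` are hypothetical illustrations, not names taken from the paper.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a closed range [lo, hi] of integer attribute values.
Interval = Tuple[int, int]


def build_intervals(values: Iterable[int]) -> List[Interval]:
    """Collapse one table's distinct values into sorted, maximal runs of consecutive integers."""
    intervals: List[Interval] = []
    for v in sorted(set(values)):
        if intervals and v == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], v)  # v continues the current run
        else:
            intervals.append((v, v))  # gap before v: start a new run
    return intervals


def ndv(intervals: List[Interval]) -> int:
    """Count the distinct values covered by disjoint intervals."""
    return sum(hi - lo + 1 for lo, hi in intervals)


def merge(*arrays: List[Interval]) -> List[Interval]:
    """Merge several interval arrays into one array of disjoint, maximal intervals."""
    merged: List[Interval] = []
    for lo, hi in sorted(iv for arr in arrays for iv in arr):
        if merged and lo <= merged[-1][1] + 1:  # overlapping or adjacent
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def hndv(*arrays: List[Interval]) -> int:
    """Hierarchical NDV: the NDV of the union of all tables' values."""
    return ndv(merge(*arrays))


# Hypothetical hierarchy: person is the super-table; student and employee inherit the attribute.
person = build_intervals([1, 2, 3, 3, 10])  # [(1, 3), (10, 10)]
student = build_intervals([3, 4, 5, 20])    # [(3, 5), (20, 20)]
employee = build_intervals([10, 11])        # [(10, 11)]

print(ndv(person) + ndv(student) + ndv(employee))  # 10: values 3 and 10 are counted twice
print(hndv(person, student, employee))             # 8: {1..5, 10, 11, 20}
```

The space used depends on how many runs each table's values form. The representation is compact when values cluster into long runs and degrades toward one interval per value when they are scattered.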

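In a relational database system, each table is a set of records, and each record consists of a set of attributes. The number of distinct values of an attribute is the number of different values of that attribute actually in use in the database. It is useful for query optimization and for supporting statistical queries. Unlike conventional relational database systems, object-relational database systems support inheritance between tables, so the sub-tables inherit the attributes defined in a super-table. The number of distinct values therefore also needs a hierarchical form that reflects not only a single table but also the attribute values of all its sub-tables. This paper proposes a technique that computes the hierarchical number of distinct values using arrays of attribute value intervals while reusing existing methods for measuring the number of distinct values unchanged. The technique builds an array of attribute value intervals for a table and for each of its sub-tables, and computes the hierarchical number of distinct values by merging those arrays. The proposed technique obtains the hierarchical number of distinct values exactly using only a small amount of storage. It is especially effective in environments where updates are distributed unevenly across the tables in a hierarchy.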
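The advantage for unevenly updated hierarchies can be illustrated with a hypothetical catalog. It keeps one interval array per table and, when an HNDV is requested, rebuilds only the arrays of tables modified since the last request before merging. This sketch rests on several assumptions: integer values, inserts only, and an in-memory set standing in for each table's stored data. It is not the paper's implementation. `TableHierarchy`, `to_intervals`, `count_union`, and the `rebuilds` counter are illustrative names.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[int, int]  # assumed closed range [lo, hi] of integer attribute values


def to_intervals(values: Set[int]) -> List[Interval]:
    """Summarize one table's distinct values as sorted runs of consecutive integers."""
    runs: List[Interval] = []
    for v in sorted(values):
        if runs and v == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], v)
        else:
            runs.append((v, v))
    return runs


def count_union(arrays: List[List[Interval]]) -> int:
    """Sweep all intervals in sorted order and count the values their union covers."""
    total = 0
    cur: Optional[Interval] = None
    for lo, hi in sorted(iv for arr in arrays for iv in arr):
        if cur is not None and lo <= cur[1] + 1:
            cur = (cur[0], max(cur[1], hi))  # overlapping or adjacent: extend current range
        else:
            if cur is not None:
                total += cur[1] - cur[0] + 1  # close the finished range
            cur = (lo, hi)
    if cur is not None:
        total += cur[1] - cur[0] + 1
    return total


class TableHierarchy:
    """Hypothetical catalog holding one interval array per table in a hierarchy."""

    def __init__(self) -> None:
        self.children: Dict[str, List[str]] = {}
        self.values: Dict[str, Set[int]] = {}  # stands in for each table's stored data
        self.arrays: Dict[str, List[Interval]] = {}
        self.dirty: Set[str] = set()  # tables whose interval array is out of date
        self.rebuilds = 0  # number of per-table arrays (re)built so far

    def add_table(self, name: str, parent: Optional[str] = None) -> None:
        self.children[name] = []
        self.values[name] = set()
        self.arrays[name] = []
        self.dirty.add(name)
        if parent is not None:
            self.children[parent].append(name)

    def insert(self, table: str, value: int) -> None:
        self.values[table].add(value)
        self.dirty.add(table)  # only this table's array becomes stale

    def subtree(self, name: str) -> List[str]:
        """The table itself plus all of its descendants."""
        out: List[str] = []
        stack = [name]
        while stack:
            t = stack.pop()
            out.append(t)
            stack.extend(self.children[t])
        return out

    def hndv(self, name: str) -> int:
        tables = self.subtree(name)
        for t in tables:
            if t in self.dirty:  # rebuild only the arrays of updated tables
                self.arrays[t] = to_intervals(self.values[t])
                self.dirty.discard(t)
                self.rebuilds += 1
        return count_union([self.arrays[t] for t in tables])


h = TableHierarchy()
h.add_table("person")
h.add_table("student", parent="person")
h.add_table("employee", parent="person")
for v in [1, 2, 3, 10]:
    h.insert("person", v)
for v in [3, 4, 5, 20]:
    h.insert("student", v)
for v in [10, 11]:
    h.insert("employee", v)
print(h.hndv("person"), h.rebuilds)  # 8 3
h.insert("student", 6)               # only student changes
print(h.hndv("person"), h.rebuilds)  # 9 4
```

After the second insert, only the `student` array is rebuilt, while the `person` and `employee` arrays are reused as they are.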
