DOI QR코드

DOI QR Code

Evaluating Join Performance on Relational Database Systems

  • Received : 2010.10.25
  • Accepted : 2010.12.14
  • Published : 2010.12.31

Abstract

The join operator is fundamental in relational database systems. Evaluating join queries on large tables is challenging because records need to be efficiently matched based on a given key. In this work, we analyze join queries in SQL with large tables in which a foreign key may be null, invalid or valid, given a referential integrity constraint. We conduct an extensive join performance evaluation on three DBMSs. Specifically, we study join queries varying table sizes, row size and key probabilistic distribution, inserting null, invalid or valid foreign key values. We also benchmark three well-known query optimizations: view materialization, secondary index and join reordering. Our experiments show certain optimizations perform well across DBMSs, whereas other optimizations depend on the DBMS architecture.

Keywords

References

  1. CHAUDHURI, S. 1998. An overview of query optimization in relational systems. In Proc. ACM PODS Conference. 84-93.
  2. CODD, E. 1979. Extending the database relational model to capture more meaning. ACM TODS 4, 4, 397-434. https://doi.org/10.1145/320107.320109
  3. ELMASRI, R. AND NAVATHE, S. B. 2000. Fundamentals of Database Systems, 3rd ed. Addison/Wesley, Redwood City, California.
  4. GARCIA-GARCIA, J. AND ORDONEZ, C. 2008. Estimating and bounding aggregations in databases with referential integrity errors. In Proc. ACM DOLAP Workshop. 49-56.
  5. GARCIA-MOLINA, H., ULLMAN, J., AND WIDOM, J. 2001. Database Systems: The Complete Book, 1st ed. Prentice Hall.
  6. GELDER, A. V. 1993. Multiple join size estimation by virtual domains (extended abstract). In Proc. ACM PODS Conference. 180-189.
  7. HURSON, A. R., MILLER, L. L., AND PAKZAD, S. H. 1987. Incomplete information and the join operation in database machines. In ACM '87: Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow. IEEE Computer Society Press, Los Alamitos, CA, USA, 436-443.
  8. IOANNIDIS, Y. E. AND KANG, Y. 1990. Randomized algorithms for optimizing large join queries. In Proc. ACM SIGMOD Conference. 312-321.
  9. KABRA, N. AND DEWITT, D. J. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proc. ACM SIGMOD Conference. 106-117.
  10. KELLER, A. AND WINSLETT, M. 1984. Approaches for updating databases with incomplete information and nulls. In Proc. IEEE ICDE Conference.
  11. LIM, E. AND CHIANG, R. 2000. The integration of relationship instances from heterogeneous databases. Decis. Support Syst. 29-2, 3-4, 153-167. https://doi.org/10.1016/S0167-9236(00)00070-1
  12. MISHRA, P. AND EICH, M. H. 1992. Join processing in relational databases. ACM Comput. Surv. 24, 1, 63-113. https://doi.org/10.1145/128762.128764
  13. ORDONEZ, C. AND GARCIA-GARCIA, J. 2008. Referential integrity quality metrics. Decision Support Systems Journal 44, 2, 495-508. https://doi.org/10.1016/j.dss.2007.06.004
  14. SWAMI, A. 1989. Optimization of large join queries: combining heuristics and combinatorial techniques. In Proc. ACM SIGMOD Conference. 367-376.
  15. TAY, Y. C. 1990. On the optimality of strategies for multiple joins. In Proc. ACM PODS Conference. 124-131.
  16. YUAN, L. AND CHIANG, D. 1998. A sound and complete query evaluation algorithm for relational databases with null values. In Proc. ACM SIGMOD Conference.