http://dx.doi.org/10.3745/KIPSTD.2002.9D.5.755

A Pipelined Hash Join Method for Load Balancing  

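Moon, Jin-Gue (Agency for Defense Development)
Park, No-Sang (Korea Institute of Machinery and Materials, Computer Center)
Kim, Pyeong-Jung (Chungbuk Provincial College of Science, Dept. of Computer Information Science)
Jin, Seong-Il (Chungnam National University, Division of Information and Communication Engineering)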
Abstract
We investigate how data skew in join attributes affects the performance of a pipelined multi-way hash join method. We then propose two new hash join methods with load balancing capabilities. The first method allocates buckets statically in a round-robin fashion. The second allocates buckets adaptively, based on a frequency distribution.

With hash-based joins, multiple joins can be pipelined. The early results of one join are sent to the next join before the whole join completes, without being staged on disk. Unless the pipelined execution of multiple hash joins includes a load balancing mechanism, data skew can severely degrade system performance.

In this paper, we derive an execution model of the pipeline segment and a cost model, and we develop a simulator for the study. Our simulations cover a wide range of parameters. They show that join selectivities and relation sizes degrade system performance more severely as the degree of data skew increases. However, the proposed method, which uses a large number of buckets together with a tuning technique, offers substantial robustness across a wide range of skew conditions.
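The abstract does not spell out either allocation algorithm, so what follows is only a minimal Python sketch under stated assumptions:

- Tuples are hash-partitioned on the join attribute into many more buckets than join processors.
- The static method maps bucket `b` to processor `b mod P`.
- The adaptive method is assumed to be a greedy, largest-bucket-first assignment driven by per-bucket tuple counts. This is a stand-in for the paper's frequency distribution, not its actual algorithm.

The Zipf-like key generator and all function names are hypothetical.

```python
import heapq


def zipf_like_keys(n_keys, scale):
    """Hypothetical skewed workload: key k occurs scale // k times (Zipf-like, theta = 1)."""
    return [k for k in range(1, n_keys + 1) for _ in range(max(1, scale // k))]


def bucket_sizes(keys, num_buckets):
    """Hash-partition join-attribute values and count the tuples landing in each bucket."""
    sizes = [0] * num_buckets
    for k in keys:
        sizes[hash(k) % num_buckets] += 1
    return sizes


def round_robin(sizes, num_procs):
    """Static allocation: bucket b goes to processor b mod num_procs, ignoring bucket sizes."""
    assignment = [b % num_procs for b in range(len(sizes))]
    loads = [0] * num_procs
    for b, p in enumerate(assignment):
        loads[p] += sizes[b]
    return assignment, loads


def frequency_based(sizes, num_procs):
    """Assumed adaptive allocation: largest bucket first, each to the least-loaded processor."""
    heap = [(0, p) for p in range(num_procs)]  # entries are (current load, processor id)
    heapq.heapify(heap)
    assignment = [None] * len(sizes)
    loads = [0] * num_procs
    for b in sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True):
        load, p = heapq.heappop(heap)  # least-loaded processor so far
        assignment[b] = p
        loads[p] = load + sizes[b]
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads


if __name__ == "__main__":
    sizes = bucket_sizes(zipf_like_keys(200, 2000), num_buckets=32)
    _, rr = round_robin(sizes, num_procs=4)
    _, fb = frequency_based(sizes, num_procs=4)
    # The most heavily loaded processor roughly bounds when a pipeline segment finishes,
    # so the maximum load is the figure to compare.
    print("round-robin loads:", rr, "max:", max(rr))
    print("frequency-based loads:", fb, "max:", max(fb))
```

Round-robin spreads buckets evenly by count but not by size. Under skew, one processor can therefore end up well above the average load. Placing the largest buckets first on the least-loaded processor keeps the maximum load close to the mean. Using many more buckets than processors (here 32 versus 4) gives the greedy step finer-grained pieces to balance, which is in the spirit of the abstract's point about using a large number of buckets.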
Keywords
Parallel Database; Pipelined Hash Join; Data Skew; Load Balance