An Advanced Parallel Join Algorithm for Managing Data Skew on Hypercube Systems

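  • Youngsun Weon (Chungnam National University, Division of Information and Communication Engineering, BK)
  • Man Pyo Hong (Ajou University, Division of Information and Computer Engineering)
  • Published: 2003.04.01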

Abstract

In this paper, we propose an advanced parallel join algorithm that efficiently processes join operations on hypercube systems. The algorithm processes relation R with a broadcasting method suited to the hypercube structure, which yields a parallel join algorithm optimized for that structure. The proposed algorithm fully solves two essential problems in parallelizing the join operation, load balancing and data skew, and it exploits the clustering effect to do so. As a result, overall system performance improves over existing algorithms. Moreover, the new algorithm can easily implement non-equijoin operations, which are difficult to implement in hash-based algorithms. Finally, a cost model analysis shows that the algorithm performs better than existing parallel join algorithms.
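
The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of the general idea it names: broadcasting the fragments of R along hypercube dimensions and then joining locally with an arbitrary predicate. It simulates the nodes in a single process and uses the textbook all-to-all broadcast (each node exchanges what it holds with its neighbor across one dimension per step). The function names `hypercube_allgather` and `broadcast_join` are hypothetical, and the sketch does not model the paper's clustering effect or its cost model.

```python
from typing import Callable, List, Tuple


def hypercube_allgather(fragments: List[List[int]]) -> List[List[int]]:
    """Simulate an all-to-all broadcast on a hypercube of len(fragments) nodes.

    Assumption: the node count is a power of two (2**d for a d-dimensional cube).
    After d exchange steps every node holds every fragment.
    """
    p = len(fragments)
    if p == 0 or p & (p - 1):
        raise ValueError("node count must be a power of two")
    held = [list(f) for f in fragments]
    step = 1
    while step < p:
        # Node i and its neighbor i XOR step swap everything they hold so far.
        held = [held[i] + held[i ^ step] for i in range(p)]
        step <<= 1
    return held


def broadcast_join(
    r_frags: List[List[int]],
    s_frags: List[List[int]],
    pred: Callable[[int, int], bool],
) -> List[List[Tuple[int, int]]]:
    """Join R and S per node: R is broadcast, S fragments stay where they are.

    pred may be any comparison, so a non-equijoin such as r < s works
    the same way as an equijoin.
    """
    all_r = hypercube_allgather(r_frags)
    return [
        [(r, s) for s in s_frags[i] for r in all_r[i] if pred(r, s)]
        for i in range(len(s_frags))
    ]


# Hypothetical data: 4 nodes (a 2-dimensional hypercube), one tuple per node.
r_frags = [[1], [5], [3], [7]]
s_frags = [[2], [6], [4], [0]]
result = broadcast_join(r_frags, s_frags, lambda r, s: r < s)
```

Because no tuple is routed by hashing its join attribute in this sketch, the predicate is free to be an inequality, which is the property the abstract contrasts with hash-based algorithms.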
