
Performance Evaluation of Hash Join Algorithm on Flash Memory SSDs  

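Park, Jang-Woo (Sungkyunkwan University, Department of Embedded Software)
Park, Sang-Shin (Sungkyunkwan University, Department of Embedded Software)
Lee, Sang-Won (Sungkyunkwan University, School of Information and Communication Engineering)
Park, Chan-Ik (Samsung Electronics, Flash WS Group)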
Abstract
Hash join is one of the core algorithms in database management systems. If a hash join cannot complete in one pass because the available memory is insufficient (i.e., hash table overflow), however, it may incur a few sequential writes and excessive random reads. With a hard disk as the temporary storage for hash joins, the I/O time is dominated by slow random reads in the probing phase. Meanwhile, flash memory based SSDs (flash SSDs) are becoming popular, and we expect flash SSDs to replace hard disks in enterprise databases in the foreseeable future. In contrast to a hard disk, a flash SSD has no mechanical components and therefore offers low random-read latency, which can boost hash join performance. In this paper, we investigate several important and practical issues that arise when a flash SSD is used as temporary storage for hash joins. First, we reveal the I/O patterns of hash join in detail and explain why a flash SSD can outperform a hard disk by more than an order of magnitude. Second, we present and analyze the impact of cluster size (i.e., the I/O unit in hash join) on performance. Finally, we empirically demonstrate that, while a commercial query optimizer is error-prone in predicting the execution time with a hard disk as temporary storage, it can precisely estimate the execution time with a flash SSD. In summary, we show that, when used as temporary storage for hash join, a flash SSD provides more reliable cost estimation as well as faster performance.
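To make the I/O pattern described in the abstract concrete, here is a minimal Python sketch, assuming a Grace-style partitioned hash join in which each partition keeps one output buffer of one cluster and every full buffer is appended to a shared temporary file. The function names (`partition_layout`, `count_random_reads`, `io_time_ms`), the row size, and the HDD/SSD parameters are hypothetical and are not taken from the paper. As a simplification, only one input's partitions are read back. The sketch shows why the appends during partitioning are sequential while reading one partition back during probing jumps across the file, and how a larger cluster (I/O unit) reduces the number of those jumps.

```python
from collections import defaultdict

# Hypothetical device parameters, for illustration only (not from the paper).
HDD = {"latency_ms": 8.0, "bandwidth_mb_s": 100.0}
SSD = {"latency_ms": 0.1, "bandwidth_mb_s": 250.0}


def partition_layout(keys, num_partitions, cluster_rows):
    """Simulate the partitioning phase of a Grace-style hash join.

    Each partition owns one output buffer of `cluster_rows` rows. A full
    buffer is appended to the temporary file as one cluster, so the file is
    written sequentially. Returns (partition -> cluster offsets, cluster count).
    """
    buffers = defaultdict(list)
    layout = defaultdict(list)
    next_offset = 0  # next free cluster slot in the temporary file
    for key in keys:
        p = hash(key) % num_partitions
        buffers[p].append(key)
        if len(buffers[p]) == cluster_rows:
            layout[p].append(next_offset)
            next_offset += 1
            buffers[p].clear()
    # Flush partially filled buffers once the input is exhausted.
    for p in sorted(buffers):
        if buffers[p]:
            layout[p].append(next_offset)
            next_offset += 1
    return dict(layout), next_offset


def count_random_reads(layout):
    """Read partitions back one after another (probing phase) and count the
    reads whose offset does not directly follow the previous read."""
    random_reads = 0
    prev = None
    for p in sorted(layout):
        for off in layout[p]:
            if prev is None or off != prev + 1:
                random_reads += 1
            prev = off
    return random_reads


def io_time_ms(num_random_reads, total_bytes, latency_ms, bandwidth_mb_s):
    """Toy cost model: one positioning latency per random read plus transfer time."""
    transfer_ms = total_bytes / (bandwidth_mb_s * 1e6) * 1e3
    return num_random_reads * latency_ms + transfer_ms


if __name__ == "__main__":
    ROW_BYTES = 100  # hypothetical row size
    keys = range(100_000)
    for cluster_rows in (16, 64, 256):
        layout, clusters = partition_layout(keys, 8, cluster_rows)
        reads = count_random_reads(layout)
        # Upper bound: partially filled clusters are counted as full.
        nbytes = clusters * cluster_rows * ROW_BYTES
        for name, dev in (("HDD", HDD), ("SSD", SSD)):
            t = io_time_ms(reads, nbytes, dev["latency_ms"], dev["bandwidth_mb_s"])
            print(f"cluster={cluster_rows:4d} {name}: {reads} random reads, ~{t:.1f} ms")
```

In this toy model almost every cluster read becomes a random read, so the estimated time is driven by the per-request latency term, and growing the cluster cuts the number of requests. The printed figures are illustrative only and are not measurements from the paper.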
Keywords
Flash memory; Hash join; Overflow; Cluster; Query optimizer