Efficient Processing of Grouped Aggregation on Non-Uniformed Memory Access Architecture

비균등 메모리 접근 구조에서의 효율적인 그룹화 집단 연산의 처리

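  • 최성준 (Department of Computer Engineering, Korea University of Technology and Education)
  • 민준기 (School of Computer Engineering, Korea University of Technology and Education)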
  • Received : 2018.11.07
  • Accepted : 2018.12.07
  • Published : 2018.12.31

Abstract

Recently, the Non-Uniform Memory Access (NUMA) architecture was proposed to alleviate the memory bottleneck problem that occurs in the Symmetric Multiprocessing (SMP) architecture. In addition, since the aggregation operator is an important operator that provides properties and summaries of data, its efficiency is crucial to the overall performance of a system. Thus, in this paper, we propose an efficient aggregation processing technique for the NUMA architecture. The proposed technique consists of a partition phase and a merge phase. In the partition phase, the target relation is partitioned into several partial relations according to the grouping attribute. Since each thread can then process the aggregation operator on its own partial relation independently, remote memory access is prevented during the merge phase. Furthermore, in the merge phase, we improve the performance of aggregation processing by letting each thread compute the aggregation with a local hash table and by avoiding lock contention when the aggregation results generated by all threads are merged into one.
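
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of hash partitioning on the grouping attribute, and the choice of SUM as the aggregate are all assumptions made for illustration. Python threads also cannot express NUMA node placement, so the sketch shows only the logical structure (disjoint partial relations, one local hash table per thread, a lock-free final merge), not the memory locality benefit.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Partition phase: route each (group_key, value) row to a partial
    relation chosen by hashing its grouping attribute (assumed scheme)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def aggregate_partition(partial_relation):
    """Merge phase, per thread: compute SUM per group in a local hash table."""
    local_table = defaultdict(int)
    for key, value in partial_relation:
        local_table[key] += value
    return dict(local_table)


def grouped_sum(rows, num_threads=4):
    """Run both phases and return one {group_key: sum} result."""
    partitions = partition(rows, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        local_results = list(pool.map(aggregate_partition, partitions))
    # All rows of a group landed in the same partition, so the local tables
    # hold disjoint key sets and can be combined without any locking.
    merged = {}
    for table in local_results:
        merged.update(table)
    return merged


if __name__ == "__main__":
    sample = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(grouped_sum(sample))
```

Because every row of a given group is routed to the same partial relation, no two threads ever update the same group, which is why the final merge needs no synchronization. On a real NUMA machine, each partial relation would additionally be placed in the memory of the node whose thread processes it.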



Acknowledgement

Supported by: Korea University of Technology and Education
