Effective Programming Methods for Intel's Next-Generation Processors Using AVX-512

Choe, Jae-Yeong; Kim, Rae-Hyeon; Im, Rok-Taek

Korea Information Processing Society Review (정보처리학회지)

Volume 25, Issue 1 / Pages 68-77 / 2018 / 1226-9182 (pISSN) / 2734-0376 (eISSN)

Korea Information Processing Society (한국정보처리학회)

Choe, Jae-Yeong (Soongsil University);
Kim, Rae-Hyeon (Soongsil University);
Im, Rok-Taek (Soongsil University)

Published: 2018.01.31

References

Goto, K., van de Geijn, R.A. "Anatomy of high-performance matrix multiplication." ACM Transactions on Mathematical Software (TOMS) 34(3), 12 (2008). https://doi.org/10.1145/1356052.1356053
Gunnels, J.A., Henry, G.M., van de Geijn, R.A. "A family of high-performance matrix multiplication algorithms." In: International Conference on Computational Science, pp. 51-60. Springer (2001)
Heinecke, A., Vaidyanathan, K., Smelyanskiy, M., Kobotov, A., Dubtsov, R., Henry, G., Shet, A.G., Chrysos, G., Dubey, P. "Design and implementation of the linpack benchmark for single and multi-node systems based on Intel Xeon Phi Coprocessor." In: Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pp. 126-137. IEEE (2013)
"Intel Intrinsics Guide." Software.intel.com (2018). [online] Available at: https://software.intel.com/sites/landingpage/IntrinsicsGuide/ [Accessed 22 Mar. 2018]
Jeffers, J., Reinders, J., Sodani, A. Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition. Morgan Kaufmann (2016)
Lim, R., Lee, Y., Kim, R., Choi, J. "An Implementation of matrix-matrix multiplication on the Intel KNL processor with AVX-512." In: Cluster Computing (Submitted)
Peyton, J.L. "Programming dense linear algebra kernels on vectorized architectures." Master's thesis, The University of Tennessee, Knoxville (2013)
Van Zee, F.G., van de Geijn, R.A. "BLIS: A Framework for Rapidly Instantiating BLAS Functionality." In: ACM Trans. Math. Softw. 41(3), pp. 1-33. ACM (2015)
Xianyi, Z., Qian, W., Yunquan, Z. "Model-driven level 3 BLAS performance optimization on Loongson 3A processor." In: Parallel and Distributed Systems, 2012 IEEE 18th International Conference, pp. 684-691. IEEE (2012)