http://dx.doi.org/10.3745/KTCCS.2022.11.5.133

A Performance Study on CPU-GPU Data Transfers of Unified Memory Device  

Kwon, Oh-Kyoung (Supercomputing Division, Korea Institute of Science and Technology Information)
Gu, Gibeom (Supercomputing Division, Korea Institute of Science and Technology Information)
Publication Information
KIPS Transactions on Computer and Communication Systems, Vol. 11, No. 5, 2022, pp. 133-138
Abstract
As GPU performance has improved, GPUs have become common in HPC and artificial intelligence, but GPU programming remains a major obstacle to productivity. In particular, because host memory and GPU memory must be managed separately, research on both convenience and performance is active, and various CPU-GPU memory transfer programming methods have been proposed. Meanwhile, many SoC (System on a Chip) products, such as the Apple M1 and NVIDIA Tegra, which bundle the CPU, GPU, and an integrated memory into one silicon package, have recently emerged. In this study, we investigate the performance of CPU-GPU data transfers on such an integrated-memory device, which shows different characteristics from the conventional environment in which host memory and GPU memory are separate. Specifically, we compare the performance of CPU-GPU data transfer methods on an NVIDIA SoC chip, an integrated-memory device, and on an SXM-based NVIDIA V100 GPU. As the experimental workload we use two-dimensional matrix transposition, an operation frequently used in HPC applications. We analyze the following performance factors: the difference in GPU kernel performance by CPU-GPU memory transfer method on each GPU device, the transfer performance difference between page-locked and pageable memory, overall performance, and performance by workload size. The experiments confirm that the NVIDIA Xavier can maximize the benefits of the integrated memory in the SoC chip by supporting I/O cache coherency.
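The transfer methods compared in the abstract can be illustrated with a minimal CUDA sketch. This is not the paper's benchmark code; the matrix size, kernel, and structure are illustrative assumptions, showing only the three allocation/transfer styles under study: pageable memory with explicit copies, page-locked (pinned) memory, and unified (managed) memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 1024  // illustrative matrix dimension; the paper varies workload size

// Naive 2D matrix transpose kernel: out[x][y] = in[y][x].
__global__ void transpose(const float *in, float *out, int n) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < n && y < n) out[x * n + y] = in[y * n + x];
}

int main() {
    size_t bytes = (size_t)N * N * sizeof(float);
    dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    // (a) Pageable host memory + explicit cudaMemcpy transfers.
    float *h_in = (float *)malloc(bytes), *h_out = (float *)malloc(bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    transpose<<<grid, block>>>(d_in, d_out, N);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // (b) Page-locked (pinned) host memory: same explicit copies, but the
    // driver can DMA directly; the paper compares this against pageable.
    float *p_in;
    cudaMallocHost(&p_in, bytes);
    cudaMemcpy(d_in, p_in, bytes, cudaMemcpyHostToDevice);

    // (c) Unified (managed) memory: one pointer visible to both CPU and GPU.
    // On an SoC with integrated memory and I/O coherency (e.g., Xavier),
    // this can avoid physical copies entirely.
    float *u_in, *u_out;
    cudaMallocManaged(&u_in, bytes);
    cudaMallocManaged(&u_out, bytes);
    transpose<<<grid, block>>>(u_in, u_out, N);
    cudaDeviceSynchronize();  // required before the CPU touches managed data

    cudaFree(d_in); cudaFree(d_out);
    cudaFreeHost(p_in);
    cudaFree(u_in); cudaFree(u_out);
    free(h_in); free(h_out);
    return 0;
}
```

On a discrete GPU such as the V100, variants (a) and (b) incur PCIe transfers and (c) triggers page migration, whereas on an integrated-memory SoC the same managed pointer can be served from the shared physical memory, which is the asymmetry the paper measures.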
Keywords
HPC; GPU; Unified Memory; Data Transfer;
References
1 A. Bhat, "CUDA on Xavier," GTC 2018 [Internet], http://on-demand.gputechconf.com/gtc/2018/presentation/s8868-cuda-on-xavier-what-is-new.pdf.
2 O. K. Kwon and G. Gu, "A performance study on CPU-GPU data transfers of NVIDIA tegra and tesla GPUs," Proceedings of Annual Conference of KIPS 2021, pp.39-42, 2021.
3 CUDA C++ Programming Guide [Internet], https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
4 CUDA for Tegra [Internet], https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html.
5 R. S. Santos, D. M. Eler, and R. E. Garcia, "Performance evaluation of data migration methods between the host and the device in CUDA-based programming," Information Technology: New Generations, pp.689-700, 2016.
6 N. Sakharnykh, "Everything You Need to Know about Unified Memory," GTC 2018 [Internet], https://on-demand.gputechconf.com/gtc/2018/presentation/s8430-everythingyou-need-to-know-about-unified-memory.pdf.
7 P. Wang, J. Wang, C. Li, J. Wang, H. Zhu, and M. Guo, "Grus: Toward unified-memory-efficient high-performance graph processing on GPU," ACM Transactions on Architecture and Code Optimization (TACO), Vol.18, No.2, pp.1-25, 2021.
8 S. Chien, I. Peng, and S. Markidis, "Performance evaluation of advanced features in CUDA unified memory," 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), pp.8-18, Nov. 2019.