A Performance Comparison of Parallel Programming Models on Edge Devices

  • Received : 2023.06.18
  • Accepted : 2023.08.17
  • Published : 2023.08.31

Abstract

Heterogeneous computing uses different types of processors together to perform parallel processing. By drawing on computing resources such as CPUs, GPUs, and FPGAs, it maximizes task-processing and energy efficiency. Edge computing, meanwhile, has developed alongside IoT and 5G technologies. It is a distributed computing paradigm that uses computing resources close to clients, thereby offloading work from the central server. It has since evolved into intelligent edge computing, which combines edge computing with artificial intelligence. Intelligent edge computing enables end-to-end processing of the data collected at the edge, including context awareness, prediction, control, and simple processing. If heterogeneous computing can be applied successfully at the edge, it is expected to maximize job processing efficiency while minimizing dependence on the central server. In this paper, we conducted experiments with benchmark applications to verify the feasibility of various parallel programming models on high-end and low-end edge devices. We analyzed the performance of five parallel programming models on the Raspberry Pi 4 as the low-end device and the Jetson Orin Nano as the high-end device. In the experiments, OpenACC showed the best performance on the low-end edge device and OpenSYCL on the high-end device, owing to the stability and optimization of system libraries.

Acknowledgement

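This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. RS-2023-00211606) and by the Innovative Human Resource Development for Local Intellectualization support program of the Institute of Information & Communications Technology Planning & Evaluation (IITP) (No. IITP-2023-RS-2022-00156389).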
