An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

  • Yoon, Jong Seon (Dept. of Mechanical Engineering, Seoul Nat'l Univ. of Science and Technology) ;
  • Jeon, Byoung Jin (Integrative Cardiovascular Imaging Research Center, Yonsei Cardiovascular Center, College of Medicine, Yonsei Univ.) ;
  • Choi, Hyoung Gwon (Dept. of Mechanical and Automotive Engineering, Seoul Nat'l Univ. of Science and Technology)
  • Received : 2016.10.13
  • Accepted : 2016.11.08
  • Published : 2017.02.01

Abstract

The performance of the colored Gauss-Seidel solver on CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU performed well for small problems, but its performance deteriorated sharply once the memory required for the computation exceeded the cache size. In contrast, the GPU performed better as the mesh size increased, because memory latency hiding becomes effective when the number of grid points is sufficiently large. With the colored Gauss-Seidel solver, the GPU computation was up to approximately 7 times faster than the single-CPU computation. Furthermore, the colored Gauss-Seidel solver was found to be approximately twice as fast as the Jacobi solver when run in parallel on the GPU.
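As a minimal sketch of the two solvers compared above, the following pure-Python example applies Jacobi and a two-color (red-black) Gauss-Seidel iteration to a steady 2D heat conduction problem discretized with the five-point finite difference stencil. The grid size, boundary values, tolerance, and all function names are illustrative assumptions and are not taken from the paper, and the code runs serially on the CPU rather than on a GPU. It shows only why coloring matters: nodes of one color depend solely on nodes of the other color, so each half-sweep could be updated in parallel.

```python
def make_grid(n, top=1.0):
    """Return an (n+2) x (n+2) grid: zero interior, top boundary held at `top`.

    Hypothetical test problem (not from the paper): steady heat conduction
    on a square plate with one hot edge and three cold edges.
    """
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    u[0] = [top] * (n + 2)
    return u


def jacobi_sweep(u):
    """One Jacobi sweep: every node is updated from the previous iterate."""
    n = len(u) - 2
    new = [row[:] for row in u]
    diff = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
            diff = max(diff, abs(new[i][j] - u[i][j]))
    return new, diff


def colored_gs_sweep(u):
    """One red-black Gauss-Seidel sweep, updating the grid in place.

    Nodes with (i + j) even form one color and nodes with (i + j) odd the
    other. With the five-point stencil, all four neighbors of a node have
    the opposite color, so the updates within one color are independent.
    """
    n = len(u) - 2
    diff = 0.0
    for color in (0, 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i + j) % 2 == color:
                    v = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
                    diff = max(diff, abs(v - u[i][j]))
                    u[i][j] = v
    return u, diff


def solve(sweep, n=15, tol=1e-6, max_iters=100000):
    """Iterate `sweep` until the largest nodal update falls below `tol`."""
    u = make_grid(n)
    for it in range(1, max_iters + 1):
        u, diff = sweep(u)
        if diff < tol:
            return u, it
    return u, max_iters


if __name__ == "__main__":
    _, it_jacobi = solve(jacobi_sweep)
    _, it_gs = solve(colored_gs_sweep)
    print("Jacobi iterations:", it_jacobi)
    print("Red-black Gauss-Seidel iterations:", it_gs)
```

On this model problem the colored Gauss-Seidel iteration needs fewer sweeps than Jacobi, which is in line with the classical convergence theory for the two methods. Whether that fully accounts for the roughly twofold speedup reported on the GPU is not stated in the abstract, so the sketch should be read as an illustration of the algorithms rather than a reproduction of the measurements.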
