Implementation of Efficient Power Method on CUDA GPU

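  • Jung-Hwan Kim (Division of Computer Applied Science, Konkuk University) ;
  • Jin-Soo Kim (Division of Computer Applied Science, Konkuk University)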
  • Received : 2010.11.19
  • Accepted : 2010.12.27
  • Published : 2011.02.28

Abstract

GPU computing is increasingly used in high-performance applications because it exploits massive data parallelism at low cost. The power method, which finds the dominant eigenvector of a given matrix, is widely used in applications such as PageRank, which computes the importance of web pages. In this work we parallelize the power method on a GPU and propose improvements that optimize its performance. The power method consists mainly of repeated matrix-vector products, which parallelize easily on a GPU. However, each iteration must also test whether the eigenvector has converged and rescale the vector for the next product. These additional operations require several calls to GPU kernels and cause unnecessary data movement between host and GPU memory. We improve the performance of the power method by reducing the number of kernel calls, optimizing thread allocation, and optimizing the convergence test.
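
As an illustration of the iteration the abstract describes, the following is a minimal CPU sketch in plain Python, not the authors' CUDA implementation. The function name `power_method`, the parameters `tol` and `max_iter`, the all-ones starting vector, and the choice to scale by the entry of largest magnitude are all assumptions made for this example. The scaling and the convergence measurement are done in a single pass over the vector, which loosely mirrors the idea of merging per-iteration steps instead of running them as separate operations.

```python
def power_method(A, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix A.

    A is a list of rows (list of lists of floats). All names and defaults
    here are hypothetical choices for this sketch.
    """
    n = len(A)
    x = [1.0] * n  # assumed starting vector; any vector with a component
                   # along the dominant eigenvector would do
    lam = 0.0
    for _ in range(max_iter):
        # Matrix-vector product: every output row is independent, which is
        # the part that maps naturally onto one GPU thread per row.
        y = [sum(a * b for a, b in zip(row, x)) for row in A]

        # Scaling factor: the entry of largest magnitude (sign preserved).
        lam = max(y, key=abs)
        if lam == 0.0:
            raise ValueError("iterate collapsed to the zero vector")

        # Fused pass: rescale the vector and measure how far it moved,
        # instead of doing the two steps in separate sweeps.
        diff = 0.0
        for i in range(n):
            yi = y[i] / lam
            diff = max(diff, abs(yi - x[i]))
            x[i] = yi

        if diff < tol:
            break
    return lam, x


if __name__ == "__main__":
    eigenvalue, eigenvector = power_method([[2.0, 1.0], [1.0, 3.0]])
    print(eigenvalue, eigenvector)
```

On a GPU, the product, the search for the scaling factor, and the scale-and-compare pass would typically be separate kernels or reductions; how they are best combined depends on the hardware and is not specified here.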
