http://dx.doi.org/10.9728/dcs.2017.18.3.613

Correct Implementation of Sub-warp Parallel Prefix Operations based on GPU Hardware Architecture  

Park, Taejung (Department of Digital Media, Duksung Women's University)
Publication Information
Journal of Digital Contents Society / v.18, no.3, 2017, pp. 613-619
Abstract
This paper presents CUDA (Compute Unified Device Architecture) code that produces correct results for GPU parallel segmented prefix operations on large data arrays when segment lengths are less than 32. Mark Harris and Michael Garland published CUDA code for this task. This paper shows that their code does not produce correct results when the local segment length is less than 32, discusses the cause of the problem, and presents CUDA code that produces correct results. The segmented parallel prefix operation presented in this paper can serve as a building block for various large-scale parallel processing algorithms, including k-nearest neighbor search.
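The abstract does not reproduce the code, so the following is only a sketch and is neither the paper's code nor Harris and Garland's. It is a CPU emulation in C++ of one 32-lane warp that performs a segmented exclusive sum-scan with head flags, using the common Hillis-Steele shuffle-up pattern. The names `kWarpSize`, `Warp`, `segment_starts` and `segmented_exclusive_scan` are hypothetical. The sketch illustrates the condition that segments shorter than 32 depend on: at every step, a lane adds a shifted value only when the source lane lies inside its own segment.

```cpp
#include <array>
#include <cstdint>

// Hypothetical CPU emulation of one 32-lane warp (a sketch, not the paper's code).
constexpr int kWarpSize = 32;
using Warp = std::array<int, kWarpSize>;

// For each lane, find the first lane of its segment. Bit i of head_mask is set
// when lane i begins a new segment (on a GPU this mask could come from
// __ballot_sync, and the search below from __clz). Lane 0 always starts a segment.
std::array<int, kWarpSize> segment_starts(std::uint32_t head_mask) {
    head_mask |= 1u;
    std::array<int, kWarpSize> start{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        int s = lane;
        while (((head_mask >> s) & 1u) == 0u) --s;  // nearest head at or below lane
        start[lane] = s;
    }
    return start;
}

// Segmented exclusive sum-scan across one warp, Hillis-Steele style.
Warp segmented_exclusive_scan(const Warp& value, std::uint32_t head_mask) {
    const auto start = segment_starts(head_mask);
    Warp incl = value;
    for (int offset = 1; offset < kWarpSize; offset <<= 1) {
        const Warp prev = incl;  // every lane reads before any lane writes, like a shuffle
        for (int lane = 0; lane < kWarpSize; ++lane) {
            // Emulates __shfl_up_sync(..., offset): accept the value only when
            // the source lane lies inside this lane's own segment.
            if (lane - offset >= start[lane]) incl[lane] += prev[lane - offset];
        }
    }
    // Shift the inclusive result right by one lane inside each segment.
    Warp excl{};
    for (int lane = 0; lane < kWarpSize; ++lane)
        excl[lane] = (lane == start[lane]) ? 0 : incl[lane - 1];
    return excl;
}
```

With all inputs equal to 1 and a head flag on every fourth lane (`head_mask = 0x11111111u`), each lane receives `lane % 4`. On a GPU, the inner loop over lanes would correspond to one `__shfl_up_sync` call per step, and copying `prev` mimics the fact that all lanes read their source values before any lane updates its own.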
Keywords
CUDA; GPGPU; Parallel prefix operation; Segmented exclusive scan;
Citations & Related Records
Times cited by KSCI: 1
[1] Wikipedia, "Prefix sum." Available: https://en.wikipedia.org/wiki/Prefix_sum
[2] NVIDIA, Fermi compute architecture white paper. Available: http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf
[3] M. Harris and M. Garland, "Optimizing Parallel Prefix Operations for the Fermi Architecture," in W. Hwu (editor-in-chief), GPU Computing Gems Jade Edition, 1st ed. Morgan Kaufmann, ch. 3, pp. 29-38, 2011.
[4] Parallel Prefix Sum on the GPU (Scan). Available: http://www.umiacs.umd.edu/~ramani/cmsc828e_gpusci/ScanTalk.pdf
[5] NVIDIA, CUDA C Programming Guide. Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#axzz4jURrxaId
[6] M. Harris, S. Sengupta, and J. D. Owens, "Parallel Prefix Sum (Scan) with CUDA," in GPU Gems 3, ch. 39. Available: https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch39.html
[7] T. Park, "Analysis of Morton Code Conversion for 32 Bit IEEE 754 Floating Point Variables," The Journal of Digital Contents Society, vol. 17, no. 3, pp. 165-172, June 2016.
[8] S. Li, L. Simons, J. B. Pakaravoor, F. Abbasinejad, J. D. Owens, and N. Amenta, "kANN on the GPU with shifted sorting," in Proceedings of the Fourth ACM SIGGRAPH / Eurographics Conference on High-Performance Graphics (EGGH-HPG'12), Switzerland, pp. 39-47, 2012.
[9] J. Cheng, M. Grossman, and T. McKercher, Professional CUDA C Programming, 1st ed. Wrox, pp. 90-93, 2014.
[10] NVIDIA, CUDA Toolkit documentation. Available: http://docs.nvidia.com/cuda/
[11] J. Cheng, M. Grossman, and T. McKercher, Professional CUDA C Programming, 1st ed. Wrox, pp. 84-87, 2014.