http://dx.doi.org/10.7583/JKGS.2012.12.5.67

Bandwidth Efficient Summed Area Table Generation for CUDA  

Ha, Sang-Won (Dept. of Computer Science, Yonsei Univ.)
Choi, Moon-Hee (Samsung Electronics Corp.)
Jun, Tae-Joon (Dept. of Computer Science, Yonsei Univ.)
Kim, Jin-Woo (Dept. of Computer Science, Yonsei Univ.)
Byun, Hye-Ran (Dept. of Computer Science, Yonsei Univ.)
Han, Tack-Don (Dept. of Computer Science, Yonsei Univ.)
Abstract
A summed area table allows box regions of arbitrary width to be filtered in constant time per pixel. This property makes it useful in image processing applications that require the sum or average of the surrounding pixel intensities. Although computing the summed area table of an image is primarily a memory-bound job consisting of row-wise and column-wise summation, previous approaches incurred excessive accesses to high-latency global memory in order to exploit data parallelism. In this paper, we propose an efficient algorithm for generating the summed area table in the GPGPU environment, in which the input is decomposed into square sub-images and intermediate data are propagated between them. This almost halves global memory accesses compared with previous methods, making efficient use of the available memory bandwidth. The results show a substantial increase in performance.
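To make the constant-time property concrete, the following is a minimal sequential sketch in Python, not the GPU algorithm proposed in the paper. It builds the table with a row-wise pass followed by a column-wise pass, then evaluates a box sum with at most four table lookups. The function names `summed_area_table` and `box_sum` and the inclusive-coordinate convention are assumptions made for illustration only.

```python
def summed_area_table(img):
    """Return the inclusive summed area table of a 2D list of numbers.

    sat[y][x] holds the sum of img[j][i] for all j <= y and i <= x.
    Illustrative CPU sketch only; names and layout are assumptions.
    """
    sat = [list(row) for row in img]
    # Row-wise pass: inclusive prefix sum along each row.
    for row in sat:
        for x in range(1, len(row)):
            row[x] += row[x - 1]
    # Column-wise pass: inclusive prefix sum down each column.
    for y in range(1, len(sat)):
        for x in range(len(sat[y])):
            sat[y][x] += sat[y - 1][x]
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of the original image over the inclusive box [x0..x1] x [y0..y1].

    Uses at most four lookups, independent of the box size.
    """
    total = sat[y1][x1]
    if x0 > 0:
        total -= sat[y1][x0 - 1]
    if y0 > 0:
        total -= sat[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += sat[y0 - 1][x0 - 1]
    return total


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    table = summed_area_table(image)
    print(table)
    print(box_sum(table, 1, 1, 2, 2))
```

Dividing the result of `box_sum` by the box area gives the box-filter average. The two passes each read and write the whole image, which is the kind of memory traffic the abstract identifies as the bottleneck.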
Keywords
Summed area table; Integral Map; GPGPU; Parallel Prefix Scan; Parallel Algorithm
References
1 Hensley, J., Scheuermann, T., Coombe, G., Singh, M., and Lastra, A., "Fast summed-area table generation and its applications," Computer Graphics Forum, Vol. 24, No. 3, pp 547-555, Sept. 2005.
2 Demers, J., "Depth of Field: A Survey of Techniques," GPU Gems, Addison Wesley, pp 375-390, 2004.
3 Grabner, M., Grabner, H., and Bischof, H., "Fast approximated SIFT," ACCV 2006, LNCS, Vol. 3851, pp 918-927, 2006.
4 Bay, H., Tuytelaars, T., and Gool, L. V., "SURF: Speeded Up Robust Features," ECCV 2006, LNCS, Vol. 3951, pp 404-417, 2006.
5 Harris, M., Sengupta, S., and Owens, J. D. "Parallel prefix sum (scan) with CUDA," In Nguyen, H., ed., GPU Gems 3. Addison Wesley, 2007.
6 NVIDIA CUDA C Programming Guide, Ver. 4.0, 2011.
7 Harris, M., Sengupta, S., and Owens, J.D., "Parallel Prefix Sum (Scan) with CUDA," GPU Gems 3, H. Nguyen, Addison-Wesley, Ch. 31, Aug. 2007.
8 Kogge, P. M. and Stone, S. S., "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Trans. on Computers, Vol. C-22, No. 8, pp 786-793, 1973.
9 CUDA Data Parallel Primitives Library, http://code.google.com/p/cudpp
10 Crow, F. C. "Summed-area tables for texture mapping," In SIGGRAPH '84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, NY, NY, USA, pp 207-212, 1984.
11 Heckbert, P. S., "Filtering by Repeated Integration," ACM SIGGRAPH Computer Graphics, Vol. 20, No. 4, pp 315-321, 1986.