http://dx.doi.org/10.5573/ieek.2013.50.4.108

Performance of the Finite Difference Method Using Cache and Shared Memory for Massively Parallel Systems  

Kim, Hyun Kyu (Div. of Computer Science and Engineering, Chonbuk National University)
Lee, Hyo Jong (Div. of Computer Science and Engineering, CAIIT, Chonbuk National University)
Publication Information
Journal of the Institute of Electronics and Information Engineers / v.50, no.4, 2013, pp. 108-116
Abstract
Many algorithms have been proposed to improve performance on massively parallel systems, which consist of several hundred processors. A typical example is a GPU system, whose many processors use shared memory. For image filtering algorithms, which repeatedly reference neighboring points, shared memory improves performance because the frequently accessed adjacent pixels are served from it. However, using shared memory requires rewriting existing code and therefore makes the code more complex. Recent GPU systems provide L1 and L2 caches in addition to shared memory. Because the L1 cache is located in the same memory area as the shared memory, a comparable performance improvement can be expected from the cache alone. In this paper, the performance of cache-based and shared-memory-based implementations is compared. The results show that the cache-based algorithm performs very similarly to the shared-memory algorithm, while avoiding the code complexity that a shared-memory implementation introduces.
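The difference in code structure that the abstract describes can be illustrated with a minimal CPU-side sketch in C++. This is not the authors' implementation and it is not GPU code: it only emulates the two code shapes for a 5-point finite-difference (Laplacian) stencil. All names (`laplacian_direct`, `laplacian_tiled`, `TILE`, `at`, `max_abs_diff`), the tile size, and the clamp-to-edge border handling are assumptions made for illustration. The local `tile` buffer stands in for a per-block shared-memory array, and the sketch says nothing about performance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed block edge length; stands in for a GPU thread-block dimension.
constexpr int TILE = 8;

// Clamp-to-edge read (an assumed border policy): out-of-range coordinates
// reuse the nearest valid pixel.
inline float at(const std::vector<float>& img, int w, int h, int x, int y) {
    x = std::min(std::max(x, 0), w - 1);
    y = std::min(std::max(y, 0), h - 1);
    return img[static_cast<std::size_t>(y) * w + x];
}

// Variant A, the "cache-based" shape: each output pixel reads its four
// neighbours straight from the input array. On a GPU, the hardware caches
// would be left to capture the reuse between adjacent pixels.
std::vector<float> laplacian_direct(const std::vector<float>& in, int w, int h) {
    assert(in.size() == static_cast<std::size_t>(w) * h);
    std::vector<float> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[static_cast<std::size_t>(y) * w + x] =
                at(in, w, h, x - 1, y) + at(in, w, h, x + 1, y) +
                at(in, w, h, x, y - 1) + at(in, w, h, x, y + 1) -
                4.0f * at(in, w, h, x, y);
    return out;
}

// Variant B, the "shared-memory" shape: each TILE x TILE block first stages
// its pixels plus a one-pixel halo into a local buffer, then computes from
// that buffer only. The staging step and index offsets are the extra code.
std::vector<float> laplacian_tiled(const std::vector<float>& in, int w, int h) {
    assert(in.size() == static_cast<std::size_t>(w) * h);
    std::vector<float> out(in.size());
    float tile[TILE + 2][TILE + 2];  // stand-in for a shared-memory array
    for (int by = 0; by < h; by += TILE) {
        for (int bx = 0; bx < w; bx += TILE) {
            // Stage the block interior and its halo into the local buffer.
            for (int ty = 0; ty < TILE + 2; ++ty)
                for (int tx = 0; tx < TILE + 2; ++tx)
                    tile[ty][tx] = at(in, w, h, bx + tx - 1, by + ty - 1);
            // A GPU kernel would need a block-wide barrier at this point.
            for (int ty = 1; ty <= TILE; ++ty) {
                for (int tx = 1; tx <= TILE; ++tx) {
                    const int x = bx + tx - 1;
                    const int y = by + ty - 1;
                    if (x >= w || y >= h) continue;  // partial edge block
                    out[static_cast<std::size_t>(y) * w + x] =
                        tile[ty][tx - 1] + tile[ty][tx + 1] +
                        tile[ty - 1][tx] + tile[ty + 1][tx] -
                        4.0f * tile[ty][tx];
                }
            }
        }
    }
    return out;
}

// Largest element-wise absolute difference between two equally sized images.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float m = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}
```

Calling both functions on the same image and passing the results to `max_abs_diff` is a simple way to check that the two variants agree. The direct variant is a single loop nest, whereas the tiled variant adds a staging loop, halo offsets, and an edge check, which is the kind of added code complexity the abstract attributes to shared memory.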
Keywords
Anisotropic diffusion filter; CUDA; GPGPU; parallel processor