http://dx.doi.org/10.15701/kcgs.2017.23.2.1

Accelerating Medical Image Processing on Integrated GPU Using OpenCL  

Kim, Beom-Jun (Dept. of Computer Engineering, Inha Univ.)
Shin, Byeong-seok (Dept. of Computer Engineering, Inha Univ.)
Abstract
Various filters are applied to improve the quality of noisy, low-resolution medical images. This is necessary to reduce the patient's radiation dose and to make better use of existing older imaging equipment. Conventionally, filtering is performed on the CPU of a PC. However, the CPU of a typical hospital PC alone is not fast enough to apply multiple computations and filters to high-resolution images of the human body in real time. In this paper, we analyze the architecture and performance of the Intel integrated GPU built into the CPU and propose a method for image filtering that uses OpenCL parallel processing. With this method, computationally expensive filters can be applied to medical images to generate high-quality images in real time.
Keywords
Medical Image; SLM (Shared Local Memory); integrated GPU; L3 cache