Acknowledgement
Supported by the National Research Foundation of Korea (NRF).
References
- D. J. Brenner, Computed Tomography: An Increasing Source of Radiation Exposure (Current Concepts), The New England Journal of Medicine, vol. 357, no. 22, Nov. 29, 2007
- E. Stewart, Intel Integrated Performance Primitives: How to Optimize Software Applications Using Intel IPP, Intel Press, 2004
- L. Dagum and R. Menon, OpenMP: An industry standard API for shared-memory programming, IEEE Computational Science and Engineering, 1998
- Khronos OpenCL Working Group, The OpenCL specification, Hot Chips 21 Symposium (HCS), IEEE, 2009
- Intel Corporation, Integrated graphics and video computer display system, US Patent 5,432,900, 1995
- Intel Corporation, The Compute Architecture of Intel(R) Processor Graphics Gen8, 2014
- Intel Corporation, The Compute Architecture of Intel(R) Processor Graphics Gen9, 2015
- J. Lee, Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems, International Conference on Parallel Architectures and Compilation Techniques (PACT), 2013
- D. Lustig, Reducing GPU offload latency via fine-grained CPU-GPU synchronization, IEEE International Symposium on High Performance Computer Architecture (HPCA), 2013
- M. K. Qureshi and Y. N. Patt, Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39), 2006
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron, Rodinia: A benchmark suite for heterogeneous computing, IEEE International Symposium on Workload Characterization (IISWC), Oct. 2009
- V. Garcia, Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications, IEEE International Symposium on Workload Characterization (IISWC), 2016
- J. Power, gem5-gpu: A Heterogeneous CPU-GPU Simulator, IEEE Computer Architecture Letters, June 2015
- A. Bakhoda, Analyzing CUDA workloads using a detailed GPU simulator, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2009
- S. J. Pennycook, An investigation of the performance portability of OpenCL, Journal of Parallel and Distributed Computing, vol. 73, no. 11, Nov. 2013
- T. G. Rogers, Cache-Conscious Wavefront Scheduling, Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-45), pp. 72-83, 2012
- J. E. Stone, OpenCL: A parallel programming standard for heterogeneous computing systems, Computing in Science & Engineering, 2010
- NVIDIA Inc., OpenCL Best Practices Guide, May 2010
- M. M. Baskaran, A compiler framework for optimization of affine loop nests for GPGPUs, Proceedings of the 22nd Annual International Conference on Supercomputing (ICS), pp. 225-234, 2008
- Y. Yang, A GPGPU compiler for memory optimization and parallelism management, Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '10), pp. 86-97, 2010
- B. Keswani, A Comparative Performance Analysis of Convolution W/O OpenCL on a Standalone System, International Conference on Advances in Computing and Communication Engineering (ICACCE), 2015
- B. Jan, Fast parallel sorting algorithms on GPU, International Journal of Distributed and Parallel Systems, vol. 3, no. 6, 2012
- N. Otsu, A Threshold Selection Method from Gray-Level Histograms, IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, no. 1, 1979
- M. Harris, Optimizing parallel reduction in CUDA, NVIDIA Developer Technology, 2007
- M. Harris, Parallel prefix sum (scan) with CUDA, GPU Gems 3, 2007