References
- K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, "The Case for a Single-Chip Multiprocessor," In Proceedings of 7th Conference on Architectural Support for Programming Languages and Operating Systems, pp.2-11, 1996.
- V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," In Proceedings of International Symposium on Computer Architecture, pp.248-259, 2000.
- H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, and D. Burger, "Dark Silicon and the End of Multicore Scaling," In Proceedings of International Symposium on Computer Architecture, pp.365-376, 2011.
- iSuppli Market Research, available at http://www.isuppli.com/
- M. D. Hill and M. R. Marty, "Amdahl's law in the multicore era," IEEE Computer, Vol.41, No.7, pp.33-38, 2008.
- Y. H. Jang, C. Park, J. H. Park, N. Kim, and K. H. Yoo, "Parallel Processing for Integral Imaging Pickup using Multiple Threads," International Journal of Korea Contents, Vol.5, No.4, pp.30-34, 2009. https://doi.org/10.5392/IJoC.2009.5.4.030
- Y. H. Jang, C. Park, J. S. Jung, J. H. Park, N. Kim, J. S. Ha, and K. H. Yoo, "Integral Imaging Pickup Method of Bio-Medical Data using GPU and Octree," International Journal of Korea Contents, Vol.10, No.9, pp.1-9, 2009.
- NVIDIA Corporation, available at http://www.nvidia.com/
- NVIDIA's Next Generation CUDA Compute Architecture: Fermi, available at http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf
- H. J. Choi, D. O. Son, J. M. Kim, and C. H. Kim, "Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization," Journal of Supercomputing, Vol.69, No.1, pp.330-356, 2014. https://doi.org/10.1007/s11227-014-1155-4
- I. Buck, "GPU Computing with NVIDIA CUDA," In Proceedings of International Conference on Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH), p.6, 2007.
- T. Li, P. Brett, R. Knauerhase, D. Koufaty, D. Reddy, and S. Hahn, "Operating System Support for Overlapping-ISA Heterogeneous Multi-core Architectures," In Proceedings of International Symposium on High Performance Computer Architecture, pp.1-12, 2010.
- Performance Comparison between CPU and GPU, available at http://www.ncsa.illinois.edu/-kindr/projects/hpca/files/ppac09_presentation.pdf
- V. W. Lee, C. K. Kim, J. Chhugani, M. Deisher, D. H. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, "Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU," In Proceedings of International Symposium on Computer Architecture, pp.451-460, 2010.
- General-purpose computation on graphics hardware, available at http://www.gpgpu.org
- Y. Zhang and J. D. Owens, "A Quantitative Performance Analysis Model for GPU Architectures," In Proceedings of International Symposium on High Performance Computer Architecture, pp.382-393, 2011.
- E. Blem, M. Sinclair, and K. Sankaralingam, "Challenge Benchmarks That Must be Conquered to Sustain the GPU Revolution," In Proceedings of Workshop on Emerging Applications for Manycore Architecture, 2010.
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," In Proceedings of International Symposium on Microarchitecture, pp.407-420, 2007.
- V. Narasiman, C. J. Lee, M. Shebanow, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt, "Improving GPU Performance via Large Warps and Two-Level Warp Scheduling," In Proceedings of International Symposium on Microarchitecture, pp.308-317, 2011.
- J. Meng, D. Tarjan, and K. Skadron, "Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance," In Proceedings of International Symposium on Computer Architecture, pp.235-246, 2010.
- W. W. L. Fung and T. M. Aamodt, "Thread Block Compaction for Efficient SIMT Control Flow," In Proceedings of International Symposium on High Performance Computer Architecture, pp.25-36, 2011.
- O. Mutlu and T. Moscibroda, "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems," In Proceedings of International Symposium on Computer Architecture, pp.63-74, 2008.
- A. Jog, O. Kayiran, N. C. Nachiappan, A. K. Mishra, M. T. Kandemir, O. Mutlu, R. Iyer, and C. R. Das, "OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance," In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, pp.395-406, 2013.
- H. J. Choi, H. G. Jeon, and C. H. Kim, "Quantitative Analysis of the Negative Factors on the GPU Performance," Journal of KIISE : Computing Practices and Letters, Vol.18, No.4, pp.282-287, 2012.
- J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "GPUWattch: Enabling Energy Optimizations in GPGPUs," In Proceedings of International Symposium on Computer Architecture, pp.487-498, 2013.
- A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
- NVIDIA SDK, available at http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
- S. Che, M. Boyer, M. Jiayuan, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron, "Rodinia: A Benchmark Suite for Heterogeneous Computing," In Proceedings of the International Symposium on Workload Characterization, pp.44-54, 2009.