1 |
K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, "The Case for a Single-Chip Multiprocessor," In Proceedings of 7th Conference on Architectural Support for Programming Languages and Operating Systems, pp.2-11, 1996.
|
2 |
V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," In Proceedings of International Symposium on Computer Architecture, pp.248-259, 2000.
|
3 |
H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, and D. Burger, "Dark Silicon and the End of Multicore Scaling," In Proceedings of International Symposium on Computer Architecture, pp.365-376, 2011.
|
4 |
iSuppli Market Research, available at http://www.isuppli.com/
|
5 |
M. D. Hill and M. R. Marty, "Amdahl's law in the multicore era," IEEE Computer, Vol.41, No.7, pp.33-38, 2008.
|
6 |
Y. H. Jang, C. Park, J. H. Park, N. Kim, and K. H. Yoo, "Parallel Processing for Integral Imaging Pickup using Multiple Threads," International Journal of Korea Contents, Vol.5, No.4, pp.30-34, 2009.
|
7 |
Y. H. Jang, C. Park, J. S. Jung, J. H. Park, N. Kim, J. S. Ha, and K. H. Yoo, "Integral Imaging Pickup Method of Bio-Medical Data using GPU and Octree," International Journal of Korea Contents, Vol.10, No.9, pp.1-9, 2009.
|
8 |
NVIDIA Corporation, available at http://www.nvidia.com/
|
9 |
NVIDIA's Next Generation CUDA Compute Architecture: Fermi, available at http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf
|
10 |
H. J. Choi, D. O. Son, J. M. Kim, and C. H. Kim, "Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization," Journal of Supercomputing, Vol.69, No.1, pp.330-356, 2014.
|
11 |
I. Buck, "GPU Computing with NVIDIA CUDA," In Proceedings of International Conference on Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH), p.6, 2007.
|
12 |
T. Li, P. Brett, R. Knauerhase, D. Koufaty, D. Reddy, and S. Hahn, "Operating System Support for Overlapping-ISA Heterogeneous Multi-core Architectures," In Proceedings of International Symposium on High Performance Computer Architecture, pp.1-12, 2010.
|
13 |
Performance Comparison between CPU and GPU, available at http://www.ncsa.illinois.edu/-kindr/projects/hpca/files/ppac09_presentation.pdf
|
14 |
V. W. Lee, C. K. Kim, J. Chhugani, M. Deisher, D. H. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, "Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU," In Proceedings of International Symposium on Computer Architecture, pp.451-460, 2010.
|
15 |
General-purpose computation on graphics hardware, available at http://www.gpgpu.org
|
16 |
V. Narasiman, C. J. Lee, M. Shebanow, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt, "Improving GPU Performance via Large Warps and Two-Level Warp Scheduling," In Proceedings of International Symposium on Microarchitecture, pp.308-317, 2011.
|
17 |
Y. Zhang and J. D. Owens, "A Quantitative Performance Analysis Model for GPU Architectures," In Proceedings of International Symposium on High Performance Computer Architecture, pp.382-393, 2011.
|
18 |
E. Blem, M. Sinclair, and K. Sankaralingam, "Challenge Benchmarks That Must be Conquered to Sustain the GPU Revolution," In Proceedings of Workshop on Emerging Applications for Manycore Architecture, 2010.
|
19 |
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," In Proceedings of Microarchitecture, pp.407-420, 2007.
|
20 |
J. Meng, D. Tarjan, and K. Skadron, "Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance," In Proceedings of International Symposium on Computer Architecture, pp.235-246, 2010.
|
21 |
W. W. L. Fung and T. M. Aamodt, "Thread Block Compaction for Efficient SIMT Control Flow," In Proceedings of International Symposium on High Performance Computer Architecture, pp.25-36, 2011.
|
22 |
O. Mutlu and T. Moscibroda, "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems," In Proceedings of International Symposium on Computer Architecture, pp.63-74, 2008.
|
23 |
A. Jog, O. Kayiran, N. C. Nachiappan, A. K. Mishra, M. T. Kandemir, O. Mutlu, R. Iyer, and C. R. Das, "OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance," In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, pp.395-406, 2013.
|
24 |
NVIDIA SDK, available at http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
|
25 |
H. J. Choi, H. G. Jeon, and C. H. Kim, "Quantitative Analysis of the Negative Factors on the GPU Performance," Journal of KIISE : Computing Practices and Letters, Vol.18, No.4, pp.282-287, 2012.
|
26 |
J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "GPUWattch: Enabling Energy Optimizations in GPGPUs," In Proceedings of International Symposium on Computer Architecture, pp.487-498, 2013.
|
27 |
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
|
28 |
S. Che, M. Boyer, M. Jiayuan, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron, "Rodinia: A Benchmark Suite for Heterogeneous Computing," In Proceedings of the International Symposium on Workload Characterization, pp.44-54, 2009.
|