1 |
J. Y. Chang, W. J. Kim, K. J. Byun, and N. W. Eum, "Performance Analysis for Multimedia Video Codec on On-Chip Network," KISM Smart Media Journal, Vol. 1, No. 1, pp. 27-35, 2012.
|
2 |
S. B. Heo, J. H. Park, and H. S. Jo, "An performance analysis on SSD caching mechanism in Linux," KISM Smart Media Journal, Vol. 4, No. 2, pp. 62-67, 2015.
|
3 |
V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," In Proceedings of the 27th International Symposium on Computer Architecture, pp. 248-259, 2000.
|
4 |
K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, "The Case for a Single-Chip Multiprocessor," In Proceedings of 7th Conference on Architectural Support for Programming Languages and Operating Systems, pp. 2-11, 1996.
|
5 |
V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," In Proceedings of the 27th International Symposium on Computer Architecture, pp. 248-259, 2000.
|
6 |
M. D. Hill and M. R. Marty, "Amdahl's law in the multicore era," IEEE Computer, Vol. 41, No. 7, pp. 33-38, 2008.
|
7 |
S. Y. Lee, A. Arunkumar, and C. J. Wu, "CAWA: coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads," In Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 515-527, 2015.
|
8 |
D. Voitsechov and Y. Etsion, "Single-graph multiple flows: Energy efficient design alternative for GPGPUs," In Proceedings of the 41st Annual International Symposium on Computer Architecture, pp. 205-216, 2014.
|
9 |
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron, "Rodinia: A Benchmark Suite for Heterogeneous Computing," In Proceedings of the International Symposium on Workload Characterization (IISWC), pp. 44-54, 2009.
|
10 |
M. Lee, S. Song, J. Moon, J. Kim, W. Seo, Y. G. Cho, and S. Ryu, "Improving GPGPU resource utilization through alternative thread block scheduling," In Proceedings of 20th IEEE International Symposium on High Performance Computer Architecture, pp. 260-271, 2014.
|
11 |
H. J. Lee, K. J. Brown, A. K. Sujeeth, T. Rompf, and K. Olukotun, "Locality-Aware Mapping of Nested Parallel Patterns on GPUs," In Proceedings of 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 63-74, 2014.
|
12 |
H. J. Choi and C. H. Kim, "Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units," KISM Smart Media Journal, Vol. 3, No. 1, pp. 33-38, 2014.
|
13 |
I. A. Buck, "Programming CUDA," In Supercomputing 2007 Tutorial Notes, 2007.
|
14 |
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp. 163-174, 2009.
|
15 |
S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, "McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures," In Proceedings of the International Symposium on Microarchitecture, pp. 469-480, 2009.
|
16 |
NVIDIA's Next Generation CUDA Compute Architecture: Fermi, available at www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf
|
17 |
CUDA SDK, available at http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
|