1 |
M. Lee, S. Song, J. Moon, J. Kim, W. Seo, Y. Cho, and S. Ryu, "Improving GPGPU resource utilization through alternative thread block scheduling," High Performance Computer Architecture (HPCA), International Symposium on. pp.260-271, 2014.
2 |
V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt, "Improving GPU performance via large warps and two-level warp scheduling," in Microarchitecture (MICRO), Annual IEEE/ACM International Symposium on. IEEE, pp.308-317, 2011.
3 |
Y. Zhang, Z. Xing, C. Liu, C. Tang, and Q. Wang, "Locality based warp scheduling in GPGPUs," Future Generation Computer Systems. 2017.
4 |
B. Wang, Y. Zhu, and W. Yu, "OAWS: Memory occlusion aware warp scheduling," Parallel Architecture and Compilation Techniques (PACT), 2016 International Conference on. IEEE, pp.45-55, 2016.
5 |
J. Wang, N. Rubin, A. Sidelnik, and S. Yalamanchili, "LaPerm: Locality aware scheduler for dynamic parallelism on GPUs," ACM SIGARCH Computer Architecture News 44.3, pp.584-595, 2016.
6 |
Y. Liu et al. "Barrier-aware warp scheduling for throughput processors," in Proceedings of the 2016 International Conference on Supercomputing. ACM, p.42, 2016.
7 |
NVIDIA CUDA Programming [internet], http://www.nvidia.com/object/cuda_home_new.html
8 |
M. Lee et al. "iPAWS: Instruction-issue pattern-based adaptive warp scheduling for GPGPUs," High Performance Computer Architecture (HPCA), 2016 IEEE International Symposium on. IEEE, pp.370-381, 2016.
9 |
T. G. Rogers, M. O'Connor, and T. M. Aamodt, "Cache-conscious wavefront scheduling," in Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, pp.72-83, 2012.
10 |
J. Zhang, Y. He, F. Shen, Q. A. Li, and H. Tan, "Memory Request Priority Based Warp Scheduling for GPUs," Chinese Journal of Electronics, Vol.27, No.7, pp.985-994, 2018.
11 |
B. Wang, W. Yu, X. H. Sun, and X. Wang, "DaCache: Memory divergence-aware GPU cache management," in Proceedings of the 29th ACM on International Conference on Supercomputing, pp.89-98, 2015.
12 |
K. Choo, D. Troendle, E. A. Gad, and B. Jang, "Contention-Aware Selective Caching to Mitigate Intra-Warp Contention on GPUs," Parallel and Distributed Computing (ISPDC), pp.1-8, 2017.
13 |
X. Chen, L. W. Chang, C. I. Rodrigues, J. Lv, Z. Wang, and W. M. Hwu, "Adaptive cache management for energy-efficient GPU computing," in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.343-355, 2014.
14 |
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," in Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.163-174, 2009.
15 |
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in Proceedings of the International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
16 |
S. Grauer-Gray, L. Xu, R. Searles, S. Ayalasomayajula, and J. Cavazos, "Auto-tuning a high-level language targeted to GPU codes," Innovative Parallel Computing (InPar), pp.1-10, 2012.
17 |
J. A. Stratton et al. "Parboil: A revised benchmark suite for scientific and commercial throughput computing," Center for Reliable and High-Performance Computing 127, 2012.