1 |
E. Ebrahimi et al., "Parallel Application Memory Scheduling," IEEE/ACM Int. Symp. Microarchit., Porto Alegre, Brazil, Dec. 3-7, 2011, pp. 362-373.
|
2 |
W. Zhang, F. Liu, and R. Fan, "Cache Matching: Thread Scheduling to Maximize Data Reuse," Proc. High Performance Comput. Symp., Tampa, FL, USA, Apr. 13-16, 2014, pp. 47-54.
|
3 |
T. Zhang et al., "Half-DRAM: a High-Bandwidth and Low-Power DRAM Architecture from the Rethinking of Fine-Grained Activation," IEEE/ACM Int. Symp. Comput. Archit., Minneapolis, MN, USA, June 14-18, 2014, pp. 349-360.
|
4 |
Y. Kim et al., "A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM," Annu. Int. Symp. Comput. Archit., Portland, OR, USA, June 9-13, 2012, pp. 368-379.
|
5 |
Y.H. Son et al., "CiDRA: a Cache-Inspired DRAM Resilience Architecture," IEEE Int. Symp. High Performance Comput. Archit., San Francisco, CA, USA, Feb. 7-11, 2015, pp. 502-513.
|
6 |
J. Yu and W. Jang, "FDRAM: DRAM Architecture Flexible in Successive Row and Column Accesses," IEEE Int. Conf. Comput. Des., New York, USA, Oct. 18-21, 2015, pp. 480-483.
|
7 |
T.G. Rogers, M. O'Connor, and T.M. Aamodt, "Cache-Conscious Wavefront Scheduling," IEEE/ACM Int. Symp. Microarchit., Vancouver, Canada, Dec. 1-5, 2012, pp. 72-83.
|
8 |
W.J. Starke et al., "The Cache and Memory Subsystems of the IBM POWER8 Processor," IBM J. Res. Develop., vol. 59, no. 1, Jan./Feb. 2015, pp. 3:1-3:13.
|
9 |
M. Hashemi et al., "Accelerating Dependent Cache Misses with an Enhanced Memory Controller," Annu. Int. Symp. Comput. Archit., Seoul, Rep. of Korea, June 18-22, 2016, pp. 444-455.
|
10 |
S. Wasly and R. Pellizzoni, "Hiding Memory Latency Using Fixed Priority Scheduling," IEEE Real-Time Embedded Technol. Applicat. Symp., Berlin, Germany, Apr. 15-17, 2014, pp. 75-86.
|
11 |
C.H. Hahm et al., "Memory Access Scheduling for a Smart TV," IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 2, Feb. 2016, pp. 399-411.
|
12 |
ITRS, "International Technology Roadmap for Semiconductors," 2013.
|
13 |
JEDEC, DDR 1, 2, 3, and 4 SDRAM Standard, Accessed 2016. http://www.jedec.org
|
14 |
ARM, ARM Processor Architecture, Accessed 2015. http://www.arm.com
|
15 |
C. Bienia et al., "The PARSEC Benchmark Suite: Characterization and Architectural Implications," Int. Conf. Parallel Archit. Compilation Techn., Toronto, Canada, Oct. 25-29, 2008, pp. 72-81.
|
16 |
A. Patel et al., "MARSS: a Full System Simulator for Multicore x86 CPUs," ACM/IEEE Des. Autom. Conf., San Diego, CA, USA, June 5-10, 2011, pp. 1050-1055.
|
17 |
P. Rosenfeld, E. Cooper-Balis, and B. Jacob, "DRAMSim2: a Cycle Accurate Memory System Simulator," Comput. Archit. Lett., vol. 10, no. 1, Jan. 2011, pp. 16-19.
|
18 |
Y.H. Son et al., "Reducing Memory Access Latency with Asymmetric DRAM Bank Organizations," IEEE Int. Symp. Comput. Archit., Tel Aviv, Israel, June 23-27, 2013, pp. 380-391.
|
19 |
D. Lee et al., "Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case," IEEE Int. Symp. High Performance Comput. Archit., San Francisco, CA, USA, Feb. 7-11, 2015, pp. 489-501.
|
20 |
D. Lee et al., "Tiered-Latency DRAM: a Low Latency and Low Cost DRAM Architecture," IEEE Int. Symp. High Performance Comput. Archit., Shenzhen, China, Feb. 23-27, 2013, pp. 615-626.
|