The Main Memory System: Challenges and Opportunities

Mutlu, Onur (Carnegie Mellon University)
Meza, Justin (Carnegie Mellon University)
Subramanian, Lavanya (Carnegie Mellon University)

Publication Information

Communications of the Korean Institute of Information Scientists and Engineers, v.33, no.2, 2015, pp. 16-41

Keywords

Memory systems; scaling; DRAM; flash; non-volatile memory; QoS; reliability; hybrid memory; storage

Citations & Related Records

Reference

1	O. Mutlu, "Memory scaling: A systems architecture perspective," in MemCon, 2013.
2	O. Mutlu et al., "Memory systems in the many-core era: Challenges, opportunities, and solution directions," in ISMM, 2011, http://users.ece.cmu.edu/omutlu/pub/onur-ismm-mspc-keynote-june-5-2011-short.pptx.
3	O. Mutlu et al., "Address-value delta (AVD) prediction: A hardware technique for efficiently parallelizing dependent cache misses," IEEE Transactions on Computers, vol. 55, no. 12, Dec. 2006.
4	O. Mutlu and T. Moscibroda, "Stall-time fair memory access scheduling for chip multiprocessors," in MICRO, 2007.
5	O. Mutlu and T. Moscibroda, "Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems," in ISCA, 2008.
6	O. Mutlu et al., "Address-value delta (AVD) prediction: Increasing the effectiveness of runahead execution by exploiting regular memory allocation patterns," in MICRO, 2005.
7	O. Mutlu et al., "Techniques for efficient processing in runahead execution engines," in ISCA, 2005.
8	O. Mutlu et al., "Efficient runahead execution: Power-efficient memory latency tolerance," IEEE Micro (TOP PICKS Issue), vol. 26, no. 1, 2006.
9	O. Mutlu and T. Moscibroda, "Parallelism-aware batch scheduling: Enabling high-performance and fair memory controllers," IEEE Micro (TOP PICKS Issue), vol. 29, no. 1, 2009.
10	O. Mutlu et al., "Runahead execution: An alternative to very large instruction windows for out-of-order processors," in HPCA, 2003.
11	O. Mutlu et al., "Runahead execution: An effective alternative to large instruction windows," IEEE Micro (TOP PICKS Issue), vol. 23, no. 6, 2003.
12	C. Nachiappan et al., "Application-aware prefetch prioritization in on-chip networks," in PACT Poster Session, 2012.
13	P. J. Nair et al., "ArchShield: Architectural Framework for Assisting DRAM Scaling by Tolerating High Error Rates," in ISCA, 2013.
14	D. Narayanan and O. Hodson, "Whole-system persistence," in ASPLOS, 2012.
15	G. Nychis et al., "Next generation on-chip networks: What kind of congestion control do we need?" in HotNets, 2010.
16	G. Nychis et al., "On-chip networks from a networking perspective: Congestion and scalability in many-core interconnects," in SIGCOMM, 2012.
17	T. Ohsawa et al., "Optimizing the DRAM refresh count for merged DRAM/logic LSIs," in ISLPED, 1998.
18	V. S. Pai and S. Adve, "Code transformations to improve memory parallelism," in MICRO, 1999.
19	Y. N. Patt et al., "HPS, a new microarchitecture: Rationale and introduction," in MICRO, 1985.
20	Y. N. Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," in MICRO, 1985.
21	G. Pekhimenko et al., "Base-delta-immediate compression: A practical data compression mechanism for on-chip caches," in PACT, 2012.
22	G. Pekhimenko et al., "Linearly compressed pages: A main memory compression framework with low complexity and low latency," in MICRO, 2013.
23	G. Pekhimenko et al., "Exploiting compressed block size as an indicator of future reuse," in HPCA, 2015.
24	S. Phadke and S. Narayanasamy, "MLP aware heterogeneous memory system," in DATE, 2011.
25	M. K. Qureshi et al., "A case for MLP-aware cache replacement," in ISCA, 2006.
26	M. K. Qureshi et al., "Line distillation: Increasing cache capacity by filtering unused words in cache lines," in HPCA, 2007.
27	M. K. Qureshi et al., "Enhancing lifetime and security of phase change memories via start-gap wear leveling," in MICRO, 2009.
28	M. K. Qureshi et al., "Scalable high performance main memory system using phase-change memory technology," in ISCA, 2009.
29	M. K. Qureshi and Y. N. Patt, "Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches," in MICRO, 2006.
30	L. E. Ramos et al., "Page placement in hybrid memory systems," in ICS, 2011.
31	S. Raoux et al., "Phase-change random access memory: A scalable technology," IBM JR&D, vol. 52, Jul/Sep 2008.
32	B. Schroeder et al., "DRAM errors in the wild: A large-scale field study," in SIGMETRICS, 2009.
33	N. H. Seong et al., "Tri-level-cell phase change memory: Toward an efficient and reliable memory system," in ISCA, 2013.
34	V. Seshadri et al., "The evicted-address filter: A unified mechanism to address both cache pollution and thrashing," in PACT, 2012.
35	V. Seshadri et al., "RowClone: Fast and efficient In-DRAM copy and initialization of bulk data," in MICRO, 2013.
36	V. Seshadri et al., "The dirty-block index," in ISCA, 2014.
37	V. Seshadri et al., "Mitigating prefetcher-caused pollution using informed caching policies for prefetched blocks," TACO, 2014.
38	V. Sridharan and D. Liberty, "A study of DRAM failures in the field," in SC, 2012.
39	F. Soltis, "Inside the AS/400," 29th Street Press, 1996.
40	N. H. Seong et al., "Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping," in ISCA, 2010.
41	V. Sridharan et al., "Feng shui of supercomputer memory: Positional effects in DRAM and SRAM faults," in SC, 2013.
42	S. Srinath et al., "Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers," in HPCA, 2007.
43	J. Stuecheli et al., "The virtual write queue: Coordinating DRAM and last-level cache policies," in ISCA, 2010.
44	L. Subramanian et al., "MISE: Providing performance predictability and improving fairness in shared main memory systems," in HPCA, 2013.
45	L. Subramanian et al., "The blacklisting memory scheduler: Achieving high performance and fairness at low cost," in ICCD, 2014.
46	M. A. Suleman et al., "Accelerating critical section execution with asymmetric multi-core architectures," in ASPLOS, 2009.
47	M. A. Suleman et al., "Data marshaling for multi-core architectures," in ISCA, 2010.
48	M. A. Suleman et al., "Data marshaling for multi-core systems," IEEE Micro (TOP PICKS Issue), vol. 31, no. 1, 2011.
49	M. A. Suleman et al., "Accelerating critical section execution with asymmetric multi-core architectures," IEEE Micro (TOP PICKS Issue), vol. 30, no. 1, 2010.
50	L. Tang et al., "The impact of memory subsystem resource sharing on datacenter applications," in ISCA, 2011.
51	J. Tendler et al., "POWER4 system microarchitecture," IBM JRD, Oct. 2001.
52	M. Thottethodi et al., "Exploiting global knowledge to achieve self-tuned congestion control for k-ary n-cube networks," IEEE TPDS, vol. 15, no. 3, 2004.
53	R. M. Tomasulo, "An efficient algorithm for exploiting multiple arithmetic units," IBM JR&D, vol. 11, Jan. 1967.
54	T. Treangen and S. Salzberg, "Repetitive DNA and next-generation sequencing: computational challenges and solutions," in Nature Reviews Genetics, 2012.
55	A. Udipi et al., "Rethinking DRAM design and organization for energy-constrained multi-cores," in ISCA, 2010.
56	A. Udipi et al., "Combining memory and a controller with photonics through 3D-stacking to enable scalable and energy-efficient systems," in ISCA, 2011.
57	V. Vasudevan et al., "Using vector interfaces to deliver millions of IOPS from a networked key-value storage server," in SoCC, 2012.
58	R. K. Venkatesan et al., "Retention-aware placement in DRAM (RAPID): Software methods for quasi-nonvolatile DRAM," in HPCA, 2006.
59	H. Volos et al., "Mnemosyne: lightweight persistent memory," in ASPLOS, 2011.
60	X. Wang and J. Martinez, "XChange: Scalable dynamic multi-resource allocation in multicore architectures," in HPCA, 2015.
61	H.-S. P. Wong et al., "Phase change memory," in Proceedings of the IEEE, 2010.
62	H.-S. P. Wong et al., "Metal-oxide RRAM," in Proceedings of the IEEE, 2012.
63	D. Yoon et al., "Adaptive granularity memory systems: A tradeoff between storage efficiency and throughput," in ISCA, 2011.
64	"SAFARI tools," https://www.ece.cmu.edu/safatiltools.html.
65	M. Xie et al., "Improving system throughput and fairness simultaneously in shared memory CMP systems via dynamic bank partitioning," in HPCA, 2014.
66	Y. Xie and G. H. Loh, "PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches," in ISCA, 2009.
67	H. Xin et al., "Accelerating read mapping with FastHASH," in BMC Genomics, 2013.
68	J. Yang et al., "Frequent value compression in data caches," in MICRO, 2000.
69	D. Yoon et al., "The dynamic granularity memory system," in ISCA, 2012.
70	H. Yoon et al., "Row buffer locality aware caching policies for hybrid memories," in ICCD, 2012.
71	H. Yoon et al., "Data mapping and buffering in multi-level cell memory for higher performance and energy efficiency," CMU SAFARI Tech. Report, 2013.
72	H. Yoon et al., "Efficient data mapping and buffering techniques for multi-level cell phase-change memories," TACO, 2014.
73	J. Zhao et al., "FIRM: Fair and high-performance memory control for persistent memory systems," in MICRO, 2014.
74	H. Zhou and T. M. Conte, "Enhancing memory level parallelism via recovery-free value prediction," in ICS, 2003.
75	S. Zhuravlev et al., "Addressing shared resource contention in multicore processors via scheduling," in ASPLOS, 2010.
76	J.-H. Ahn et al., "Adaptive self refresh scheme for battery operated high-density mobile DRAM applications," in ASSCC, 2006.
77	"International technology roadmap for semiconductors (ITRS)," 2011.
78	Hybrid Memory Consortium, 2012, http://www.hybridmemorycube.org.
79	Top 500, 2013, http://www.top500.org/featured/systems/tianhe-21.
80	A. R. Alameldeen and D. A. Wood, "Adaptive cache compression for high-performance processors," in ISCA, 2004.
81	C. Alkan et al., "Personalized copy-number and segmental duplication maps using next-generation sequencing," in Nature Genetics, 2009.
82	G. Atwood, "Current and emerging memory technology landscape," in Flash Memory Summit, 2011.
83	R. Ausavarungnirun et al., "Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems," in ISCA, 2012.
84	R. Ausavarungnirun et al., "Design and evaluation of hierarchical rings with deflection routing," in SBAC-PAD, 2014.
85	S. Balakrishnan and G. S. Sohi, "Exploiting value locality in physical register files," in MICRO, 2003.
86	A. Bhattacharjee and M. Martonosi, "Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors," in ISCA, 2009.
87	R. Bitirgen et al., "Coordinated management of multiple interacting resources in CMPs: A machine learning approach," in MICRO, 2008.
88	B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, 1970.
89	Q. Cai et al., "Meeting points: Using thread criticality to adapt multicore hardware to parallel regions," in PACT, 2008.
90	R. Bryant, "Data-intensive supercomputing: The case for DISC," CMU CS Tech. Report 07-128, 2007.
91	Y. Cai et al., "FPGA-based solid-state drive prototyping platform," in FCCM, 2011.
92	Y. Cai et al., "Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis," in DATE, 2012.
93	Y. Cai et al., "Flash Correct-and-Refresh: Retention-aware error management for increased flash memory lifetime," in ICCD, 2012.
94	Y. Cai et al., "Error analysis and retention-aware error management for NAND flash memory," Intel Technology Journal, vol. 17, no. 1, May 2013.
95	Y. Cai et al., "Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation," in ICCD, 2013.
96	Y. Cai et al., "Threshold voltage distribution in MLC NAND flash memory: Characterization, analysis and modeling," in DATE, 2013.
97	Y. Cai et al., "Neighbor-cell assisted error correction for MLC NAND flash memories," in SIGMETRICS, 2014.
98	Y. Cai et al., "Data retention in MLC NAND flash memory: Characterization, optimization and recovery," in HPCA, 2015.
99	K. Chang et al., "HAT: Heterogeneous adaptive throttling for on-chip networks," in SBAC-PAD, 2012.
100	K. Chang et al., "Improving DRAM performance by parallelizing refreshes with accesses," in HPCA, 2014.
101	N. Chatterjee et al., "Leveraging heterogeneity in DRAM main memories to accelerate critical word access," in MICRO, 2012.
102	Y. Chou et al., "Store memory-level parallelism optimizations for commercial applications," in MICRO, 2005.
103	S. Chaudhry et al., "High-performance throughput computing," IEEE Micro, vol. 25, no. 6, 2005.
104	E. Chen et al., "Advances and future prospects of spin-transfer torque random access memory," IEEE Transactions on Magnetics, vol. 46, no. 6, 2010.
105	S. Chhabra and Y. Solihin, "i-NVMM: a secure non-volatile main memory system with incremental encryption," in ISCA, 2011.
106	Y. Chou et al., "Microarchitecture optimizations for exploiting memory-level parallelism," in ISCA, 2004.
107	E. Chung et al., "Single-chip heterogeneous computing: Does the future include custom logic, FPGAs, and GPUs?" in MICRO, 2010.
108	J. Coburn et al., "NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories," in ASPLOS, 2011.
109	Grand Research Challenges in Information Systems, Computing Research Association, http://www.cra.org/reports/gc.systems.pdf.
110	J. Condit et al., "Better I/O through byte-addressable, persistent memory," in SOSP, 2009.
111	K. V. Craeynest et al., "Scheduling heterogeneous multi-cores through performance impact estimation (PIE)," in ISCA, 2012.
112	R. Das et al., "Application-to-core mapping policies to reduce memory system interference in multi-core systems," in HPCA, 2013.
113	R. Das et al., "Application-aware prioritization mechanisms for on-chip networks," in MICRO, 2009.
114	R. Das et al., "Aergia: Exploiting packet latency slack in on-chip networks," in ISCA, 2010.
115	Q. Deng et al., "MemScale: active low-power modes for main memory," in ASPLOS, 2011.
116	R. Das et al., "Aergia: A network-on-chip exploiting packet latency slack," IEEE Micro (TOP PICKS Issue), vol. 31, no. 1, 2011.
117	H. David et al., "Memory power management via dynamic voltage/frequency scaling," in ICAC, 2011.
118	J. Dean and L. A. Barroso, "The tail at scale," Communications of the ACM, vol. 56, no. 2, 2013.
119	G. Dhiman et al., "PDRAM: A hybrid PRAM and DRAM main memory system," in DAC, 2009.
120	X. Dong et al., "Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems," in SC, 2009.
121	K. Du Bois et al., "Per-thread cycle accounting in multicore processors," TACO, 2013.
122	K. Du Bois et al., "Criticality stacks: Identifying critical threads in parallel programs using synchronization behavior," in ISCA, 2013.
123	J. Dundas and T. Mudge, "Improving data cache performance by pre-executing instructions under a cache miss," in ICS, 1997.
124	J. Dusser et al., "Zero-content augmented caches," in ICS, 2009.
125	E. Ebrahimi et al., "Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems," in ASPLOS, 2010.
126	E. Ebrahimi et al., "Parallel application memory scheduling," in MICRO, 2011.
127	E. Ebrahimi et al., "Prefetch-aware shared-resource management for multi-core systems," in ISCA, 2011.
128	E. Ebrahimi et al., "Coordinated control of multiple prefetchers in multi-core systems," in MICRO, 2009.
129	E. Ebrahimi et al., "Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems," TOCS, 2012.
130	E. Ebrahimi et al., "Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems," in HPCA, 2009.
131	M. Ekman, "A robust main-memory compression scheme," in ISCA, 2005.
132	S. Eyerman and L. Eeckhout, "Modeling critical sections in Amdahl's law and its implications for multicore design," in ISCA, 2010.
133	C. Fallin et al., "CHIPPER: a low-complexity bufferless deflection router," in HPCA, 2011.
134	R. Gallager, "Low density parity check codes," 1963, MIT Press.
135	A. Glew, "MLP yes! ILP no!" in ASPLOS Wild and Crazy Idea Session, 1998.
136	B. Grot et al., "Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees," in ISCA, 2011.
137	B. Grot et al., "Regional congestion awareness for load balance in networks-on-chip," in HPCA, 2008.
138	B. Grot et al., "Preemptive virtual clock: A flexible, efficient, and cost-effective QoS scheme for networks-on-chip," in MICRO, 2009.
139	B. Grot et al., "Topology-aware quality-of-service support in highly integrated chip multiprocessors," in WIOSCA, 2010.
140	G. Hinton et al., "The microarchitecture of the Pentium 4 processor," Intel Technology Journal, Feb. 2001, Q1 2001 Issue.
141	R. Iyer, "CQoS: a framework for enabling QoS in shared caches of CMP platforms," in ICS, 2004.
142	S. Hong, "Memory technology trend and future challenges," in IEDM, 2010.
143	E. Ipek et al., "Self-optimizing memory controllers: A reinforcement learning approach," in ISCA, 2008.
144	C. Isen and L. K. John, "Eskimo: Energy savings using semantic knowledge of inconsequential memory occupancy for DRAM subsystem," in MICRO, 2009.
145	R. Iyer et al., "QoS policies and architecture for cache/memory in CMP platforms," in SIGMETRICS, 2007.
146	A. Jaleel et al., "Adaptive insertion policies for managing shared caches," in PACT, 2008.
147	A. Jaleel et al., "High performance cache replacement using re-reference interval prediction," in ISCA, 2010.
148	M. K. Jeong et al., "Balancing DRAM locality and parallelism in shared memory CMP systems," in HPCA, 2012.
149	J. A. Joao et al., "Bottleneck identification and scheduling in multithreaded applications," in ASPLOS, 2012.
150	J. A. Joao et al., "Utility-based acceleration of multithreaded applications on asymmetric CMPs," in ISCA, 2013.
151	A. Jog et al., "Orchestrated scheduling and prefetching for GPGPUs," in ISCA, 2013.
152	A. Jog et al., "OWL: Cooperative thread array aware scheduling techniques for improving GPGPU performance," in ASPLOS, 2013.
153	T. L. Johnson et al., "Run-time spatial locality detection and optimization," in MICRO, 1997.
154	S. Khan et al., "The efficacy of error mitigation techniques for DRAM retention failures: A comparative experimental study," in SIGMETRICS, 2014.
155	U. Kang et al., "Co-architecting controllers and DRAM to enhance DRAM process scaling," in The Memory Forum, 2014.
156	D. Kaseridis et al., "Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era," in MICRO, 2011.
157	O. Kayiran et al., "Managing GPU concurrency in heterogeneous architectures," in MICRO, 2014.
158	S. Khan et al., "Improving cache performance by exploiting read-write disparity," in HPCA, 2014.
159	H. Kim et al., "Bounding memory interference delay in COTS-based multi-core systems," in RTAS, 2014.
160	J. Kim and M. C. Papaefthymiou, "Dynamic memory design for low data-retention power," in PATMOS, 2000.
161	K. Kim, "Future memory technology: challenges and opportunities," in VLSI-TSA, 2008.
162	K. Kim et al., "A new investigation of data retention time in truly nanoscaled DRAMs," IEEE Electron Device Letters, vol. 30, no. 8, Aug. 2009.
163	Y. Kim et al., "ATLAS: a scalable and high-performance scheduling algorithm for multiple memory controllers," in HPCA, 2010.
164	Y. Kim et al., "Thread cluster memory scheduling: Exploiting differences in memory access behavior," in MICRO, 2010.
165	Y. Kim et al., "Thread cluster memory scheduling," IEEE Micro (TOP PICKS Issue), vol. 31, no. 1, 2011.
166	Y. Kim et al., "A case for subarray-level parallelism (SALP) in DRAM," in ISCA, 2012.
167	D. Kroft, "Lockup-free instruction fetch/prefetch cache organization," in ISCA, 1981.
168	Y. Kim et al., "Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors," in ISCA, 2014.
169	Y. Koh, "NAND Flash Scaling Beyond 20nm," in IMW, 2009.
170	J. Kong et al., "Improving privacy and lifetime of PCM-based main memory," in DSN, 2010.
171	E. Kultursay et al., "Evaluating STT-RAM as an energy-efficient main memory alternative," in ISPASS, 2013.
172	S. Kumar and C. Wilkerson, "Exploiting spatial locality in data caches using spatial footprints," in ISCA, 1998.
173	B. C. Lee et al., "Architecting Phase Change Memory as a Scalable DRAM Alternative," in ISCA, 2009.
174	B. C. Lee et al., "Phase change memory architecture and the quest for scalability," Communications of the ACM, vol. 53, no. 7, 2010.
175	B. C. Lee et al., "Phase change technology and the future of main memory," IEEE Micro (TOP PICKS Issue), vol. 30, no. 1, 2010.
176	C. J. Lee et al., "Prefetch-aware DRAM controllers," in MICRO, 2008.
177	C. J. Lee et al., "DRAM-aware last-level cache writeback: Reducing write-caused interference in memory systems," HPS, UT-Austin, Tech. Rep. TR-HPS-2010-002, 2010.
178	C. J. Lee et al., "Prefetch-aware memory controllers," TC, vol. 60, no. 10, 2011.
179	C. J. Lee et al., "Improving memory bank-level parallelism in the presence of prefetching," in MICRO, 2009.
180	D. Lee et al., "Tiered-latency DRAM: A low latency and low cost DRAM architecture," in HPCA, 2013.
181	D. Lee et al., "Adaptive-latency DRAM: Optimizing DRAM timing for the common-case," in HPCA, 2015.
182	D. Lee et al., "Fast and accurate mapping of Complete Genomics reads," in Methods, 2014.
183	C. Lefurgy et al., "Energy management for commercial servers," in IEEE Computer, 2003.
184	K. Lim et al., "Disaggregated memory for expansion and sharing in blade servers," in ISCA, 2009.
185	J. Liu et al., "RAIDR: Retention-aware intelligent DRAM refresh," in ISCA, 2012.
186	J. Liu et al., "An experimental study of data retention behavior in modern DRAM devices: Implications for retention time profiling mechanisms," in ISCA, 2013.
187	L. Liu et al., "A software memory partition approach for eliminating bank-level interference in multicore systems," in PACT, 2012.
188	S. Liu et al., "Flikker: saving DRAM refresh-power through critical data partitioning," in ASPLOS, 2011.
189	G. Loh, "3D-stacked memory architectures for multicore processors," in ISCA, 2008.
190	G. H. Loh and M. D. Hill, "Efficiently enabling conventional block sizes for very large die-stacked DRAM caches," in MICRO, 2011.
191	Y. Lu et al., "LightTx: A lightweight transactional design in flash-based SSDs to support flexible transactions," in ICCD, 2013.
192	Y. Lu et al., "Loose-ordering consistency for persistent memory," in ICCD, 2014.
193	J. Meza et al., "A case for small row buffers in non-volatile main memories," in ICCD, 2012.
194	Y. Luo et al., "Characterizing application memory error vulnerability to optimize data center cost via heterogeneous-reliability memory," in DSN, 2014.
195	A. Maislos et al., "A new era in embedded flash memory," in FMS, 2011.
196	J. Mandelman et al., "Challenges and future directions for the scaling of dynamic random-access memory (DRAM)," in IBM JR&D, vol. 46, 2002.
197	J. Meza et al., "Enabling efficient and scalable hybrid memories using fine-granularity DRAM cache management," IEEE CAL, 2012.
198	J. Meza et al., "A case for efficient hardware-software cooperative management of storage and memory," in WEED, 2013.
199	A. Mishra et al., "A heterogeneous multiple network-on-chip design: An application-aware approach," in DAC, 2013.
200	T. Moscibroda and O. Mutlu, "Memory performance attacks: Denial of memory service in multi-core systems," in USENIX Security, 2007.
201	T. Moscibroda and O. Mutlu, "Distributed order scheduling and its application to multi-core DRAM controllers," in PODC, 2008.
202	S. Muralidhara et al., "Reducing memory interference in multi-core systems via application-aware memory channel partitioning," in MICRO, 2011.
203	O. Mutlu, "Asymmetry everywhere (with automatic resource management)," in CRA Workshop on Advanced Computer Architecture Research, 2010.
204	O. Mutlu, "Memory scaling: A systems architecture perspective," in IMW, 2013.