Low-power heterogeneous uncore architecture for future 3D chip-multiprocessors

Dorostkar, Aniseh; Asad, Arghavan; Fathy, Mahmood; Jahed-Motlagh, Mohammad Reza; Mohammadi, Farah

doi:10.4218/etrij.2017-0095

ETRI Journal

Volume 40, Issue 6 / Pages 759-773 / 2018 / 1225-6463 (pISSN) / 2233-7326 (eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Dorostkar, Aniseh (Computer Engineering Department, Iran University of Science and Technology) ;
Asad, Arghavan (Computer Engineering Department, Iran University of Science and Technology) ;
Fathy, Mahmood (Computer Engineering Department, Iran University of Science and Technology) ;
Jahed-Motlagh, Mohammad Reza (Computer Engineering Department, Iran University of Science and Technology) ;
Mohammadi, Farah (Electrical and Computer Engineering Department, Ryerson University)

Received : 2017.06.29
Accepted : 2018.06.11
Published : 2018.12.06

Abstract

Uncore components such as on-chip memory systems and on-chip interconnects consume a large amount of energy in emerging embedded applications. Few studies have addressed next-generation analytical models for future chip-multiprocessors (CMPs) that jointly account for the power consumption of core and uncore components. In this paper, we propose a convex-optimization approach to designing heterogeneous uncore architectures for embedded CMPs. The approach optimizes the number and placement of memory banks of different technologies on the memory layer. Alongside this hybrid memory design, it also optimizes the number and placement of through-silicon vias (TSVs), a viable means of building three-dimensional (3D) CMPs. Experimental results show that the proposed method outperforms 3D CMP designs with hybrid and traditional memory architectures in terms of both energy-delay product (EDP) and performance. The proposed method improves the EDP by about 43% on average compared with an SRAM design, and it improves throughput by about 7% compared with a dynamic RAM (DRAM) design.
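
For readers unfamiliar with the metric, the following minimal Python sketch shows how an energy-delay product and a relative improvement over a baseline are commonly computed. The function names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' convex-optimization model.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product; a lower value is better."""
    return energy_joules * delay_seconds


def relative_improvement(baseline: float, proposed: float) -> float:
    """Fractional reduction of a lower-is-better metric versus a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# Hypothetical energy and delay figures (assumptions, not measured results).
baseline_edp = edp(energy_joules=2.0, delay_seconds=1.0)
proposed_edp = edp(energy_joules=1.5, delay_seconds=0.5)
improvement = relative_improvement(baseline_edp, proposed_edp)
print(f"EDP improvement over baseline: {improvement:.1%}")
```

Because the EDP multiplies energy by delay, a design can improve it by lowering either factor, which is why the abstract reports the EDP alongside a separate throughput figure.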
