http://dx.doi.org/10.4218/etrij.2020-0128

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

Han, Jinho (AI Processor Research Section, Electronics and Telecommunications Research Institute)
Choi, Minseok (AI Processor Research Section, Electronics and Telecommunications Research Institute)
Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute)

Publication Information

ETRI Journal / v.42, no.4, 2020, pp. 468-479

Abstract

The proposed AI processor architecture delivers high throughput for neural network acceleration while reducing the external memory bandwidth that neural network processing requires. To achieve high throughput, the proposed super thread core (STC) comprises 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronic systems of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control it and to process the AI algorithm; the GPP core has a self-recovering cache and a dynamic lockstep function. The function-safe design is shown to meet ASIL D among the fault-tolerance levels of the ISO26262 standard. The entire AI processor was fabricated in a 28-nm CMOS process as a prototype chip. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V, and the measured energy efficiency is 1.3 TOPS/W. The function-safe GPP used for control achieves ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.
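
The headline figures in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that each nano core completes one multiply-accumulate per cycle (counted as 2 floating-point operations), and that the quoted 99.64% corresponds to the single-point fault metric (SPFM) of ISO 26262, whose targets are 90%, 97%, and 99% for ASIL B, C, and D. The function names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# ASSUMPTION (not in the abstract): one multiply-accumulate per nano core
# per clock cycle, counted as 2 floating-point operations.

NANO_CORES = 128 * 128         # nano cores in the super thread core (abstract)
CLOCK_HZ = 1.2e9               # 1.2 GHz clock frequency (abstract)
FLOPS_PER_CORE_PER_CYCLE = 2   # assumed: one multiply plus one add


def peak_tflops(cores, clock_hz, flops_per_cycle):
    """Peak throughput in TFLOPS for a fully utilized core array."""
    return cores * clock_hz * flops_per_cycle / 1e12


# ISO 26262 single-point fault metric targets per ASIL.
# ASSUMPTION: the 99.64% in the abstract is this metric.
SPFM_TARGET = {"B": 0.90, "C": 0.97, "D": 0.99}


def highest_asil_by_spfm(spfm):
    """Highest ASIL whose SPFM target is met, or None if none is met."""
    met = [level for level, target in SPFM_TARGET.items() if spfm >= target]
    return max(met) if met else None


if __name__ == "__main__":
    tflops = peak_tflops(NANO_CORES, CLOCK_HZ, FLOPS_PER_CORE_PER_CYCLE)
    print(f"peak throughput: {tflops:.1f} TFLOPS")            # 39.3, i.e. about 40
    print(f"SPFM 99.64% meets ASIL {highest_asil_by_spfm(0.9964)}")
```

Under these assumptions the array peaks at about 39.3 TFLOPS, consistent with the rounded 40 TFLOPS figure. The SPFM check covers only one of the hardware metrics that ISO 26262 evaluates, so it does not by itself establish an ASIL rating.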

Keywords

AI processor; function-safe; ISO26262; many-core architecture

Citations & Related Records

2	(2021) Micromachines: A Survey of Software-Defined Networks-on-Chip: Motivations, Challenges and Opportunities / 12 (2), 183
2	(2020) Electronics and Telecommunications Trends: Trends in the Development of Artificial Intelligence Processor Compilers / 36 (2), 32
4	(2020) ETRI Journal: DiLO: Direct light detection and ranging odometry based on spherical range images for autonomous driving / 43 (4), 603
14	(2020) Journal of Circuits, Systems, and Computers: SoC-Level Safety-Oriented Design Process in Electronic System Level Development Environment / 30 (14), 2150254