Search | Korea Science

AB9: A neural processor for inference acceleration

Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
- ETRI Journal / v.42 no.4 / pp.491-504 / 2020
We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency to those of a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 mm². Delivery is expected later this year.
https://doi.org/10.4218/etrij.2020-0134

Trends in AI Processor Technology (인공지능프로세서 기술 동향)

Lee, M.Y.;Chung, J.;Lee, J.H.;Han, J.H.;Kwon, Y.S.
- Electronics and Telecommunications Trends / v.35 no.3 / pp.66-75 / 2020
As growing expectations for practical artificial intelligence (AI) services make AI algorithms more complicated, an efficient processor for AI algorithms is required. To meet this requirement, processors optimized for parallel processing, such as graphics processing units (GPUs), have been widely employed. However, the GPU has a generalized structure for various applications, so it is not optimized for AI algorithms. Therefore, research on the development of AI processors optimized for AI algorithm processing has been actively conducted. This paper briefly introduces AI processors for inference acceleration developed by the Electronics and Telecommunications Research Institute (ETRI), South Korea, and by other global vendors for mobile and server platforms.
https://doi.org/10.22648/ETRI.2020.J.350307

Trends in AI Computing Processor Semiconductors Including ETRI's Autonomous Driving AI Processor (인공지능 컴퓨팅 프로세서 반도체 동향과 ETRI의 자율주행 인공지능 프로세서)

Yang, J.M.;Kwon, Y.S.;Kang, S.W.
- Electronics and Telecommunications Trends / v.32 no.6 / pp.57-65 / 2017
Neural-network-based AI computing is a promising technology that mirrors human recognition and decision-making. Early AI computing processors were composed of GPUs and CPUs; however, the dramatic increase in floating-point operations calls for an energy-efficient AI processor with a highly parallelized architecture. In this paper, we analyze the trends in processor architectures for AI computing. Some architectures are still built from GPUs, but they shrink each processing unit by allowing half-precision operations and thereby raise the processing unit density. Other architectures concentrate on matrix multiplication and require dedicated hardware for fast vector operations. Finally, we propose our own inAB processor architecture and introduce domestic cutting-edge processor design capabilities.
https://doi.org/10.22648/ETRI.2017.J.320607

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

Han, Jinho;Choi, Minseok;Kwon, Youngsu
- ETRI Journal / v.42 no.4 / pp.468-479 / 2020
The proposed AI processor architecture provides high throughput for accelerating neural networks and reduces the external memory bandwidth required to process them. To achieve high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronic systems of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control it and to process the AI algorithm. The GPP core has a self-recovering cache and a dynamic lockstep function. The function-safe design is shown to meet the ASIL D fault-tolerance level of the ISO26262 standard. The entire AI processor was fabricated as a prototype chip in a 28-nm CMOS process. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V. The measured energy efficiency is 1.3 TOPS/W. The function-safe GPP used for control can reach ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.
https://doi.org/10.4218/etrij.2020-0128
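
The 40-TFLOPS peak figure in the abstract above can be roughly cross-checked against the stated core count and clock frequency. The sketch below assumes that each nano core completes one multiply-accumulate per cycle, counted as two floating-point operations; the abstract does not state this, and the function name `peak_tflops` is only illustrative.

```python
# Back-of-the-envelope check of the peak figure quoted in the abstract.
# Assumption (not stated in the abstract): each nano core completes one
# multiply-accumulate (MAC) per cycle, counted as 2 floating-point operations.

ROWS, COLS = 128, 128   # nano-core array size, from the abstract
CLOCK_HZ = 1.2e9        # 1.2 GHz, from the abstract
FLOPS_PER_MAC = 2       # assumed: one multiply plus one add


def peak_tflops(rows: int, cols: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Return peak throughput in TFLOPS for a rows x cols array of cores."""
    return rows * cols * flops_per_cycle * clock_hz / 1e12


if __name__ == "__main__":
    print(f"{peak_tflops(ROWS, COLS, CLOCK_HZ, FLOPS_PER_MAC):.1f} TFLOPS")
```

Under that assumption the result is about 39.3 TFLOPS, which is consistent with the rounded 40-TFLOPS figure.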

Performance Analyzer for Embedded AI Processor (내장형 인공지능 프로세서를 위한 성능 분석기)

Hwang, Dong Hyun;Yoon, Young Hyun;Han, Chang Yeop;Lee, Seung Eun
- Journal of Internet Computing and Services / v.21 no.5 / pp.149-157 / 2020
Recently, as interest in artificial intelligence has increased, many studies have been conducted to implement AI processors. However, an AI processor requires not only functional verification but also performance verification of whether it is suitable for the target application. In this paper, we propose an AI processor performance analyzer that can verify application performance and explore the limitations of the processor. Using the performance analyzer, we explore the limitations of the AI processor and optimize the AI model to fit the processor in image recognition and speech recognition applications.
https://doi.org/10.7472/jksii.2020.21.5.149

Design of Stand-alone AI Processor for Embedded System (독립운용이 가능한 임베디드 인공지능 프로세서 설계)

Cho, Kwon Neung;Choi, Do Young;Jeong, Young Woo;Lee, Seung Eun
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.600-602 / 2021
With the development of the mobile industry and growing interest in artificial intelligence (AI) technology, much research is being conducted on AI processors applicable to embedded systems. When implementing AI in embedded systems, the design should take resource and power-consumption restrictions into account. Moreover, it is efficient to include a dedicated hardware accelerator to compensate for the low computational performance of the embedded system. In this paper, we propose a stand-alone embedded AI processor. The proposed AI processor includes a hardware accelerator dedicated to a distance-based AI algorithm and a general-purpose MCU that supports flexible programmability for application to various embedded systems. The AI processor was designed in Verilog HDL and verified by implementing it on a field-programmable gate array (FPGA).
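
The abstract above does not name the distance-based algorithm that the accelerator implements. As a representative example only, the sketch below shows a k-nearest-neighbor classifier using Manhattan distance, a combination that suits small hardware because it needs only subtraction, absolute value, and accumulation. The function names, the choice of distance metric, and the sample data are all assumptions for illustration.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(query, samples, k=3):
    """Label `query` by majority vote among its k nearest stored samples.

    `samples` is a list of (feature_vector, label) pairs. This is an assumed
    illustration of a distance-based classifier, not the paper's design.
    """
    nearest = sorted(samples, key=lambda s: manhattan(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters.
SAMPLES = [
    ((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
    ((10, 10), "b"), ((11, 10), "b"), ((10, 11), "b"),
]

if __name__ == "__main__":
    print(knn_classify((1, 1), SAMPLES))
    print(knn_classify((9, 9), SAMPLES))
```

In a split like the one the abstract describes, the distance computation over stored samples is the natural part to offload to the accelerator, while the voting and application logic can stay on the general-purpose MCU.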

AI Processor Technology Trends (인공지능 프로세서 기술 동향)

Kwon, Youngsu
- Electronics and Telecommunications Trends / v.33 no.5 / pp.121-134 / 2018
The Von Neumann based architecture of the modern computer has dominated the computing industry for the past 50 years, sparking the digital revolution and propelling us into today's information age. Recent research focus and market trends have shown significant effort toward the advancement and application of artificial intelligence technologies. Although artificial intelligence has been studied for decades since the Turing machine was first introduced, the field has recently emerged into the spotlight thanks to remarkable milestones such as AlexNet-CNN and Alpha-Go, whose neural-network based deep learning methods have achieved a ground-breaking performance superior to existing recognition, classification, and decision algorithms. Unprecedented results in a wide variety of applications (drones, autonomous driving, robots, stock markets, computer vision, voice, and so on) have signaled the beginning of a golden age for artificial intelligence after 40 years of relative dormancy. Algorithmic research continues to progress at a breath-taking pace as evidenced by the rate of new neural networks being announced. However, traditional Von Neumann based architectures have proven to be inadequate in terms of computation power, and inherently inefficient in their processing of vastly parallel computations, which is a characteristic of deep neural networks. Consequently, global conglomerates such as Intel, Huawei, and Google, as well as large domestic corporations and fabless companies are developing dedicated semiconductor chips customized for artificial intelligence computations. The AI Processor Research Laboratory at ETRI is focusing on the research and development of super low-power AI processor chips. In this article, we present the current trends in computation platform, parallel processing, AI processor, and super-threaded AI processor research being conducted at ETRI.
https://doi.org/10.22648/ETRI.2018.J.330513

Trends of Compiler Development for AI Processor (인공지능 프로세서 컴파일러 개발 동향)

Kim, J.K.;Kim, H.J.;Cho, Y.C.P.;Kim, H.M.;Lyuh, C.G.;Han, J.;Kwon, Y.
- Electronics and Telecommunications Trends / v.36 no.2 / pp.32-42 / 2021
The rapid growth of deep-learning applications has spurred R&D on artificial intelligence (AI) processors. A dedicated software framework, such as a compiler and runtime APIs, is required to achieve maximum processor performance. There are various compilers and frameworks for AI training and inference. In this study, we present the features and characteristics of AI compilers, training frameworks, and inference engines. In addition, we focus on the internals of compiler frameworks, which are based on either basic linear algebra subprograms or an intermediate representation. For in-depth insight, we present the compiler infrastructure, internal components, and operation flow of ETRI's "AI-Ware." The software framework's significant role is evident from the optimized neural processing unit code produced by the compiler after various optimization passes, such as scheduling, architecture-aware optimization, schedule selection, and power optimization. We conclude the study with thoughts about the future of state-of-the-art AI compilers.
https://doi.org/10.22648/ETRI.2021.J.360204

Trends of Low-Precision Processing for AI Processor (NPU 반도체를 위한 저정밀도 데이터 타입 개발 동향)

Kim, H.J.;Han, J.H.;Kwon, Y.S.
- Electronics and Telecommunications Trends / v.37 no.1 / pp.53-62 / 2022
As transformer-based neural networks grow in size, lightweight algorithms and efficient AI accelerators have been developed to train these huge networks within a practical time. In this article, we survey state-of-the-art research on low-precision computational algorithms, especially for floating-point formats, and their hardware accelerators. We describe the trends by focusing on the work of two leading research groups, IBM and Seoul National University, which have deep knowledge of both AI algorithms and hardware architecture. For the low-precision algorithms, we summarize two efficient floating-point formats (hybrid FP8 and radix-4 FP4) with accuracy-preserving algorithms for training in the main research stream. Moreover, we describe the AI processor architecture supporting low-bit mixed-precision computing units, including the integer engine.
https://doi.org/10.22648/ETRI.2022.J.370106
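
The trade-off behind low-precision floating-point formats, range from exponent bits versus resolution from mantissa bits, can be illustrated with a small rounding sketch. The function `quantize` and the two 8-bit splits used below (4 exponent and 3 mantissa bits, 5 exponent and 2 mantissa bits) are assumptions chosen for illustration, with an IEEE-like bias and the top exponent code reserved. They are not taken from the surveyed papers, whose formats may differ in bias and special-value handling.

```python
import math


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a small sign/exponent/mantissa format.

    Assumed conventions (illustrative, IEEE-like): bias = 2**(exp_bits-1) - 1,
    subnormals supported, top exponent code reserved, overflow saturates.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    e_min = 1 - bias                      # smallest normal exponent
    e_max = 2 ** exp_bits - 2 - bias      # largest normal exponent
    _, e = math.frexp(abs(x))             # abs(x) = m * 2**e with 0.5 <= m < 1
    e = max(e - 1, e_min)                 # exponent of the leading bit, clamped
    step = 2.0 ** (e - man_bits)          # spacing of representable values here
    q = round(abs(x) / step) * step       # round to nearest, ties to even
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** e_max
    return math.copysign(min(q, max_val), x)


if __name__ == "__main__":
    for value in (1.1, 0.3, 1000.0):
        print(value, quantize(value, 4, 3), quantize(value, 5, 2))
```

With more mantissa bits the rounded value stays closer to the input, while with more exponent bits large values are less likely to saturate, which is the tension that mixed or hybrid formats try to balance.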

ETRI AI Strategy #2: Strengthening Competencies in AI Semiconductor & Computing Technologies (ETRI AI 실행전략 2: AI 반도체 및 컴퓨팅시스템 기술경쟁력 강화)

Choi, S.S.;Yeon, S.J.
- Electronics and Telecommunications Trends / v.35 no.7 / pp.13-22 / 2020
There is no denying that computing power has been a crucial driving force behind the development of artificial intelligence today. In addition, artificial intelligence (AI) semiconductors and computing systems are perceived to have promising industrial value in the market along with rapid technological advances. Therefore, success in this field is also meaningful to the nation's growth and competitiveness. In this context, ETRI's AI strategy proposes implementation directions and tasks with the aim of strengthening the technological competitiveness of AI semiconductors and computing systems. The paper contains a brief background of ETRI's AI Strategy #2, research and development trends, and key tasks in four major areas: 1) AI processors, 2) AI computing systems, 3) neuromorphic computing, and 4) quantum computing.
https://doi.org/10.22648/ETRI.2020.J.350703

