• Title/Summary/Keyword: AI 프로세서 (AI processor)


Performance Analyzer for Embedded AI Processor (내장형 인공지능 프로세서를 위한 성능 분석기)

  • Hwang, Dong Hyun; Yoon, Young Hyun; Han, Chang Yeop; Lee, Seung Eun
    • Journal of Internet Computing and Services, v.21 no.5, pp.149-157, 2020
  • Recently, as interest in artificial intelligence has increased, many studies have been conducted to implement AI processors. However, an AI processor requires not only functional verification but also performance verification to determine whether it is suitable for a given application. In this paper, we propose an AI processor performance analyzer that can verify application performance and explore the limitations of the processor. Using the performance analyzer, we explore the limitations of an AI processor and optimize AI models to fit the processor in image recognition and speech recognition applications.
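
The paper does not reproduce the analyzer's internals; as a rough illustration of the kind of per-layer performance measurement it describes, here is a minimal Python sketch. The run_layer callback, the (name, layer) pairing, and the timing loop are my assumptions, not the authors' tool.

```python
import time

def profile_layers(run_layer, layers, x, warmup=3, repeats=50):
    """Average per-layer latency, to locate the layers that hit the
    processor's limits. run_layer(layer, x) is assumed to execute one
    layer on the target AI processor and return its output."""
    latencies = {}
    for name, layer in layers:               # layers: [(name, layer), ...]
        for _ in range(warmup):              # discard cold-start effects
            run_layer(layer, x)
        t0 = time.perf_counter()
        for _ in range(repeats):
            y = run_layer(layer, x)
        latencies[name] = (time.perf_counter() - t0) / repeats
        x = y                                # feed output to the next layer
    return latencies

# toy usage: "layers" that just burn time on the host
layers = [("conv1", 1e-3), ("fc1", 5e-4)]
print(profile_layers(lambda d, x: time.sleep(d), layers, None, repeats=5))
```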

Trends of Compiler Development for AI Processor (인공지능 프로세서 컴파일러 개발 동향)

  • Kim, J.K.; Kim, H.J.; Cho, Y.C.P.; Kim, H.M.; Lyuh, C.G.; Han, J.; Kwon, Y.
    • Electronics and Telecommunications Trends, v.36 no.2, pp.32-42, 2021
  • The rapid growth of deep-learning applications has invoked the R&D of artificial intelligence (AI) processors. A dedicated software framework such as a compiler and runtime APIs is required to achieve maximum processor performance. There are various compilers and frameworks for AI training and inference. In this study, we present the features and characteristics of AI compilers, training frameworks, and inference engines. In addition, we focus on the internals of compiler frameworks, which are based on either basic linear algebra subprograms or intermediate representation. For an in-depth insight, we present the compiler infrastructure, internal components, and operation flow of ETRI's "AI-Ware." The software framework's significant role is evidenced from the optimized neural processing unit code produced by the compiler after various optimization passes, such as scheduling, architecture-considering optimization, schedule selection, and power optimization. We conclude the study with thoughts about the future of state-of-the-art AI compilers.
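
The report describes AI-Ware's pass pipeline only at a survey level; as a minimal sketch of how a compiler might chain such passes over a graph IR, here is a toy pass manager in Python. The Op structure, the fusion pass, and the pass names are illustrative assumptions, not AI-Ware's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    """One node of a toy graph IR: an operator with its inputs."""
    name: str
    kind: str
    inputs: list = field(default_factory=list)

def fuse_conv_relu(ir):
    """Example architecture-aware pass: fuse conv+relu pairs into one op."""
    fused, skip = [], set()
    for i, op in enumerate(ir):
        if i in skip:
            continue
        nxt = ir[i + 1] if i + 1 < len(ir) else None
        if op.kind == "conv" and nxt and nxt.kind == "relu":
            fused.append(Op(op.name + "_relu", "conv_relu", op.inputs))
            skip.add(i + 1)
        else:
            fused.append(op)
    return fused

def schedule_by_dependency(ir):
    """Placeholder scheduling pass: keep ops in issue order (already topological)."""
    return list(ir)

def run_pipeline(ir, passes):
    for p in passes:          # each pass rewrites the IR and hands it on
        ir = p(ir)
    return ir

ir = [Op("c0", "conv"), Op("r0", "relu", ["c0"]), Op("fc0", "matmul", ["r0"])]
print([op.kind for op in run_pipeline(ir, [fuse_conv_relu, schedule_by_dependency])])
# -> ['conv_relu', 'matmul']
```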

Design of Stand-alone AI Processor for Embedded System (독립운용이 가능한 임베디드 인공지능 프로세서 설계)

  • Cho, Kwon Neung; Choi, Do Young; Jeong, Young Woo; Lee, Seung Eun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2021.05a, pp.600-602, 2021
  • With the development of the mobile industry and growing interest in artificial intelligence (AI) technology, much research is under way on AI processors applicable to embedded systems. When implementing AI in embedded systems, the design must take into account the restrictions on resources and power consumption. Moreover, it is efficient to include a dedicated hardware accelerator to complement the low computational performance of an embedded system. In this paper, we propose a stand-alone embedded AI processor. The proposed AI processor includes a hardware accelerator dedicated to a distance-based AI algorithm and a general-purpose MCU that supports flexible programmability for application to various embedded systems. The AI processor was designed in Verilog HDL and verified by implementation on a Field Programmable Gate Array (FPGA).
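
The abstract does not specify which distance-based algorithm the accelerator implements; as an assumed example, a nearest-centroid classifier is the kind of kernel such hardware typically computes. A minimal Python sketch of that kernel (all names are mine):

```python
def squared_l2(a, b):
    """Squared Euclidean distance; skipping the square root is common in
    hardware because it is expensive and unnecessary for comparisons."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))

def nearest_centroid(sample, centroids):
    """Classify a sample by the label of its closest stored centroid.
    centroids: {label: feature vector}."""
    return min(centroids, key=lambda label: squared_l2(sample, centroids[label]))

centroids = {"cat": [0.9, 0.1], "dog": [0.2, 0.8]}
print(nearest_centroid([0.85, 0.2], centroids))  # -> 'cat'
```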


Trends in AI Computing Processor Semiconductors Including ETRI's Autonomous Driving AI Processor (인공지능 컴퓨팅 프로세서 반도체 동향과 ETRI의 자율주행 인공지능 프로세서)

  • Yang, J.M.; Kwon, Y.S.; Kang, S.W.
    • Electronics and Telecommunications Trends, v.32 no.6, pp.57-65, 2017
  • Neural network based AI computing is a promising technology that reflects the recognition and decision-making of human beings. Early AI computing platforms were composed of GPUs and CPUs; however, the dramatic increase in floating-point operations calls for an energy-efficient AI processor with a highly parallelized architecture. In this paper, we analyze trends in processor architectures for AI computing. Some architectures are still built around GPUs, but they reduce the size of each processing unit by supporting half-precision operation, raising the processing-unit density. Other architectures concentrate on matrix multiplication and construct dedicated hardware for fast vector operations. Finally, we propose our own inAB processor architecture and introduce domestic cutting-edge processor design capabilities.
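
As a rough illustration of what the dedicated matrix-multiplication hardware in this survey computes, here is a minimal software model of the multiply-accumulate (MAC) loop that such hardware parallelizes across an array of units. This is my sketch, not an architecture from the paper.

```python
def matmul_mac(A, B):
    """Software model of an output-stationary MAC array: each output
    element accumulates a row-by-column series of multiply-adds, the
    operation a dedicated matrix unit replicates in hardware."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0                       # one MAC unit's accumulator
            for t in range(k):
                acc += A[i][t] * B[t][j]    # multiply-accumulate step
            C[i][j] = acc
    return C

print(matmul_mac([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# -> [[19.0, 22.0], [43.0, 50.0]]
```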

Trends of Low-Precision Processing for AI Processor (NPU 반도체를 위한 저정밀도 데이터 타입 개발 동향)

  • Kim, H.J.; Han, J.H.; Kwon, Y.S.
    • Electronics and Telecommunications Trends, v.37 no.1, pp.53-62, 2022
  • With the increasing size of transformer-based neural networks, light-weight algorithms and efficient AI accelerators have been developed to train these huge networks within a practical design time. In this article, we present a survey of state-of-the-art research on low-precision computational algorithms, especially floating-point formats, and their hardware accelerators. We describe the trends by focusing on the work of two leading research groups, IBM and Seoul National University, which have deep knowledge of both AI algorithms and hardware architecture. For the low-precision algorithms, we summarize two efficient floating-point formats (hybrid FP8 and radix-4 FP4) together with accuracy-preserving training algorithms in the main research stream. Moreover, we describe an AI processor architecture that supports low-bit mixed-precision computing units, including an integer engine.
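
The article covers these formats at a survey level. In IBM's hybrid FP8 work, the forward pass is commonly described as using a 1-4-3 (sign/exponent/mantissa) layout and gradients a 1-5-2 layout; as a minimal sketch of what quantizing to such a grid means, assuming an IEEE-like 1-4-3 format with bias 7 (subnormals and saturation handled crudely, details are my assumptions):

```python
import math

def quantize_fp8_e4m3(x, exp_bits=4, man_bits=3, bias=7):
    """Round x to the nearest value representable with a sign bit,
    exp_bits exponent bits, and man_bits mantissa bits (IEEE-like,
    normal numbers only)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))         # abs(x) = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1             # rewrite as 1.f * 2**e with 1 <= 1.f < 2
    e_min, e_max = 1 - bias, (2**exp_bits - 2) - bias
    if e < e_min:                     # underflow: flush to zero
        return 0.0
    if e > e_max:                     # overflow: saturate to max normal
        return sign * (2.0 - 2.0**-man_bits) * 2.0**e_max
    frac = round((m - 1.0) * 2**man_bits) / 2**man_bits  # round mantissa
    if frac == 1.0:                   # mantissa rounded up into next binade
        frac, e = 0.0, e + 1
    return sign * (1.0 + frac) * 2.0**e

print(quantize_fp8_e4m3(3.1415926))   # -> 3.25, the nearest 1-4-3 value
```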

Design of Multipliers Optimized for CNN Inference Accelerators (CNN 추론 연산 가속기를 위한 곱셈기 최적화 설계)

  • Lee, Jae-Woo; Lee, Jaesung
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.10, pp.1403-1408, 2021
  • Recently, FPGA-based AI processors have been actively studied. Deep convolutional neural networks (CNN) are the basic computational structures executed by AI processors and require a very large number of multiplications. Considering that the multiplication coefficients used in CNN inference are all constants, and that an FPGA makes it easy to design a multiplier tailored to a specific coefficient, this paper proposes a methodology for optimizing the multiplier. The method uses the 2's complement representation and the distributive law to minimize the number of 1 bits in a multiplication coefficient, thereby reducing the number of stacked adders required. Applying this method to an actual FPGA implementation of a CNN reduces logic usage by up to 30.2% and propagation delay by up to 22%. Even when implemented as an ASIC chip, the hardware area is reduced by up to 35% and the delay by up to 19.2%.
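
The paper's exact recoding is not reproduced in the abstract; the underlying idea of using 2's complement to reduce 1 bits is closely related to canonical signed-digit (CSD) recoding, which turns a run of 1s into one addition and one subtraction (e.g., 7 = 0b111 becomes 8 - 1). A minimal sketch of that recoding and the resulting adder count, as an assumed stand-in for the authors' method:

```python
def csd_digits(k):
    """Recode integer k into canonical signed digits (-1, 0, +1) so that
    no two nonzero digits are adjacent; least-significant digit first."""
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k & 3)       # k % 4 == 3 -> digit -1, else +1
            k -= d
            digits.append(d)
        else:
            digits.append(0)
        k >>= 1
    return digits

def adder_count(k):
    """Number of shift-add/sub stages for multiplying by constant k:
    one fewer than the number of nonzero digits."""
    return max(sum(1 for d in csd_digits(k) if d) - 1, 0)

print(csd_digits(7), adder_count(7))  # -> [-1, 0, 0, 1] 1   (7 = 8 - 1)
print(adder_count(59))                # -> 2; plain binary 0b111011 needs 4 adders
```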

Buffering analysis of CNN module based on RISC-V platform (RISC-V 플랫폼 기반 CNN 모듈의 버퍼링 분석)

  • Kim, Jin-Young; Lim, Seung-Ho
    • Proceedings of the Korea Information Processing Society Conference, 2021.05a, pp.9-11, 2021
  • Recently, AI inference has increasingly been accelerated and distributed by performing AI computations directly on embedded edge computing devices. An edge device is built around an embedded processor and incorporates an internal deep-learning accelerator to speed up AI computation. A deep-learning accelerator moves large amounts of data for complex neural network computations, so efficient data movement and buffering between external memory and the internal accelerator are required. In this study, we model the buffer structure inside an edge-device deep-learning accelerator and analyze the buffering effect as a function of buffer size. The accelerator's buffer structure was implemented on a RISC-V processor based virtual platform, through which the buffer utilization of the accelerator can be analyzed for different deep-learning models.
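
The paper's buffer model is not detailed in the abstract; as a minimal sketch of the kind of analysis described, here is a first-order double-buffering estimate in Python. The parameter values and the timing model are assumptions for illustration, not the authors' measurements.

```python
def layer_time_us(tile_bytes, n_tiles, bw_bytes_per_us, compute_us_per_tile):
    """Estimate one layer's runtime under double buffering: while one
    buffer is being computed on, the next tile is fetched into the other,
    so each steady-state step takes whichever is slower, DMA or compute."""
    transfer_us = tile_bytes / bw_bytes_per_us
    steady = max(transfer_us, compute_us_per_tile)  # overlapped phase
    return transfer_us + (n_tiles - 1) * steady + compute_us_per_tile

# assumed numbers: 32 KiB tiles, 8 tiles, ~4 GB/s link, 10 us compute/tile
t = layer_time_us(32 * 1024, 8, 4096, 10.0)
print(f"{t:.1f} us")  # -> 88.0 us (compute-bound: 8 us transfer < 10 us compute)
```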

Trends in AI Processor Technology (인공지능프로세서 기술 동향)

  • Lee, M.Y.; Chung, J.; Lee, J.H.; Han, J.H.; Kwon, Y.S.
    • Electronics and Telecommunications Trends, v.35 no.3, pp.66-75, 2020
  • As rising expectations for practical AI (artificial intelligence) services make AI algorithms more complicated, an efficient processor for executing them is required. To meet this requirement, processors optimized for parallel processing, such as GPUs (graphics processing units), have been widely employed. However, the GPU has a generalized structure intended for various applications and is therefore not optimized for AI algorithms, so research on AI processors optimized for AI algorithm processing has been actively conducted. This paper briefly introduces AI processors, especially for inference acceleration, developed by the Electronics and Telecommunications Research Institute (ETRI), South Korea, and by other global vendors for mobile and server platforms.

Technical Trends in Hyperscale Artificial Intelligence Processors (초거대 인공지능 프로세서 반도체 기술 개발 동향)

  • W. Jeon; C.G. Lyuh
    • Electronics and Telecommunications Trends, v.38 no.5, pp.1-11, 2023
  • The emergence of generative hyperscale artificial intelligence (AI) has enabled new services, such as image-generating AI and conversational AI based on large language models. Such services are likely to attract numerous users whose demand cannot be handled by conventional AI models. Furthermore, the exponential growth in training data, computation, and user demand for AI models has led to intensive hardware resource consumption, highlighting the need to develop domain-specific semiconductors for hyperscale AI. In this technical report, we describe development trends in hyperscale AI processor technologies pursued by domestic and foreign semiconductor companies, such as NVIDIA, Graphcore, Tesla, Google, Meta, SAPEON, FuriosaAI, and Rebellions.