• Title/Abstract/Keywords: AI Processor

Search results: 29 items (processing time: 0.025 s)

AB9: A neural processor for inference acceleration

  • Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
    • ETRI Journal / Vol. 42, No. 4 / pp. 491-504 / 2020
  • We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOPS STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency compared with a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 mm × 23 mm. Delivery is expected later this year.
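
The abstract above credits the systolic tensor core with exploiting data reuse and parallelism. As a rough illustration only, the sketch below simulates a generic output-stationary systolic array multiplying two small matrices. The dataflow, the input skewing, and the function name `systolic_matmul` are assumptions made for this example; the abstract does not describe AB9's actual dataflow.

```python
def systolic_matmul(a, b):
    """Cycle-level sketch of an output-stationary systolic array computing a @ b.

    ASSUMPTION: a generic textbook dataflow, not AB9's documented design.
    Processing element (i, j) keeps one accumulator. Rows of `a` enter from
    the left and columns of `b` from the top, each skewed by one cycle per
    row/column, so PE (i, j) receives a[i][t] and b[t][j] at cycle t + i + j.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    total_cycles = k + n + m - 2  # last operand pair reaches PE (n-1, m-1)
    for cycle in range(total_cycles):
        for i in range(n):
            for j in range(m):
                t = cycle - i - j
                if 0 <= t < k:
                    # Every operand is reused as it moves past neighboring
                    # PEs, so each value is fetched from memory only once.
                    acc[i][j] += a[i][t] * b[t][j]
    return acc, total_cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)
```

In this model all processing elements work in the same cycle, and an n × m array needs k + n + m - 2 cycles for an inner dimension of k, which is where the parallelism benefit comes from.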

Trends in AI Processor Technology

  • 이미영;정재훈;이주현;한진호;권영수
    • Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 35, No. 3 / pp. 66-75 / 2020
  • As rising expectations for practical AI (artificial intelligence) services make AI algorithms more complicated, an efficient processor for running AI algorithms is required. To meet this requirement, processors optimized for parallel processing, such as GPUs (graphics processing units), have been widely employed. However, the GPU has a generalized structure for various applications, so it is not optimized for AI algorithms. Therefore, research on the development of AI processors optimized for AI algorithm processing has been actively conducted. This paper briefly introduces AI processors for inference acceleration developed by the Electronics and Telecommunications Research Institute, South Korea, and by other global vendors for mobile and server platforms.

Trends in AI Computing Processor Semiconductors Including ETRI's Autonomous Driving AI Processor

  • 양정민;권영수;강성원
    • Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 32, No. 6 / pp. 57-65 / 2017
  • Neural-network-based AI computing is a promising technology that reflects the recognition and decision operations of human beings. Early AI computing processors were composed of GPUs and CPUs; however, the dramatic increase in floating-point operations requires an energy-efficient AI processor with a highly parallelized architecture. In this paper, we analyze the trends in processor architectures for AI computing. Some architectures are still built from GPUs; however, they reduce the size of each processing unit by allowing half-precision operations and thereby raise the processing unit density. Other architectures concentrate on matrix multiplication and require dedicated hardware for fast vector operations. Finally, we propose our own inAB processor architecture and introduce domestic cutting-edge processor design capabilities.

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

  • Han, Jinho;Choi, Minseok;Kwon, Youngsu
    • ETRI Journal / Vol. 42, No. 4 / pp. 468-479 / 2020
  • The proposed AI processor architecture delivers high throughput for neural network acceleration and reduces the external memory bandwidth that neural network processing requires. To achieve high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronic systems of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control it and to process the AI algorithm; the GPP has a self-recovering cache and a dynamic lockstep function. The function-safe design is shown to meet the ASIL-D fault-tolerance level of the ISO26262 standard. The entire AI processor was fabricated as a prototype chip in a 28-nm CMOS process. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V, and the measured energy efficiency is 1.3 TOPS/W. The function-safe GPP used for control can achieve ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.
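
The 40-TFLOPS peak figure in the abstract above can be sanity-checked against the quoted core count and clock frequency. The short calculation below is a sketch under one assumption that the abstract does not state: each nano core completes one multiply-accumulate per cycle, counted as two floating-point operations. The function name `peak_tflops` is made up for this example.

```python
# Figures quoted in the abstract above.
ROWS, COLS = 128, 128   # nano cores in the super thread core (STC)
CLOCK_HZ = 1.2e9        # 1.2 GHz

# ASSUMPTION (not stated in the abstract): one multiply-accumulate per
# nano core per cycle, counted as 2 floating-point operations.
FLOPS_PER_CORE_PER_CYCLE = 2


def peak_tflops(rows, cols, clock_hz, flops_per_core_per_cycle):
    """Peak throughput in TFLOPS for a rows x cols array of cores."""
    return rows * cols * flops_per_core_per_cycle * clock_hz / 1e12


peak = peak_tflops(ROWS, COLS, CLOCK_HZ, FLOPS_PER_CORE_PER_CYCLE)
print(f"{peak:.1f} TFLOPS")  # prints "39.3 TFLOPS"
```

Under that assumption the product comes to about 39.3 TFLOPS, which is consistent with the rounded 40-TFLOPS figure.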

Performance Analyzer for Embedded AI Processor

  • 황동현;윤영현;한창엽;이승은
    • Journal of Internet Computing and Services (인터넷정보학회논문지) / Vol. 21, No. 5 / pp. 149-157 / 2020
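  • As interest in artificial intelligence has grown, research on implementing AI processors in hardware has been actively conducted. However, beyond the conventional processor simulation used for functional verification, an AI processor also requires application-level performance verification to determine whether it is suitable for a given application. In this paper, we propose a performance analyzer for embedded AI processors that can verify the performance of applications running on an AI processor and explore the processor's limitations. To implement the analyzer, we analyze the architecture of a previously implemented AI processor and, based on that analysis, build a performance analyzer that emulates the AI processor. Using the analyzer, we analyze the performance and explore the limitations of the AI processor in image recognition and speech recognition applications, and we optimize the architecture of the AI processor within a limited memory size.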

Design of Stand-alone AI Processor for Embedded System

  • 조권능;최도영;정영우;이승은
    • Korea Institute of Information and Communication Engineering: Conference Proceedings (한국정보통신학회:학술대회논문집) / 2021 Spring Conference / pp. 600-602 / 2021
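  • With the growth of the mobile industry and rising interest in AI technology, research on AI processors applicable to embedded systems has been actively conducted. Implementing AI in an embedded system requires a design that accounts for limited resources and power consumption, and it is efficient to include a dedicated accelerator that compensates for the low computing performance. This study proposes an embedded AI processor capable of stand-alone operation. The proposed AI processor includes a hardware accelerator that applies a lightweight AI algorithm based on distance computation, and it operates together with a programmable general-purpose processor, which makes it applicable to a variety of embedded systems. The AI processor was designed in Verilog HDL, and its functionality was verified on a field-programmable gate array (FPGA).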
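
The abstract above mentions a lightweight AI algorithm based on distance computation but does not name it. As one plausible reading, the sketch below shows a nearest-prototype classifier using Manhattan distance, which needs only integer subtraction, absolute value, and addition. The algorithm choice, the function names, and the sample data are all assumptions made for illustration, not the authors' design.

```python
def manhattan(u, v):
    """Sum of absolute differences; needs no multipliers in hardware."""
    return sum(abs(a - b) for a, b in zip(u, v))


def classify(sample, prototypes):
    """Return the label of the stored prototype closest to `sample`.

    `prototypes` is a list of (vector, label) pairs. HYPOTHETICAL data
    layout chosen for this sketch.
    """
    _, best_label = min(prototypes, key=lambda p: manhattan(sample, p[0]))
    return best_label


# Made-up prototype vectors for illustration only.
prototypes = [([0, 0, 0], "idle"), ([10, 10, 10], "active")]
print(classify([1, 2, 0], prototypes))  # prints "idle"
```

A scheme like this learns by storing prototype vectors rather than by training weights, which is one reason distance-based methods suit resource-constrained devices.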

AI Processor Technology Trends

  • 권영수
    • Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 33, No. 5 / pp. 121-134 / 2018
  • The von Neumann-based architecture of the modern computer has dominated the computing industry for the past 50 years, sparking the digital revolution and propelling us into today's information age. Recent research focus and market trends have shown significant effort toward the advancement and application of artificial intelligence technologies. Although artificial intelligence has been studied for decades since the Turing machine was first introduced, the field has recently emerged into the spotlight thanks to remarkable milestones such as AlexNet-CNN and AlphaGo, whose neural-network-based deep learning methods have achieved ground-breaking performance superior to existing recognition, classification, and decision algorithms. Unprecedented results in a wide variety of applications (drones, autonomous driving, robots, stock markets, computer vision, voice, and so on) have signaled the beginning of a golden age for artificial intelligence after 40 years of relative dormancy. Algorithmic research continues to progress at a breathtaking pace, as evidenced by the rate at which new neural networks are announced. However, traditional von Neumann-based architectures have proven to be inadequate in terms of computation power and inherently inefficient in processing the vastly parallel computations that are characteristic of deep neural networks. Consequently, global conglomerates such as Intel, Huawei, and Google, as well as large domestic corporations and fabless companies, are developing dedicated semiconductor chips customized for artificial intelligence computations. The AI Processor Research Laboratory at ETRI is focusing on the research and development of super low-power AI processor chips. In this article, we present the current trends in computation platform, parallel processing, AI processor, and super-threaded AI processor research being conducted at ETRI.

Trends of Compiler Development for AI Processor

  • 김진규;김혜지;조용철;김현미;여준기;한진호;권영수
    • Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 36, No. 2 / pp. 32-42 / 2021
  • The rapid growth of deep-learning applications has spurred R&D on artificial intelligence (AI) processors. A dedicated software framework, such as a compiler and runtime APIs, is required to achieve maximum processor performance. There are various compilers and frameworks for AI training and inference. In this study, we present the features and characteristics of AI compilers, training frameworks, and inference engines. In addition, we focus on the internals of compiler frameworks, which are based on either basic linear algebra subprograms or intermediate representation. For in-depth insight, we present the compiler infrastructure, internal components, and operation flow of ETRI's "AI-Ware." The software framework's significant role is evident in the optimized neural processing unit code that the compiler produces after various optimization passes, such as scheduling, architecture-aware optimization, schedule selection, and power optimization. We conclude the study with thoughts about the future of state-of-the-art AI compilers.

Trends of Low-Precision Processing for AI Processor

  • 김혜지;한진호;권영수
    • Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 37, No. 1 / pp. 53-62 / 2022
  • With the increasing size of transformer-based neural networks, lightweight algorithms and efficient AI accelerators have been developed to train these huge networks in a practical amount of time. In this article, we present a survey of state-of-the-art research on low-precision computational algorithms, especially for floating-point formats, and their hardware accelerators. We describe the trends by focusing on the work of two leading research groups, IBM and Seoul National University, which have deep knowledge of both AI algorithms and hardware architecture. For the low-precision algorithms, we summarize two efficient floating-point formats (hybrid FP8 and radix-4 FP4) with accuracy-preserving algorithms for training, which represent the main research stream. Moreover, we describe the AI processor architecture supporting a low-bit mixed-precision computing unit that includes an integer engine.
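
To give a feel for what low-precision floating-point formats trade away, the sketch below rounds a value to a given number of explicit mantissa bits. It is a simplified model rather than any of the formats surveyed: the abstract does not give bit layouts, so the 3-bit and 2-bit widths used here are illustrative assumptions, and exponent range limits, subnormals, and saturation are ignored. The function name `round_to_mantissa_bits` is made up for this example.

```python
import math


def round_to_mantissa_bits(x, mantissa_bits):
    """Round x to the nearest value with `mantissa_bits` explicit mantissa bits.

    SIMPLIFIED MODEL: unlimited exponent range, no subnormals or saturation.
    Ties follow Python's round() (round half to even).
    """
    if x == 0.0:
        return 0.0
    frac, exp = math.frexp(x)          # x == frac * 2**exp, 0.5 <= |frac| < 1
    scale = 2 ** (mantissa_bits + 1)   # +1 covers the implicit leading bit
    return math.ldexp(round(frac * scale) / scale, exp)


for bits in (3, 2):  # ASSUMED widths, chosen only for illustration
    q = round_to_mantissa_bits(0.1, bits)
    print(bits, q, abs(q - 0.1) / 0.1)
```

Fewer mantissa bits leave more room for exponent bits within the same total width, which is the precision-versus-dynamic-range trade-off that low-precision training formats have to balance.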

ETRI AI Strategy #2: Strengthening Competencies in AI Semiconductor & Computing Technologies

  • 최새솔;연승준
    • Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 35, No. 7 / pp. 13-22 / 2020
  • There is no denying that computing power has been a crucial driving force behind the development of artificial intelligence today. In addition, artificial intelligence (AI) semiconductors and computing systems are perceived to have promising industrial value in the market along with rapid technological advances. Therefore, success in this field is also meaningful to the nation's growth and competitiveness. In this context, ETRI's AI strategy proposes implementation directions and tasks with the aim of strengthening the technological competitiveness of AI semiconductors and computing systems. The paper contains a brief background of ETRI's AI Strategy #2, research and development trends, and key tasks in four major areas: 1) AI processors, 2) AI computing systems, 3) neuromorphic computing, and 4) quantum computing.