Search | Korea Science

AB9: A neural processor for inference acceleration

Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
- ETRI Journal / v.42 no.4 / pp.491-504 / 2020
We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency to that of a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 mm². Delivery is expected later this year.
https://doi.org/10.4218/etrij.2020-0134

Analysis of the Single Event Effect of the Science Technology Satellite-3 On-Board Computer under Proton Irradiation (과학기술위성 3호 온보드 컴퓨터의 양성자 빔에 의한 Single Event Effect 분석)

Kang, Dong-Soo;Oh, Dae-Soo;Ko, Dae-Ho;Baik, Jong-Chul;Kim, Hyung-Shin;Jhang, Kyoung-Son
- Journal of the Korean Society for Aeronautical & Space Sciences / v.39 no.12 / pp.1174-1180 / 2011
Field-programmable gate arrays (FPGAs) are replacing traditional integrated circuits in space applications because of their lower development cost and reconfigurability. However, they are very sensitive to single event upsets (SEUs) caused by the space radiation environment. To mitigate SEUs, the on-board computer of STSAT-3 employs triple modular redundancy (TMR) and a scrubbing scheme. Experimental results showed that the upset threshold energy improved from 10.6 MeV to 20.3 MeV when TMR and scrubbing were applied to the on-board computer. Combining the experimental results with orbit simulation results, the calculated bit-flip rate of the on-board computer is 1.23 bit-flips/day, assuming the worst case of the STSAT-3 orbit.
https://doi.org/10.5139/JKSAS.2011.39.12.1174
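
As a rough illustration of the TMR-plus-scrubbing idea named in the abstract above, the sketch below votes bitwise over three redundant copies of a word and then rewrites all copies with the voted value. It is a simplified software model, not the STSAT-3 design: the function names `tmr_vote` and `scrub` and the 8-bit example word are assumptions made for illustration, and real FPGA scrubbing typically rewrites configuration memory rather than a Python tuple.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: each output bit takes the value held by at least two copies."""
    return (a & b) | (b & c) | (a & c)


def scrub(a: int, b: int, c: int) -> tuple:
    """Simplified scrubbing model (assumption): overwrite all copies with the voted value."""
    voted = tmr_vote(a, b, c)
    return (voted, voted, voted)


# Hypothetical example: one of three copies suffers a single bit flip.
golden = 0b10110010
copies = (golden, golden ^ 0b00001000, golden)

recovered = tmr_vote(*copies)   # the upset is masked by the other two copies
copies = scrub(*copies)         # the corrupted copy is repaired before a second upset can accumulate
```

Voting alone masks a single upset; scrubbing matters because it restores the third copy, so that a later upset in a different copy does not defeat the majority.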

Analysis of Random Variations and Variation-Robust Advanced Device Structures

Nam, Hyohyun;Lee, Gyo Sub;Lee, Hyunjae;Park, In Jun;Shin, Changhwan
- JSTS: Journal of Semiconductor Technology and Science / v.14 no.1 / pp.8-22 / 2014
In the past few decades, CMOS logic technologies and devices have been successfully developed with the steady miniaturization of the feature size. At the sub-30-nm CMOS technology nodes, one of the main hurdles to continued scaling of CMOS devices is the parametric failure caused by random variations such as line edge roughness (LER), random dopant fluctuation (RDF), and work-function variation (WFV). The characteristics of each random variation source and its effect on advanced device structures such as multigate and ultra-thin-body devices (vs. the conventional planar bulk MOSFET) are discussed in detail. Further, suppression methods are suggested for the LER-, RDF-, and WFV-induced threshold voltage (VTH) variations in advanced CMOS logic technologies, including the double-patterning and double-etching (2P2E) technique, and in advanced device structures, including the fully depleted silicon-on-insulator (FD-SOI) MOSFET and the FinFET/tri-gate MOSFET, at the sub-30-nm nodes. Also introduced are the segmented-channel MOSFET (SegFET) and junctionless transistor (JLT), which can suppress the random variations, and the SegFET-/JLT-based static random access memory (SRAM) cell, which enhances the read and write margins simultaneously, although the two margins generally trade off against each other.
https://doi.org/10.5573/JSTS.2014.14.1.008

Low-Complexity Deeply Embedded CPU and SoC Implementation (낮은 복잡도의 Deeply Embedded 중앙처리장치 및 시스템온칩 구현)

Park, Chester Sungchung;Park, Sungkyung
- Journal of the Korea Academia-Industrial cooperation Society / v.17 no.3 / pp.699-707 / 2016
This paper proposes a low-complexity central processing unit (CPU) that is suitable for deeply embedded systems, including Internet of things (IoT) applications. The core features a 16-bit instruction set architecture (ISA) that leads to high code density, as well as a multicycle architecture with a counter-based control unit and adder sharing that lead to a small hardware area. A co-processor, instruction cache, AMBA bus, internal SRAM, external memory, on-chip debugger (OCD), and peripheral I/Os are placed around the core to make a system-on-a-chip (SoC) platform. This platform is based on a modified Harvard architecture to facilitate memory access by reducing the number of access clock cycles. The SoC platform and CPU were simulated and verified at the C and the assembly levels, and FPGA prototyping with integrated logic analysis was carried out. The CPU was synthesized at the ASIC front-end gate netlist level using a 0.18-μm digital CMOS technology with a 1.8-V supply, resulting in a gate count of merely 7700 at a 50-MHz clock speed. The SoC platform was embedded in an FPGA on a miniature board and applied to deeply embedded IoT applications.
https://doi.org/10.5762/KAIS.2016.17.3.699

A study on realtime Job Scheduling for Portable Devices (포터블 기기의 실시간 처리를 위한 Job Scheduling에 관한 연구)

장석우;박인규
- Proceedings of the IEEK Conference / 1999.11a / pp.989-992 / 1999
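Even small, battery-powered products are increasingly required to offer a variety of functions and input/output devices and to process them in real time, and this demand is expected to keep growing. Moreover, a portable device typically embeds a limited amount of ROM-type memory inside the MCU and uses SRAM and flash memory as data memory. Such devices must therefore often perform many functions within this constrained hardware environment. When functions are time-sliced, divided among one another, or merged, the main software structure inevitably becomes complex, and because most such software is developed in languages with a linear structure, such as assembly or C, it is not easy to establish an effective program structure. This paper proposes a more standardized method for this purpose. To study a concrete structure, we examine the job scheduling required in an MP3 player, a portable application that needs a processor and must perform a variety of tasks. The required jobs are the codec that decompresses the compressed MP3 file, which imposes the heaviest load; external keyboard input, which must be handled at fixed time intervals; a timer function that keeps time in real time and must show the changing time on the LCD; and jobs such as LCD control and memory control, which occur frequently but take a moderate share of processor time. We present a method that resolves the timing problem through scheduling while keeping the processor speed to a minimum. This is also evident from the fact that scientists from northern-bloc countries with strong basic science are mainly employed, which suggests that foreign research personnel are becoming an alternative source for compensating for weaknesses in our science and technology. This study identified many factors that affect the motivation and performance of foreign researchers working in Korean research organizations. Factors influencing utilization performance were derived through correlation analysis, analysis of variance, and regression analysis. Survey analysis confirmed a strong correlation between motivation and performance, which is consistent with traditional motivation theories. Most variables were found to affect both motivation and performance; the most important were the organization's culture of cooperation, the foreign researchers' communication and cooperativeness, variables related to their research ability, the technology life cycle of the research project, and the absorption of the foreign researchers' existing technical knowledge. This result is also consistent with the pattern of foreign research personnel utilization in which we mainly employ Chinese and Russian scientists for commercialization. In other words, it confirms that performance is best when a research organization with a friendly organizational culture employs foreign scientists and engineers who already have extensive scientific and technical knowledge and strong research ability in research fields where the technology is emerging or growing in Korea. This study, the first of its kind in Korea, found that the performance of foreign research personnel is very high and that they can be an important alternative as an effective means of complementing our science and technology innovation system. To make good use of foreign research personnel, problems and improvement measures, the utilization environment, research personnel can be an important alternative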

A Study on Efficient Cell Queueing and Scheduling Algorithms for Multimedia Support in ATM Switches (ATM 교환기에서 멀티미디어 트래픽 지원을 위한 효율적인 셀 큐잉 및 스케줄링 알고리즘에 관한 연구)

Park, Jin-Su;Lee, Sung-Won;Kim, Young-Beom
- Journal of IKEEE / v.5 no.1 s.8 / pp.100-110 / 2001
In this paper, we investigated several buffer management schemes for the design of shared-memory ATM switches that can enhance the utilization of switch resources and support quality-of-service (QoS) functionalities. Our results show that the dynamic threshold (DT) scheme demonstrates a moderate degree of robustness, close to that of the pushout (PO) scheme, which is known to be impractical from the perspective of hardware implementation, under various traffic conditions such as traffic load, burstiness of incoming traffic, and load non-uniformity across output ports. Next, we considered buffer management strategies to support QoS functions, which use parameter values obtained via connection admission control (CAC) procedures to set the threshold values. Through simulations, we showed that the adopted buffer management schemes behave well in the sense that they protect regulated traffic from unregulated cell traffic when allocating buffer space. In particular, we observed that dynamic partitioning is superior to virtual partitioning in terms of QoS support.
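
The dynamic threshold (DT) scheme mentioned in the abstract above can be sketched as follows. The abstract does not give the rule itself, so this block assumes the commonly described form of DT, in which a cell is admitted to an output queue only while that queue is shorter than `alpha` times the currently free shared buffer space. The class name `SharedBuffer`, the value of `alpha`, and the buffer size are hypothetical choices for illustration, not parameters from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedBuffer:
    capacity: int                      # total cells in the shared memory
    alpha: float = 1.0                 # DT scaling factor (assumed value)
    queues: Dict[int, int] = field(default_factory=dict)  # output port -> queue length

    def occupancy(self) -> int:
        """Total number of cells currently buffered across all ports."""
        return sum(self.queues.values())

    def threshold(self) -> float:
        """Assumed DT rule: threshold is proportional to the free buffer space."""
        return self.alpha * (self.capacity - self.occupancy())

    def enqueue(self, port: int) -> bool:
        """Admit a cell only if the buffer has room and the port's queue is below the threshold."""
        length = self.queues.get(port, 0)
        if self.occupancy() >= self.capacity or length >= self.threshold():
            return False               # cell is dropped
        self.queues[port] = length + 1
        return True

    def dequeue(self, port: int) -> bool:
        """Transmit one cell from the port's queue, if any."""
        length = self.queues.get(port, 0)
        if length == 0:
            return False
        self.queues[port] = length - 1
        return True


# Hypothetical usage: one overloaded port cannot take the whole buffer.
buf = SharedBuffer(capacity=12, alpha=1.0)
accepted_hot = sum(buf.enqueue(0) for _ in range(20))
accepted_other = buf.enqueue(1)
```

Because the threshold shrinks as the buffer fills, an overloaded port stops being admitted before the memory is exhausted, which leaves headroom for lightly loaded ports without the per-cell pushout logic that the abstract describes as impractical in hardware.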

Search Result 176, Processing Time 0.02 seconds
