• Title/Summary/Keyword: floating-point processor

Search Result 99, Processing Time 0.031 seconds

Trends in AI Computing Processor Semiconductors Including ETRI's Autonomous Driving AI Processor (인공지능 컴퓨팅 프로세서 반도체 동향과 ETRI의 자율주행 인공지능 프로세서)

  • Yang, J.M.;Kwon, Y.S.;Kang, S.W.
    • Electronics and Telecommunications Trends
    • /
    • v.32 no.6
    • /
    • pp.57-65
    • /
    • 2017
  • Neural network based AI computing is a promising technology that reflects the recognition and decision operation of human beings. Early AI computing processors were composed of GPUs and CPUs; however, the dramatic increment of a floating point operation requires an energy efficient AI processor with a highly parallelized architecture. In this paper, we analyze the trends in processor architectures for AI computing. Some architectures are still composed using GPUs. However, they reduce the size of each processing unit by allowing a half precision operation, and raise the processing unit density. Other architectures concentrate on matrix multiplication, and require the construction of dedicated hardware for a fast vector operation. Finally, we propose our own inAB processor architecture and introduce domestic cutting-edge processor design capabilities.

Realization of Block LMS Algorithm based on Block Floating Point (BFP 기반의 블록 LMS 알고리즘 구현)

  • Lee Kwang-Jae;Chakraborty Mriatyunjoy;Park Ju-Yong;Lee Moon-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.43 no.1 s.307
    • /
    • pp.91-100
    • /
    • 2006
  • A scheme is proposed for implementing the block LMS algorithm in a block floating point framework that permits processing of data over a wide dynamic range at a processor complexity and coat as low as that of a fixed point processor. The proposed scheme adopts appropriate formats for representing the filter coefficients and the data. Using these and a new upper bound on the step size, update relations for the filter weight mantissas and exponent are developed, taking care so that neither overflow occurs, nor are quantifies which are already very small multiplied directly. It is further shown how the mantissas of the filter coefficients and also the filter output can be evaluated faster by suitably modifying the approach of the fast block LMS algorithm

A Design of 8192-point FFT Processor using a new CBFP Scaling Method (새로운 CBFP 스케일링 방법을 적용한 8192점 FFT프로세서 설계)

  • 이승기;양대성;박광호;신경욱
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.113-116
    • /
    • 2002
  • This paper describes a design of 8192-Point pipelined FFT/IFFT processor (PFFTSk) core for DVB-T and DMT-based VBSL modems. A novel two-step convergent block floating -point (75_CBFP) scaling method is proposed to improve the signal- to-quantization-noise ratio (SeNR) of FFT/IFFT results. Our approach reduces about 80% of memory when compared with conventional CBFP methods. The PFFTSk core, which is designed in VHDL and synthesized using 0.25-${\mu}{\textrm}{m}$ CMOS library, has about 76,300 gates, 390k bits RAM, and Twiddle factor ROM of 39k bits. Simulation results show that it can safely operate up to 50-MHz clock frequency at 2.5-V supply, resulting that a 8192-point FFT/IFFT can be computed every 164-$mutextrm{s}$. The SQNR of about 60-dB is achieved.

  • PDF

Design-for-Testability of The Floating-Point DSP Processor (부동 소수점 DSP 프로세서의 테스트 용이 설계)

  • Yun, Dae-Han;Song, Oh-Young;Chang, Hoon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.5B
    • /
    • pp.685-691
    • /
    • 2001
  • 본 논문은 4단계 파이프 라인과 VLIW (Very Long Instruction Word) 구조를 갖는 FLOVA라는 DSP 프로세서의 테스트용이 설계 기법을 다룬다. Full-scan design, BIST(Built-In-Self-Test), IEEE 1149.1의 기법들이 플립플롭과 floaing point unit, 내장된 메모리, I/O cell 등에 각각 적용되었다. 이러한 기법들은 테스트 용이도의 관점에서 FLOVA의 구조에 적절하게 적용되었다. 본 논문에서는 이와 같이 FLOVA에 적용된 테스트 용이 설계의 특징들을 중심으로 상세하게 기술한다.

  • PDF

The Design of a Structure of Network Co-processor for SDR(Software Defined Radio) (SDR(Software Defined Radio)에 적합한 네트워크 코프로세서 구조의 설계)

  • Kim, Hyun-Pil;Jeong, Ha-Young;Ham, Dong-Hyeon;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.2A
    • /
    • pp.188-194
    • /
    • 2007
  • In order to become ubiquitous world, the compatibility of wireless machines has become the significant characteristic of a communication terminal. Thus, SDR is the most necessary technology and standard. However, among the environment which has different communication protocol, it's difficult to make a terminal with only hardware using ASIC or SoC. This paper suggests the processor that can accelerate several communication protocol. It can be connected with main-processor, and it is specialized PHY layer of network The C-program that is modeled with the wireless protocol IEEE802.11a and IEEE802.11b which are based on widely used modulation way; OFDM and CDM is compiled with ARM cross compiler and done simulation and profiling with Simplescalar-Arm version. The result of profiling, most operations were Viterbi operations and complex floating point operations. According to this result we suggested a co-processor which can accelerate Viterbi operations and complex floating point operations and added instructions. These instructions are simulated with Simplescalar-Arm version. The result of this simulation, comparing with computing only one ARM core, the operations of Viterbi improved as fast as 4.5 times. And the operations of complex floating point improved as fast as twice. The operations of IEEE802.11a are 3 times faster, and the operations of IEEE802.11b are 1.5 times faster.

A DSP Implementation of the BICM Module for DVB-T2 Receivers (DVB-T2 수신기를 위한 BICM 모듈의 DSP 구현)

  • Lee, Jae-Ho
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.4
    • /
    • pp.591-595
    • /
    • 2011
  • In this paper, we design the hardware architecture of the BICM(Bit Interleaved Coded Modulation) module for next generation European broadcast system and implement the BICM module with DSP(Digital Signal Processor) TMS320C6474. Simulation result shows that the BER(Bit Error Rate) performance of the fixed-point BICM module using more than 8 bits is very similar to that of the floating-point BICM module.

A 95% accurate EEG-connectome Processor for a Mental Health Monitoring System

  • Kim, Hyunki;Song, Kiseok;Roh, Taehwan;Yoo, Hoi-Jun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.4
    • /
    • pp.436-442
    • /
    • 2016
  • An electroencephalogram (EEG)-connectome processor to monitor and diagnose mental health is proposed. From 19-channel EEG signals, the proposed processor determines whether the mental state is healthy or unhealthy by extracting significant features from EEG signals and classifying them. Connectome approach is adopted for the best diagnosis accuracy, and synchronization likelihood (SL) is chosen as the connectome feature. Before computing SL, reconstruction optimizer (ReOpt) block compensates some parameters, resulting in improved accuracy. During SL calculation, a sparse matrix inscription (SMI) scheme is proposed to reduce the memory size to 1/24. From the calculated SL information, a small world feature extractor (SWFE) reduces the memory size to 1/29. Finally, using SLs or small word features, radial basis function (RBF) kernel-based support vector machine (SVM) diagnoses user's mental health condition. For RBF kernels, look-up-tables (LUTs) are used to replace the floating-point operations, decreasing the required operation by 54%. Consequently, The EEG-connectome processor improves the diagnosis accuracy from 89% to 95% in Alzheimer's disease case. The proposed processor occupies $3.8mm^2$ and consumes 1.71 mW with $0.18{\mu}m$ CMOS technology.

Fast Laser Triangular Measurement System using ARM and FPGA (ARM 및 FPGA를 이용한 고속 레이저 삼각측량 시스템)

  • Lee, Sang-Moon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.1
    • /
    • pp.25-29
    • /
    • 2013
  • Recently ARM processor's processing power has been increasing rapidly as it has been applied to consumer electronics products. Because of its computing power and low power consumption, it is used to various embedded systems.( including vision processing systems.) Embedded linux that provides well-made platform and GUI is also a powerful tool for ARM based embedded systems. So short period to develop is one of major advantages to the ARM based embedded system. However, for real-time date processing applications such as an image processing system, ARM needs additional equipments such as FPGA that is suitable to parallel processing applications. In this paper, we developed an embedded system using ARM processor and FPGA. FPGA takes time consuming image preprocessing and numerical algorithms needs floating point arithmetic and user interface are implemented using the ARM processor. Overall processing speed of the system is 60 frames/sec of VGA images.

Implementation of the MPEG-1 Layer II Decoder Using the TMS320C64x DSP Processor (TMS320C64x 기반 MPEG-1 LayerII Decoder의 DSP 구현)

  • Cho, Choong-Sang;Lee, Young-Han;Oh, Yoo-Rhee;Kim, Hong-Kook
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.257-258
    • /
    • 2006
  • In this paper, we address several issues in the real time implementation of MPEG-1 Layer II decoder on a fixed-point digital signal processor (DSP), especially TMS320C6416. There is a trade-off between processing speed and the size of program/data memory for the optimal implementation. In a view of the speed optimization, we first convert the floating point operations into fixed point ones with little degradation in audio quality, and then the look-up tables used for the inverse quantization of the audio codec are forced to be located into the internal memory of the DSP. And then, window functions and filter coefficients in the decoder are precalculated and stored as constant, which makes the decoder faster even larger memory size is required. It is shown from the real-time experiments that the fixed-point implementation enables us to make the decoder with a sampling rate of 48 kHz operate with 3 times faster than real-time on TMS320C6416 at a clock rate of 600 MHz.

  • PDF

Design of Square Root and Inverse Square Root Arithmetic Units for Mobile 3D Graphic Processing (모바일 3차원 그래픽 연산을 위한 제곱근 및 역제곱근 연산기 구조 및 설계)

  • Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.3
    • /
    • pp.20-25
    • /
    • 2009
  • We propose hardware architecture of floating-point square root and inverse square root arithmetic units using lookup tables. They are used for lighting engines and shader processor for 3D graphic processing. The architecture is based on Taylor series expansion and consists of lookup tables and correction units so that the size of look-up tables are reduced. It can be applied to 32 bit floating point formats of IEEE-754 and reduced 24 bit floating point formats. The square root and inverse square root arithmetic units for 32 bit and 24 bit floating format number are designed as the proposed architecture. They can operation in a single cycle, and satisfy the precision of $10^{-5}$ required by OpenGL 1.x ES. They are designed using Verilog-HDL and the RTL codes are verified using an FPGA.