• Title/Summary/Keyword: SystemVerilog

Search Result 197, Processing Time 0.024 seconds

Design of Multipliers Optimized for CNN Inference Accelerators (CNN 추론 연산 가속기를 위한 곱셈기 최적화 설계)

  • Lee, Jae-Woo;Lee, Jaesung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.10
    • /
    • pp.1403-1408
    • /
    • 2021
  • Recently, FPGA-based AI processors are being studied actively. Deep convolutional neural networks (CNN) are basic computational structures performed by AI processors and require a very large amount of multiplication. Considering that the multiplication coefficients used in CNN inference operation are all constants and that an FPGA is easy to design a multiplier tailored to a specific coefficient, this paper proposes a methodology to optimize the multiplier. The method utilizes 2's complement and distributive law to minimize the number of bits with a value of 1 in a multiplication coefficient, and thereby reduces the number of required stacked adders. As a result of applying this method to the actual example of implementing CNN in FPGA, the logic usage is reduced by up to 30.2% and the propagation delay is also reduced by up to 22%. Even when implemented with an ASIC chip, the hardware area is reduced by up to 35% and the delay is reduced by up to 19.2%.

Optimization of FPGA-based DDR Memory Interface for better Compatibility and Speed (호환성 및 속도 향상을 위한 FPGA 기반 DDR 메모리 인터페이스의 최적화)

  • Kim, Dae-Woon;Kang, Bong-Soon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.12
    • /
    • pp.1914-1919
    • /
    • 2021
  • With the development of advanced industries, research on image processing hardware is essential, and timing verification at the gate level is required for actual chip operation. For FPGA-based verification, DDR3 memory interface was previously applied. But recently, as the FPGA specification has improved, DDR4 memory is used. In this case, when a previously used memory interface is applied, the timing mismatch of signals may occur and thus cannot be used. This is due to the difference in performance between CPU and memory. In this paper, the problem is solved through state optimization of the existing interface system FSM. In this process, data read speed is doubled through AXI Data Width modification. For actual case analysis, ZC706 using DDR3 memory and ZCU106 using DDR4 memory among Xilinx's SoC boards are used.

Design and Implementation of 5G mmWave LTE-TDD HD Video Streaming System for USRP RIO SDR (USRP RIO SDR을 이용한 5G 밀리미터파 LTE-TDD HD 비디오 스트리밍 시스템 설계 및 구현)

  • Gwag, Gyoung-Hun;Shin, Bong-Deug;Park, Dong-Wook;Eo, Yun-Seong;Oh, Hyuk-Jun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.27 no.5
    • /
    • pp.445-453
    • /
    • 2016
  • This paper presents the implementation and design of the 1T-1R wireless HD video streaming systems over 28 GHz mmWave frequency using 3GPP LTE-TDD standard on NI USRP RIO SDR platform. The baseband of the system uses USRP RIO that are stored in Xilinx Kintex-7 chip to implement LTE-TDD transceiver modem, the signal that are transceived from USRP RIO up or down converts to 28 GHz by using self-designed 28 GHz RF transceiver modules and it is finally communicated HD video data through self-designed $4{\times}8$ sub array antennas. It is that communication method between USRP RIO and Host PC use PCI express ${\times}4$ to minimize delay of data to transmit and receive. The implemented system show high error vector magnitude performance above 25.85 dBc and to transceive HD video in experiment environment anywhere.

Automatic On-Chip Glitch-Free Backup Clock Changing Method for MCU Clock Failure Protection in Unsafe I/O Pin Noisy Environment (안전하지 않은 I/O핀 노이즈 환경에서 MCU 클럭 보호를 위한 자동 온칩 글리치 프리 백업 클럭 변환 기법)

  • An, Joonghyun;Youn, Jiae;Cho, Jeonghun;Park, Daejin
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.12
    • /
    • pp.99-108
    • /
    • 2015
  • The embedded microcontroller which is operated by the logic gates synchronized on the clock pulse, is gradually used as main controller of mission-critical systems. Severe electrical situations such as high voltage/frequency surge may cause malfunctioning of the clock source. The tolerant system operation is required against the various external electric noise and means the robust design technique is becoming more important issue in system clock failure problems. In this paper, we propose on-chip backup clock change architecture for the automatic clock failure detection. For the this, we adopt the edge detector, noise canceller logic and glitch-free clock changer circuit. The implemented edge detector unit detects the abnormal low-frequency of the clock source and the delay chain circuit of the clock pulse by the noise canceller can cancel out the glitch clock. The externally invalid clock source by detecting the emergency status will be switched to back-up clock source by glitch-free clock changer circuit. The proposed circuits are evaluated by Verilog simulation and the fabricated IC is validated by using test equipment electrical field radiation noise

Optimized Hardware Design using Sobel and Median Filters for Lane Detection

  • Lee, Chang-Yong;Kim, Young-Hyung;Lee, Yong-Hwan
    • Journal of Advanced Information Technology and Convergence
    • /
    • v.9 no.1
    • /
    • pp.115-125
    • /
    • 2019
  • In this paper, the image is received from the camera and the lane is sensed. There are various ways to detect lanes. Generally, the method of detecting edges uses a lot of the Sobel edge detection and the Canny edge detection. The minimum use of multiplication and division is used when designing for the hardware configuration. The images are tested using a black box image mounted on the vehicle. Because the top of the image of the used the black box is mostly background, the calculation process is excluded. Also, to speed up, YCbCr is calculated from the image and only the data for the desired color, white and yellow lane, is obtained to detect the lane. The median filter is used to remove noise from images. Intermediate filters excel at noise rejection, but they generally take a long time to compare all values. In this paper, by using addition, the time can be shortened by obtaining and using the result value of the median filter. In case of the Sobel edge detection, the speed is faster and noise sensitive compared to the Canny edge detection. These shortcomings are constructed using complementary algorithms. It also organizes and processes data into parallel processing pipelines. To reduce the size of memory, the system does not use memory to store all data at each step, but stores it using four line buffers. Three line buffers perform mask operations, and one line buffer stores new data at the same time as the operation. Through this work, memory can use six times faster the processing speed and about 33% greater quantity than other methods presented in this paper. The target operating frequency is designed so that the system operates at 50MHz. It is possible to use 2157fps for the images of 640by360 size based on the target operating frequency, 540fps for the HD images and 240fps for the Full HD images, which can be used for most images with 30fps as well as 60fps for the images with 60fps. The maximum operating frequency can be used for larger amounts of the frame processing.

Design of Low-complexity FFT Processor for Multi-mode Radar Signal Processing (멀티모드 레이다 신호처리를 위한 저복잡도 FFT 프로세서 설계)

  • Park, Yerim;Jung, Yongchul;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.24 no.2
    • /
    • pp.85-91
    • /
    • 2020
  • Recently, a multi-mode radar system was designed for efficient operation of unmanned aerial vehicles (UAVs) in various environments, which has the advantage of being able to integrate and utilize methods of the pulse Doppler (PD) radar and the frequency modulated continuous wave (FMCW) radar. For the range detection part of the multi-mode radar signal processor (RSP), the hardware structure using the FFT processor and the IFFT processor is required to be designed in a way that improves efficiency on the area side. In addition, given the radar application environment that requires a variety of distance resolutions, FFT processors need to support variable-length operations. In this paper, the FFT processor and IFFT processor in multi-mode RSP range estimation are designed and proposed as hardware for a single FFT processor that supports variable length operation of 16-1024 points. The proposed FFT processor designed in hardware description language (HDL) and can be implemented with 7,452 logic elements and 5,116 registers.

Design of Low Power Optical Channel for DisplayPort Interface (저전력 광채널용 디스플레이포트 인터페이스 설계)

  • Seo, Jun-Hyup;Park, In-Hang;Jang, Hae-Jong;Bae, Gi-Yeol;Kang, Jin-Ku
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.11
    • /
    • pp.58-63
    • /
    • 2013
  • This paper presents a transceiver design for DisplayPort interface using an optical channel. By converting the electronic channel to the optical channel, the DisplayPort's main channel can provide a high-speed data transmission for long distance. The design converting the electronic channel to the optical channel in the main channel and AUX channel of the DisplayPort is presented in this paper. Futhermore, the HPD signal transmission by using AUX channel is proposed. In order to minimize power consumption, this paper also proposed a method of controlling the TX block in the main link. The proposed system is designed by a FPGA and an optical module. The FPGA used 651 ALUT(adaptive look-up table)s, 511 resisters and 324 block memory bits. The maximum operating rate of the FPGA is 250MHz. With the proposed power control scheme, 740mW of power dissipation reduction can be achieved at the main link optical TX module.