• Title/Summary/Keyword: Hardware accelerator

Search Result 115, Processing Time 0.026 seconds

System-level simulation of CDMA mobile station modem ASIC (CDMA 이동국 모뎀 ASIC의 시스템 시뮬레이션)

  • 남형진;장경희;박경룡;김재석
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.6
    • /
    • pp.220-229
    • /
    • 1996
  • We presetn sytem-level simulation methodology as well as environment setup established for CDMA digtial cellular mobile station in an effort to verify CDMA modem ASIC design. To make the system-level simulation feasible, behavioral modeling of a microcontroller was first carried out with VHDL. In addition, models written in C language were also developed to provide ASIC with realistic input data. Finally, the netlist of CDMA modem ASIC was loaded on the a hardware accelerator, which was interfaced with VHDL simulator, and ismulation was performed by excuting the actual CDMA call processing software. Simulation resutls thus obtained were confirmed by comparing them with the emulation resutls from the actual system constructed on hardware modeler. these methods were proved to be effective in both discovering in advance malfunctions when embedded in the system or design errors of ASIC and reducing simulation time by a factor of as much as 20 in case of simulation at gate-level.

  • PDF

Novel IME Instructions and their Hardware Architecture for Fast Search Algorithm (고속 탐색 알고리즘에 적합한 움직임 추정 전용 명령어 및 구조 설계)

  • Bang, Ho-Il;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.12
    • /
    • pp.58-65
    • /
    • 2011
  • This paper presents an ASIP (Application-specific Instruction Processor) for motion estimation that employs specific IME instructions and its programmable and reconfigurable hardware architecture for various video codecs, such as H.264/AVC, MPEG4, etc. With the proposed specific instructions and variable point 2D SAD hardware accelerator, it can handle the real-time processing requirement of High Definition (HD) video. With the SAD unit and its parallel operations using pattern information, the proposed IME instructions support not only full search algorithms but also other fast search algorithms. The hardware size is 25.5K gates for each Processing Element Group (PEG) which has 128 SAD Processor Elements (PEs). The proposed ASIP has been verified by the Synopsys Processor Designer and implemented by the Design Compiler using the IBM 90nm process technology. The hardware size is 453K gates for the IME unit and the operating frequency is 188MHz for 1080p@30 frame in real time. The proposed ASIP can reduce the hardware size about 26% and the number of operation cycles about 18%.

FPGA-Based Hardware Accelerator for Feature Extraction in Automatic Speech Recognition

  • Choo, Chang;Chang, Young-Uk;Moon, Il-Young
    • Journal of information and communication convergence engineering
    • /
    • v.13 no.3
    • /
    • pp.145-151
    • /
    • 2015
  • We describe in this paper a hardware-based improvement scheme of a real-time automatic speech recognition (ASR) system with respect to speed by designing a parallel feature extraction algorithm on a Field-Programmable Gate Array (FPGA). A computationally intensive block in the algorithm is identified implemented in hardware logic on the FPGA. One such block is mel-frequency cepstrum coefficient (MFCC) algorithm used for feature extraction process. We demonstrate that the FPGA platform may perform efficient feature extraction computation in the speech recognition system as compared to the generalpurpose CPU including the ARM processor. The Xilinx Zynq-7000 System on Chip (SoC) platform is used for the MFCC implementation. From this implementation described in this paper, we confirmed that the FPGA platform is approximately 500× faster than a sequential CPU implementation and 60× faster than a sequential ARM implementation. We thus verified that a parallelized and optimized MFCC architecture on the FPGA platform may significantly improve the execution time of an ASR system, compared to the CPU and ARM platforms.

Sparse Matrix Compression Technique and Hardware Design for Lightweight Deep Learning Accelerators (경량 딥러닝 가속기를 위한 희소 행렬 압축 기법 및 하드웨어 설계)

  • Kim, Sunhee;Shin, Dongyeob;Lim, Yong-Seok
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.17 no.4
    • /
    • pp.53-62
    • /
    • 2021
  • Deep learning models such as convolutional neural networks and recurrent neual networks process a huge amounts of data, so they require a lot of storage and consume a lot of time and power due to memory access. Recently, research is being conducted to reduce memory usage and access by compressing data using the feature that many of deep learning data are highly sparse and localized. In this paper, we propose a compression-decompression method of storing only the non-zero data and the location information of the non-zero data excluding zero data. In order to make the location information of non-zero data, the matrix data is divided into sections uniformly. And whether there is non-zero data in the corresponding section is indicated. In this case, section division is not executed only once, but repeatedly executed, and location information is stored in each step. Therefore, it can be properly compressed according to the ratio and distribution of zero data. In addition, we propose a hardware structure that enables compression and decompression without complex operations. It was designed and verified with Verilog, and it was confirmed that it can be used in hardware deep learning accelerators.

Implementation and Performance Evaluation of PCI express on Xilinx FPGA (Xilinx FPGA용 PCI express 구현 및 성능 분석)

  • Lee, Jin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.12
    • /
    • pp.1667-1674
    • /
    • 2018
  • Recently, speeding up real time calculation using the specialized hardware accelerator is often used in the various engineering and science area, and the accelerators are required to include PCI express interconnection between FPGA and a host computer. The implementation of the high speed PCIe for the multi-giga bytes per second transmission is one of the most difficult issue in the development of the accelerators. There are several commercialized IP solutions and research results in the literature, but these solutions are required extra cost and design period to analyze the detailed implementation method. For the hardware accelerator on Xilinx FPGA, utilizing Xilinx's XDMA PCIe IP, which is provided without extra charge, can be the best solution in terms of the development period and cost. Consequently, this paper presents the evaluation system on Zynq-7000 FPGA and Windows 10 host computer, and analyze the performance of the PCIe IP with various configuration parameters.

Design of a High-Speed Data Packet Allocation Circuit for Network-on-Chip (NoC 용 고속 데이터 패킷 할당 회로 설계)

  • Kim, Jeonghyun;Lee, Jaesung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.459-461
    • /
    • 2022
  • One of the big differences between Network-on-Chip (NoC) and the existing parallel processing system based on an off-chip network is that data packet routing is performed using a centralized control scheme. In such an environment, the best-effort packet routing problem becomes a real-time assignment problem in which data packet arriving time and processing time is the cost. In this paper, the Hungarian algorithm, a representative computational complexity reduction algorithm for the linear algebraic equation of the allocation problem, is implemented in the form of a hardware accelerator. As a result of logic synthesis using the TSMC 0.18um standard cell library, the area of the circuit designed through case analysis for the cost distribution is reduced by about 16% and the propagation delay of it is reduced by about 52%, compared to the circuit implementing the original operation sequence of the Hungarian algorithm.

  • PDF

Communication Method for Torque Control of Commercial Diesel Engine in Range-Extended Electric Trash Truck (주행거리 연장형 청소용 전기자동차에 장착된 상용 디젤엔진의 토크제어를 위한 통신 방안)

  • Park, Young-Kug
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.7
    • /
    • pp.1-8
    • /
    • 2018
  • This paper describes new communication methods for transmitting torque commands between the vehicle controller that determines the amount of power generation in a range-extended electric vehicle and the engine controller that performs it. Generally, vehicles use CAN communication, but in this case, the hardware and software of the existing engine controller must be modified. For this reason, it is not easy to apply CAN communication to small and medium sized automotive reorganize companies. Therefore, this research presents a pin-pin communication method for applying the existing mass produced engine controller to range-extended electric vehicles. The pin-pin communication method converts the driver's demand torque control map inside an mass produced engine controller into a virtual accelerator opening position according to the target speed and target torque of the engine, and converts this to a voltage signal for the existing mass produced engine controller to recognize it. The virtual accelerator opening positions are mounted in the form of a control map in the vehicle controller through the reverse conversion process in an offline environment and are determined by the engine generating power requirements and engine optimal operating point algorithm. These algorithms and signal conversion circuits for engine torque transmission have been mounted on the vehicle controller to conduct the virtual accelerator opening position conversion process according to the engine target torque and to establish the virtual accelerator voltage signal using the signal converter.

IPsec Security Server Performance Analysis Model (IPSec보안서버의 성능분석 모델)

  • 윤연상;이선영;박진섭;권순열;김용대;양상운;장태주;유영갑
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.41 no.9
    • /
    • pp.9-16
    • /
    • 2004
  • This paper proposes a performance analysis model of security servers comprising IPSec accelerators. The proposed model is based on a M/M1 queueing system with traffic load of Poisson distribution. The decoding delay has been defined to cover parameters characterizing hardware of security sorrels. Decoding delay values of a commercial IPSec accelerator are extracted yielding less than 15% differences from measured data. The extracted data are used to simulate the server system with the proposed model. The simulated performance of the cryptographic processor BCM5820 is around 75% of the published claimed level. The performance degradation of 3.125% and 14.28% are observed for 64byte packets and 1024byte packets, respectively.

A High-speed Pattern Matching Acceleration System for Network Intrusion Prevention Systems (네트워크 침입방지 시스템을 위한 고속 패턴 매칭 가속 시스템)

  • Kim Sunil
    • The KIPS Transactions:PartA
    • /
    • v.12A no.2 s.92
    • /
    • pp.87-94
    • /
    • 2005
  • Pattern matching is one of critical parts of Network Intrusion Prevention Systems (NIPS) and computationally intensive. To handle a large number of attack signature fattens increasing everyday, a network intrusion prevention system requires a multi pattern matching method that can meet the line speed of packet transfer. In this paper, we analyze Snort, a widely used open source network intrusion prevention/detection system, and its pattern matching characteristics. A multi pattern matching method for NIPS should efficiently handle a large number of patterns with a wide range of pattern lengths and case insensitive patterns matches. It should also be able to process multiple input characters in parallel. We propose a multi pattern matching hardware accelerator based on Shift-OR pattern matching algorithm. We evaluate the performance of the pattern matching accelerator under various assumptions. The performance evaluation shows that the pattern matching accelerator can be more than 80 times faster than the fastest software multi-pattern matching method used in Snort.

Weight Recovery Attacks for DNN-Based MNIST Classifier Using Side Channel Analysis and Implementation of Countermeasures (부채널 분석을 이용한 DNN 기반 MNIST 분류기 가중치 복구 공격 및 대응책 구현)

  • Youngju Lee;Seungyeol Lee;Jeacheol Ha
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.33 no.6
    • /
    • pp.919-928
    • /
    • 2023
  • Deep learning technology is used in various fields such as self-driving cars, image creation, and virtual voice implementation, and deep learning accelerators have been developed for high-speed operation in hardware devices. However, several side channel attacks that recover secret information inside the accelerator using side-channel information generated when the deep learning accelerator operates have been recently researched. In this paper, we implemented a DNN(Deep Neural Network)-based MNIST digit classifier on a microprocessor and attempted a correlation power analysis attack to confirm that the weights of deep learning accelerator could be sufficiently recovered. In addition, to counter these power analysis attacks, we proposed a Node-CUT shuffling method that applies the principle of misalignment at the time of power measurement. It was confirmed through experiments that the proposed countermeasure can effectively defend against side-channel attacks, and that the additional calculation amount is reduced by more than 1/3 compared to using the Fisher-Yates shuffling method.