• Title/Summary/Keyword: Hardware Accelerator

Search Result 115, Processing Time 0.03 seconds

Implementation of a Window-Masking Method and the Soft-core Processor based TDD Switching Control SoC FPGA System (윈도 마스킹 기법과 Soft-core Processor 기반 TDD 스위칭 제어 SoC 시스템 FPGA 구현)

  • Hee-Jin Yang;Jeung-Sub Lee;Han-Sle Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.17 no.3
    • /
    • pp.166-175
    • /
    • 2024
  • In this paper, the Window-Masking Method and HAT (Hardware Attached Top) CPU SoM (System on Module) are used to improve the performance and reduce the weight of the MANET (Mobile Ad-hoc Network) network synchronization system using time division redundancy. We propose converting it into a RISC-V based soft-core MCU and mounting it on an FPGA, a hardware accelerator. It was also verified through experiment. In terms of performance, by applying the proposed technique, the synchronization acquisition range is from -50dBm to +10dBm to -60dBm to +10dBm, the lowest input level for synchronization is increased by 20% from -50dBm to -60dBm, and the detection delay (Latency) is 220ns. Reduced by 43% to 125ns. In terms of weight reduction, computing resources (48%), size (33%), and weight (27%) were reduced by an average of 36% by replacing with soft-core MCU.

Design of IPSec Hardware Accelerator IP

  • Ha Chang-Soo;Kim Joo-Hong;Cho Hyun-Sook;Park Myoung-Soo;Choi Byeong-Yoon
    • Proceedings of the Korean Institute of Communication Sciences Conference
    • /
    • 2004.07a
    • /
    • pp.341-341
    • /
    • 2004
  • PDF

MultiRing An Efficient Hardware Accelerator for Design Rule Checking (멀티링 설계규칙검사를 위한 효과적인 하드웨어 가속기)

  • 노길수;경종민
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.6
    • /
    • pp.1040-1048
    • /
    • 1987
  • We propose a hardware architecture called Multiring which is applicable for various geometrical operations on rectilinear objects such as design rule checking in VLSI layout and many image processing operations including noise suppression and coutour extraction. It has both a fast execution speed and extremely high flexibility. The whole architecture is mainly divided into four parts` I/O between host and Multiring, ring memory, linear processor array and instruction decoder. Data transmission between host and Multiring is bit serial thereby reducing the bandwidth requirement for teh channel and the number of external pins, while each row data in the bit map stored in ring memory is processed in the corresponding processor in full parallelism. Each processor is simultaneously configured by the instruction decoder/controller to perform one of the 16 basic instructions such as Boolean (AND, OR, NOT, and Copy), geometrical(Expand and Shrink), and I/O operations each ring cycle, which gives Multiring maximal flexibility in terms of design rule change or the instruction set enhancement. Correct functional behavior of Multiring was confirmed by successfully running a software simulator having one-to-one structural correspondence to the Multiring hardware.

  • PDF

CNN-based watermarking processor design optimization method (CNN기반의 워터마킹 프로세서 설계 최적화 방법)

  • Kang, Ji-Won;Lee, Jae-Eun;Seo, Young-Ho;Kim, Dong-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.644-645
    • /
    • 2021
  • In this paper, we propose a hardware structure of a watermarking processor based on deep learning technology to protect the intellectual property rights of ultra-high resolution digital images and videos. We propose an optimization methodology to implement a deep learning-based watermarking algorithm in hardware.

  • PDF

Multi-threaded system to support reconfigurable hardware accelerators on Zynq SoC (Zynq SoC에서 재구성 가능한 하드웨어 가속기를 지원하는 멀티쓰레딩 시스템 설계)

  • Shin, Hyeon-Jun;Lee, Joo-Heung
    • Journal of IKEEE
    • /
    • v.24 no.1
    • /
    • pp.186-193
    • /
    • 2020
  • In this paper, we propose a multi-threading system to support reconfigurable hardware accelerators on Zynq SoC. We implement high-performance JPEG decoder with reconfigurable 2D IDCT hardware accelerators to achieve maximum performance available on the platform. In this system, up to four reconfigurable hardware accelerators synchronized with SW threads can be dynamically reconfigured to provide adaptive computing capabilities according to the given image resolution and the compression ratio. JPEG decoding is operated using images with resolutions 480p, 720p, 1080p at the compression ratio of 7:1-109:1. We show that significant performance improvements are achieved as the image resolution or the compression ratio increase. For 1080p resolution, the performance improvement is up to 79.11 times with throughput speed of 99 fps at the compression ratio 17:1.

Design of FPGA Hardware Accelerator for Information Security System (정보보호 시스템을 위한 FPGA 기반 하드웨어 가속기 설계)

  • Cha, Jeong Woo;Kim, Chang Hoon
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.18 no.2
    • /
    • pp.1-12
    • /
    • 2013
  • Information Security System is implemented in software, hardware and FPGA device. Implementation of S/W provides high flexibility about various information security algorithm, but it has very vulnerable aspect of speed, power, safety, and performing ASIC is really excellent aspect of speed and power but don't support various security platform because of feature's realization. To improve conflict of these problems, implementation of recent FPGA device is really performed. The goal of this thesis is to design and develop a FPGA hardware accelerator for information security system. It performs as AES, SHA-256 and ECC and is controlled by the Integrated Interface. Furthermore, since the proposed Security Information System can satisfy various requirements and some constraints, it can be applied to numerous information security applications from low-cost applications and high-speed communication systems.

Efficient Loop Accelerator for Motion Estimation Specific Instruction-set Processor (움직임 추정 전용 프로세서를 위한 효율적인 루프 가속기)

  • Ha, Jae Myung;Jung, Ho Sun;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.7
    • /
    • pp.159-166
    • /
    • 2013
  • This paper proposes an efficient loop accelerator for a motion estimation specific instruction-set processor. ME algorithms in nature contain complex and multiple loop operations. To support efficient hardware (HW) loop operations, this paper introduces four loop instructions and their specific HW architecture. The simulation results show that the proposed loop accelerator can reduce about 29% average instruction cycles for ME early-termination schemes compared with typical implementation having a combination of compare and conditional jump instructions. The proposed loop accelerator of the motion estimation specific instruction-set processor can significantly reduce the number of program memory accesses and greatly save power consumption. Hence, it can be quite suitable for low power and flexible ME implementation.

CNN Accelerator Architecture using 3D-stacked RRAM Array (3차원 적층 구조 저항변화 메모리 어레이를 활용한 CNN 가속기 아키텍처)

  • Won Joo Lee;Yoon Kim;Minsuk Koo
    • Journal of IKEEE
    • /
    • v.28 no.2
    • /
    • pp.234-238
    • /
    • 2024
  • This paper presents a study on the integration of 3D-stacked dual-tip RRAM with a CNN accelerator architecture, leveraging its low drive current characteristics and scalability in a 3D stacked configuration. The dual-tip structure is utilized in a parallel connection format in a synaptic array to implement multi-level capabilities. It is configured within a Network-on-chip style accelerator along with various hardware blocks such as DAC, ADC, buffers, registers, and shift & add circuits, and simulations were performed for the CNN accelerator. The quantization of synaptic weights and activation functions was assumed to be 16-bit. Simulation results of CNN operations through a parallel pipeline for this accelerator architecture achieved an operational efficiency of approximately 370 GOPs/W, with accuracy degradation due to quantization kept within 3%.

Implementation of OpenVG Accelerator based on Multi-Core GP-GPU (멀티코어 GP-GPU 기반의 OpenVG 가속기 구현)

  • Lee, Kwang-Yeob;Park, Jong-Il;Lee, Chan-Ho
    • Journal of IKEEE
    • /
    • v.15 no.3
    • /
    • pp.248-254
    • /
    • 2011
  • Recently, processing burden of CPU is growing because of graphical user interface according to enhance the performance of mobile devices and various graphical effects and creation of contents with 3D graphical effect or Flash animation. Therefore, the GPU are introduced to mobile device for support to variety contents. In this paper, OpenVG accelerator was implemented based on multi-core GP-GPU. OpenVG accelerator is verified using the sample image provided by Khronos group, and overall function is processed by only instruction set without dedicate hardware. The performance of processing the Tiger Image was 2 frames/sec.

Design and Implementation of Hangul Outline Font Generation Accelerator (한글 외곽선 글자체 생성 가속기의 설계 및 구현)

  • 배종홍;황규철;이윤태;경종민
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.29A no.2
    • /
    • pp.100-106
    • /
    • 1992
  • In this pape, we designed and implemented a hardware accelerator for the generation of bit map font from Hangul outline font description for LBP (Laser Beam Printer) and screen applications Whole system was implemented as a double size PC/AT application board which consists of processing bolck and display block. The processing block has a master processor (MC68000)and two slave processors which are MC56001 and KAFOG chip responsible for the short vector generation. In the display block, TMS34061 was used for monitor display and GP425 was used for LBP print out. The resolution of the monitor is 640$\times$480 and that of LBP is 2385$\times$3390. The current system called KHGB90-B generates about 100 characters per second where each character consists of 32$\times$32 bits

  • PDF