• Title/Summary/Keyword: FPGA synthesis (FPGA 합성)

Implementation of a Logic Extraction Algorithm from a Bitstream Data for a Programmed FPGA (프로그램된 FPGA의 비트스트림 데이터로부터 로직추출 알고리즘 구현)

  • Jeong, Min-Young;Lee, Jae-Heum;Jang, Young-Jo;Jung, Eun-Gu;Cho, Kyoung-Rok
    • The Journal of the Korea Contents Association / v.18 no.1 / pp.10-18 / 2018
  • This paper presents a method for resynthesizing the logic of a programmed Xilinx FPGA (Field Programmable Gate Array) from its bitstream, the configuration file downloaded to the device. The work focuses on reconstructing the LUT (Look-Up Table) logic. To uncover the structure of the bitstream, bitstream data are compared and analyzed under various conditions and input variables, such as building different logic from the same netlist or synthesizing the same logic at different locations. Based on the analyzed bitstream, we construct a truth table for a LUT by implementing various logic functions in a single LUT. The proposed algorithm extracts the LUT logic from this truth table and the bitstream, and determines which input and output pins are used to implement the logic in the LUT. As a result, gate-level logic is extracted from a bitstream file for the targeted Xilinx FPGA.
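The pin-detection step can be illustrated with a minimal sketch. It assumes the LUT contents have already been recovered from the bitstream as a 64-bit value ordered like a Xilinx LUT6 `INIT` attribute (bit `a` is the output for input address `a`, with I0 as the least-significant address bit); the real bit ordering inside bitstream frames is different and is not reproduced here. The helper names `used_inputs` and `truth_table` are hypothetical. An input pin is treated as used when flipping it changes the output for at least one address.

```python
from itertools import product

def used_inputs(init: int, k: int = 6) -> list[int]:
    """Return the indices of LUT inputs the stored function depends on.

    Assumption: `init` holds 2**k truth-table bits, and bit `addr` is the LUT
    output for input address `addr` (input I0 = least-significant address bit).
    """
    size = 1 << k
    used = []
    for pin in range(k):
        mask = 1 << pin
        # Compare each address with the address that differs only in this pin.
        if any(((init >> a) & 1) != ((init >> (a ^ mask)) & 1) for a in range(size)):
            used.append(pin)
    return used

def truth_table(init: int, pins: list[int]) -> dict[tuple[int, ...], int]:
    """Project the LUT onto its used pins; all other inputs are held at 0."""
    table = {}
    for values in product((0, 1), repeat=len(pins)):
        addr = sum(v << p for v, p in zip(values, pins))
        table[values] = (init >> addr) & 1
    return table
```

For example, a 2-input AND wired to I0 and I2 sets exactly the `INIT` bits whose address has bits 0 and 2 set; `used_inputs` then reports `[0, 2]`, and `truth_table(init, [0, 2])` yields the four-row AND table.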

A Lower Bound Estimation on the number of LUT's in Time-Multiplexed FPGA Synthesis (시분할 FPGA 합성에서 LUT 개수에 대한 하한 추정 기법)

  • Eom, Seong-Yong
    • Journal of KIISE:Computer Systems and Theory / v.29 no.7 / pp.422-430 / 2002
  • For a time-multiplexed FPGA, a circuit is partitioned into several subcircuits that share the same physical FPGA device over time through hardware reconfiguration. In these architectures, all the reconfiguration information, called contexts, is generated and downloaded into the chip, after which the pre-scheduled context switches occur at the proper times. Because the maximum number of LUTs required at any one time determines the size of the chip used in the synthesis, this number should be minimized wherever possible. Most previous work tackles the problem with its own approach, typically very similar either to scheduling methods in high-level synthesis or to multi-way circuit partitioning. In this paper, we propose a method that estimates a lower bound on the number of LUTs without performing any actual synthesis. The estimated lower bounds help evaluate the results of previous work: if the estimated lower bound exactly matches the number of LUTs in a previous result, that result must be optimal. If they do not match, two possibilities remain: a tighter lower bound may exist, or a synthesis result better than the previous one may be found. Experimental results show that our lower bound estimation is very accurate. In almost all cases tested, the estimated lower bounds exactly match the LUT counts of the previous synthesis results, which implies both that the best previous results are optimal and that our method predicted the exact lower bound for those examples.
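The abstract does not give the estimation procedure itself. As a hedged illustration of what a synthesis-free lower bound can look like, the sketch below applies the classic interval bound from high-level-synthesis scheduling under a simplified, hypothetical model: each LUT occupies one context, and a LUT must be placed in a strictly later context than every LUT feeding it. Each LUT then has an ASAP/ALAP window, and for any range of contexts `i..j`, the LUTs whose whole window lies in that range must share `j - i + 1` contexts. This is not claimed to be the paper's method; `lut_lower_bound` is a hypothetical name.

```python
from graphlib import TopologicalSorter
from math import ceil

def lut_lower_bound(preds: dict[str, list[str]], contexts: int) -> int:
    """Interval lower bound on the LUTs needed in the busiest context.

    Hypothetical model: each LUT occupies one context and must sit in a
    strictly later context than every LUT that feeds it. `preds` must list
    every LUT as a key, mapped to its fan-in LUTs.
    """
    order = list(TopologicalSorter(preds).static_order())
    asap: dict[str, int] = {}
    for v in order:  # predecessors are visited first
        asap[v] = max((asap[p] + 1 for p in preds[v]), default=0)
    succs: dict[str, list[str]] = {v: [] for v in preds}
    for v, fanin in preds.items():
        for p in fanin:
            succs[p].append(v)
    alap: dict[str, int] = {}
    for v in reversed(order):  # successors are visited first
        alap[v] = min((alap[s] - 1 for s in succs[v]), default=contexts - 1)
    if any(alap[v] < asap[v] for v in preds):
        raise ValueError("fewer contexts than the circuit depth requires")
    bound = 0
    for i in range(contexts):
        for j in range(i, contexts):
            # LUTs whose entire feasible window [asap, alap] lies inside i..j
            forced = sum(1 for v in preds if i <= asap[v] and alap[v] <= j)
            bound = max(bound, ceil(forced / (j - i + 1)))
    return bound
```

For a chain `a -> b -> c` plus two independent LUTs `d` and `e` in three contexts, all five LUTs must fit into three contexts, so the bound is 2; with five contexts it drops to 1. Under this model, a published schedule that uses exactly the bound is optimal, which mirrors the reasoning in the abstract.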

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering / v.26 no.2 / pp.167-174 / 2021
  • Deep learning is now widely used in computer vision applications such as object recognition, classification, and image generation, and deep learning-based super-resolution in particular has achieved significant performance improvements. The fast super-resolution convolutional neural network (FSRCNN) is a well-known deep learning-based super-resolution model whose output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural network accelerator designed for parallel computing efficiency. We also propose Optimal-FSRCNN, a modified FSRCNN structure that reduces the number of multipliers by a factor of 3.47 compared to FSRCNN while achieving PSNR similar to that of FSRCNN. The result is a real-time image processing technology implemented on an FPGA.
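To give a sense of where the multipliers in FSRCNN go, the sketch below counts multiplications per low-resolution input pixel for the original FSRCNN layer stack (5×5 feature extraction with `d` maps, 1×1 shrinking to `s` maps, `m` 3×3 mapping layers, 1×1 expanding back to `d`, and a 9×9 deconvolution), using the commonly cited setting d = 56, s = 12, m = 4 and ignoring biases and PReLU. The Optimal-FSRCNN structure is not described in the abstract, so the 3.47× figure cannot be recomputed here; this per-pixel count is only a rough proxy for the hardware multiplier count in a fully parallel design. The function name is hypothetical.

```python
def fsrcnn_mults_per_lr_pixel(d: int = 56, s: int = 12, m: int = 4,
                              channels: int = 1) -> dict[str, int]:
    """Multiplications per low-resolution input pixel, ignoring biases and PReLU."""
    layers = {
        "feature_extraction": 5 * 5 * channels * d,
        "shrinking": 1 * 1 * d * s,
        "mapping": m * 3 * 3 * s * s,
        "expanding": 1 * 1 * s * d,
        # Each LR pixel scatters a 9x9 kernel per input feature map into the HR output.
        "deconvolution": 9 * 9 * d * channels,
    }
    layers["total"] = sum(layers.values())  # summed before the "total" key exists
    return layers
```

With the default arguments, the mapping layers (5,184) and the deconvolution (4,536) dominate a total of 12,464 multiplications per input pixel, which suggests why structural changes to those layers are the natural place to cut multipliers.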

A Lower Bound Estimation on the Number of Micro-Registers in Time-Multiplexed FPGA Synthesis (시분할 FPGA 합성에서 마이크로 레지스터 개수에 대한 하한 추정 기법)

  • 엄성용
    • Journal of KIISE:Computer Systems and Theory / v.30 no.9 / pp.512-522 / 2003
  • For a time-multiplexed FPGA, a circuit is partitioned into several subcircuits that share the same physical FPGA device over time through hardware reconfiguration. In these architectures, all the reconfiguration information, called contexts, is generated and downloaded into the chip, after which the pre-scheduled context switches occur at the proper times. Typically, the size of the chip required to implement the circuit depends on two quantities: the maximum number of LUT blocks required to implement the function of each subcircuit, and the maximum number of micro-registers needed at any one time to store results across context switches. Many partitioning and synthesis methods therefore try to minimize both. In this paper, we present a new technique for estimating a lower bound on the number of micro-registers achievable by any synthesis method, without performing any actual synthesis or design space exploration. Such a lower bound is valuable because it helps evaluate the results of previous and future work: if the estimated lower bound exactly matches the number in an actual design, that result is guaranteed to be optimal. If they do not match, two possibilities remain: a better (tighter) lower bound may exist, or a synthesis result better than those of previous work may be found. Our experimental results show some differences between the numbers of micro-registers in previous results and our estimated lower bounds. One likely reason is that our estimation seeks the candidate with the fewest micro-registers regardless of the use of other resources such as LUTs, whereas previous work accounts for both LUTs and micro-registers.
In addition, the differences suggest that exact estimation is limited by the complexity of the problem itself, which is much harder than LUT estimation and needs further improvement, and/or that synthesis results better than those of previous work may exist.
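The quantity being bounded can be made concrete with a small sketch. Assuming a hypothetical model in which a LUT output produced in context `i` and read in a later context `j` occupies one micro-register across each switch `i -> i+1, ..., j-1`, the micro-register count of a given context assignment is the peak number of values live across any single switch. This only evaluates one assignment; it is not the paper's lower-bound estimator, and `micro_registers_needed` is a hypothetical name.

```python
def micro_registers_needed(context_of: dict[str, int],
                           preds: dict[str, list[str]]) -> int:
    """Peak number of LUT outputs that must survive a context switch.

    Hypothetical model: a value produced in context i and last read in a
    later context j is held in one micro-register across switches i..j-1.
    Values with no reader in a later context need no register.
    """
    last_use: dict[str, int] = {}
    for v, fanin in preds.items():
        for p in fanin:
            last_use[p] = max(last_use.get(p, context_of[p]), context_of[v])
    n = max(context_of.values(), default=0)
    live = [0] * n  # live[b] counts values crossing the switch after context b
    for v, last in last_use.items():
        for b in range(context_of[v], last):
            live[b] += 1
    return max(live, default=0)
```

In the example `a(0) -> b(1) -> c(2)` where `c` also reads `a` and `d(0)`, three values (`a`, `b`, `d`) cross the switch into context 2, so this assignment needs three micro-registers. A lower bound must hold over all valid assignments, which is what makes the estimation problem harder than simply evaluating one schedule.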

Efficient Fixed-Point Representation for ResNet-50 Convolutional Neural Network (ResNet-50 합성곱 신경망을 위한 고정 소수점 표현 방법)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.1 / pp.1-8 / 2018
  • Convolutional neural networks have recently shown high performance in many computer vision tasks. However, they require an enormous amount of computation, which makes them difficult to adopt in embedded environments. To address this, many studies have pursued ASIC or FPGA implementations, which require an efficient number representation. Fixed-point representation suits ASIC and FPGA implementations but causes performance degradation. This paper proposes optimizing the representations for the convolutional layers and the batch normalization layers separately. With the proposed method, the bit width required for the convolutional layers of the ResNet-50 network is reduced from 16 bits to 10 bits. Since the convolutional layers account for most of the total computation, this reduction enables efficient implementation of convolutional neural networks.
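As a minimal sketch of what a fixed-point representation with a given bit width means in practice, the helpers below quantize a value to a signed `width`-bit integer with `frac` fractional bits (round to nearest, saturating), and pick the fractional bit count from a layer's largest magnitude. Choosing `frac` separately for convolutional and batch-normalization values is one simple reading of "separate optimization"; the paper's actual rounding and selection rules are not given in the abstract, and all names here are hypothetical.

```python
import math

def to_fixed(x: float, width: int, frac: int) -> int:
    """Quantize x to a signed `width`-bit integer with `frac` fractional bits.

    Rounds to nearest (Python's round, ties to even) and saturates at the
    representable range instead of wrapping around.
    """
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, round(math.ldexp(x, frac))))

def from_fixed(q: int, frac: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return math.ldexp(q, -frac)

def frac_bits_for(max_abs: float, width: int) -> int:
    """Largest fractional bit count whose range still reaches max_abs.

    A max_abs within one LSB of the next power of two saturates by < 1 LSB.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    int_bits = math.floor(math.log2(max_abs)) + 1  # may be <= 0 for small ranges
    return width - 1 - int_bits
```

For instance, a 10-bit word with 8 fractional bits stores 0.7 as 179 (0.69921875) and clips 5.0 to 511; a layer whose values stay within ±0.3 can use 10 fractional bits at the same width, while one reaching ±3.0 can use only 7.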

Development of a small avionics unit based on FPGA with soft CPU (소프트 CPU 내장형 FPGA 기반의 소형 전장품 개발)

  • Jeon, Sang-Woon
    • Aerospace Engineering and Technology / v.12 no.2 / pp.131-139 / 2013
  • This paper describes the design and implementation of a small avionics unit based on a soft CPU, which can be implemented entirely in an FPGA through logic synthesis. The design and integration of a versatile, reconfigurable, and re-adaptable modular architecture built around the Nios-II processor are presented. To achieve modularity at both the main-board and sub-board levels, particular attention was paid to the choice of interfaces and to an adequate data and power bus.

An Implementation on the Reconfigurable FPGA System of Accurate and Cost-effective Fuzzy Logic Controller (고정밀 저비용 퍼지 제어기의 재구성 가능한 FPGA 시스템 상에 구현)

  • 조인현;김대진
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1997.11a / pp.67-72 / 1997
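  • This paper addresses the implementation, on a reconfigurable FPGA system, of a new fuzzy controller that performs accurate control at low cost. The system architecture of the proposed fuzzy logic controller (FLC), along with its VHDL design and simulation, is presented in a separate paper. The implementation proceeds as follows. Each module is described in VHDL and then synthesized with Synopsys' FPGA Compiler. Each synthesized module is optimized, placed, and routed with Xilinx XactStep 6.0. The resulting Xilinx rawbit file is converted by VCC's r2h into a hardware object in the form of a C header file. The application control program, which includes this C-form hardware object, is compiled with a C compiler, and the executable is downloaded onto the reconfigurable FPGA system. The time taken for truck backer-upper (reverse parking) control with the proposed fuzzy controller implemented dynamically on the EVCI board was compared with the time taken by the Synopsys VHDL simulator and by a C-language implementation running on a workstation.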

Telemetry Standard 106-17 LDPC Decoder Design Using HLS (HLS를 이용한 텔레메트리 표준 106-17 LDPC 복호기 설계)

  • Gu, Young Mo;Kim, Seongjong;Kim, Bokki
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.49 no.4 / pp.335-342 / 2021
  • Using HLS to develop an FPGA-based communication system allows HDL code to be generated automatically from lightly modified C/C++ source code already used for performance verification, which shortens the development period. This paper proposes a method for designing a Telemetry Standard 106-17 LDPC decoder in C using Xilinx's Vivado HLS, and compares throughput and FPGA utilization when synthesizing for Spartan-7 and Kintex-7 target devices.
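The abstract does not say which decoding algorithm or schedule the C model uses. As a point of reference only, the sketch below shows flooding min-sum decoding, a common low-complexity LDPC algorithm, written in Python rather than HLS-ready C. It assumes LLRs are positive for bit 0, that every parity check involves at least two bits, and that `min_sum_decode` is a hypothetical name; the 106-17 parity-check matrices are not reproduced.

```python
def min_sum_decode(H: list[list[int]], llr: list[float],
                   max_iter: int = 20) -> tuple[list[int], bool]:
    """Flooding min-sum LDPC decoding.

    H: parity-check matrix as rows of 0/1. llr: channel LLRs, positive = bit 0.
    Returns (hard-decision bits, True if every parity check is satisfied).
    """
    checks = [[v for v, h in enumerate(row) if h] for row in H]
    if any(len(row) < 2 for row in checks):
        raise ValueError("each parity check must involve at least two bits")
    var_checks = [[c for c, row in enumerate(checks) if v in row]
                  for v in range(len(llr))]
    # Variable-to-check messages start from the channel LLRs.
    v2c = {(c, v): llr[v] for c, row in enumerate(checks) for v in row}
    bits = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iter):
        # Check-to-variable: sign product and minimum magnitude of the others.
        c2v = {}
        for c, row in enumerate(checks):
            for v in row:
                others = [v2c[(c, u)] for u in row if u != v]
                sign = 1
                for m in others:
                    if m < 0:
                        sign = -sign
                c2v[(c, v)] = sign * min(abs(m) for m in others)
        # Posterior LLRs and hard decisions.
        total = [llr[v] + sum(c2v[(c, v)] for c in var_checks[v])
                 for v in range(len(llr))]
        bits = [0 if x >= 0 else 1 for x in total]
        if all(sum(bits[v] for v in row) % 2 == 0 for row in checks):
            return bits, True
        # Extrinsic update: exclude each check's own contribution.
        for key in v2c:
            c, v = key
            v2c[key] = total[v] - c2v[key]
    return bits, False
```

With the toy (7,4) Hamming matrix `[[1,1,0,1,1,0,0],[1,0,1,1,0,1,0],[0,1,1,1,0,0,1]]`, a codeword `1000110` received with bit 3 weakly flipped (LLR −1 instead of +3) is corrected in one iteration. The min/sign structure avoids the hyperbolic functions of sum-product decoding, which is one reason min-sum variants are popular for FPGA targets.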

FPGA Implementation and Verification of RISC-V Processor (RISC-V 프로세서의 FPGA 구현 및 검증)

  • Jongbok Lee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.5 / pp.115-121 / 2023
  • RISC-V is an open-source instruction set architecture, so anyone can freely design and implement a RISC-V microprocessor. This paper designs and simulates a RISC-V core, synthesizes it for an FPGA, and verifies it with an Integrated Logic Analyzer (ILA). The core is written in SystemVerilog, which supports efficient design and high reusability, so the core can be used in various application fields. It is implemented in hardware by synthesizing it with Vivado for the Ultra96-V2 FPGA board, and the correctness and operation of the design are verified with the ILA. Experiments confirm that the designed RISC-V core performs the expected operations, and these results can contribute to the design and verification of RISC-V-based systems.
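When checking ILA captures of a decode stage, a small software reference model is handy. The sketch below is a hypothetical Python golden model, not taken from the paper; it splits a 32-bit RV32I instruction into the base-format fields defined in the RISC-V unprivileged ISA specification and sign-extends the I-type immediate.

```python
def decode_rv32(word: int) -> dict[str, int]:
    """Split a 32-bit RV32I instruction into its base-format fields."""
    fields = {
        "opcode": word & 0x7F,          # bits [6:0]
        "rd": (word >> 7) & 0x1F,       # bits [11:7]
        "funct3": (word >> 12) & 0x7,   # bits [14:12]
        "rs1": (word >> 15) & 0x1F,     # bits [19:15]
        "rs2": (word >> 20) & 0x1F,     # bits [24:20]
        "funct7": (word >> 25) & 0x7F,  # bits [31:25]
    }
    imm = (word >> 20) & 0xFFF  # I-type immediate, bits [31:20]
    fields["imm_i"] = imm - 0x1000 if imm & 0x800 else imm  # sign-extend 12 bits
    return fields
```

For example, `addi x1, x0, 5` encodes as `0x00500093` (opcode `0x13`, rd 1, rs1 0, immediate 5), and `add x3, x1, x2` encodes as `0x002081B3`. Comparing these fields against the signals captured by the ILA is one straightforward way to confirm the hardware decoder.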

New Technology Mapping Algorithm of Multiple-Output Functions for TLU-Type FPGAs (TLU형 FPGA를 위한 새로운 다출력 함수 기술 매핑 알고리즘)

  • Park, Jang-Hyun;Kim, Bo-Gwan
    • The Transactions of the Korea Information Processing Society / v.4 no.11 / pp.2923-2930 / 1997
  • This paper describes two algorithms for technology mapping of multiple-output functions onto FPGAs (Field Programmable Gate Arrays) that use look-up table memories, a popular class of FPGAs. To improve technology mapping for these FPGAs, we apply functional decomposition to multiple-output functions. The first algorithm extends the Roth-Karp algorithm to multiple-output functions; the second is a novel, efficient algorithm that searches for common decomposition functions during the decomposition procedure. A cost function is used to minimize the number of CLBs and nets and to improve the performance of the network. Finally, we compare the new algorithm with previous logic design techniques; experimental results show a significant reduction in the number of CLBs and nets.
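The decomposability test underlying Roth-Karp style decomposition can be sketched briefly. For a chosen bound set of inputs, the column multiplicity μ is the number of distinct columns in the decomposition chart; the function can be rewritten through `t` intermediate (alpha) functions of the bound set exactly when μ ≤ 2^t. Treating all outputs jointly as one column is one simple way to look for alpha functions shared across outputs. This is a textbook check, not the paper's search or cost function, and the function names are hypothetical.

```python
from itertools import product
from typing import Callable

def column_multiplicity(f: Callable[[tuple[int, ...]], tuple[int, ...]],
                        n_inputs: int, bound: list[int]) -> int:
    """Count distinct decomposition-chart columns for the bound set `bound`.

    f maps a tuple of input bits to a tuple of output bits (multi-output).
    Two bound-set assignments share a column when they give identical
    outputs for every assignment of the free inputs.
    """
    free = [i for i in range(n_inputs) if i not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        column = []
        for f_vals in product((0, 1), repeat=len(free)):
            x = [0] * n_inputs
            for i, v in zip(bound, b_vals):
                x[i] = v
            for i, v in zip(free, f_vals):
                x[i] = v
            column.append(f(tuple(x)))
        columns.add(tuple(column))
    return len(columns)

def alpha_functions_needed(mu: int) -> int:
    """Smallest t with 2**t >= mu (mu >= 1)."""
    return (mu - 1).bit_length()
```

For the two-output function `((x0 ^ x1) & x2, (x0 ^ x1) | x3)`, the bound set `{x0, x1}` gives μ = 2, so both outputs can share a single alpha function `x0 ^ x1`; the bound set `{x0, x2}` gives μ = 4 and would need two.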