• Title/Summary/Keyword: Processor Array

Design and Implementation of MDDI Protocol for Mobile System (모바일 시스템을 위한 MDDI 프로토콜 설계 및 구현)

  • Kim, Jong-Moon;Lee, Byung-Kwon;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1089-1094 / 2013
  • In this study, we propose a method for implementing MDDI (Mobile Display Digital Interface) protocol packet generation in software. The MDDI protocol is widely used in mobile display devices. MDDI protocol packets are generated by software running on a microprocessor, so this method requires only a minimal hardware configuration. To implement this method, we designed a hardware platform with a high-performance microprocessor and an FPGA. The packets generated by software on the microprocessor are converted into LVDS signals and transmitted by hardware in the FPGA. This study shows the benefit that software can easily create a variety of packets. However, the proposed method takes more time for packet transmission than the traditional method; improving this weakness remains future work.
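
As a concrete illustration of generating protocol packets in software on the microprocessor side, the following C sketch assembles a generic length/type/payload/CRC packet. The field layout, byte order, and CRC-16 parameters are illustrative assumptions, not taken from the paper or the MDDI specification.

```c
/* Hypothetical sketch of building an MDDI-style packet in software.
 * Field layout, byte order, and CRC parameters are illustrative
 * assumptions, not taken from the paper or the MDDI specification. */
#include <stdint.h>
#include <string.h>

static uint16_t crc16(const uint8_t *buf, size_t len) {
    uint16_t crc = 0xFFFF;                       /* assumed initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x8408 : crc >> 1;
    }
    return crc;
}

/* Assemble [length][type][payload...][crc]; returns the packet size.
 * The FPGA side would then serialize this buffer onto the LVDS link. */
size_t build_packet(uint8_t *out, uint16_t type,
                    const uint8_t *payload, uint16_t payload_len) {
    uint16_t length = 2u + payload_len + 2u;     /* type + payload + crc */
    size_t pos = 0;
    out[pos++] = (uint8_t)(length & 0xFF); out[pos++] = (uint8_t)(length >> 8);
    out[pos++] = (uint8_t)(type & 0xFF);   out[pos++] = (uint8_t)(type >> 8);
    memcpy(out + pos, payload, payload_len);
    pos += payload_len;
    uint16_t crc = crc16(out, pos);
    out[pos++] = (uint8_t)(crc & 0xFF);    out[pos++] = (uint8_t)(crc >> 8);
    return pos;
}
```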

IR Image Processing IP Design, Implementation and Verification For SoC Design

  • Yoon, Hee-Jin
    • Journal of the Korea Society of Computer and Information / v.23 no.1 / pp.33-39 / 2018
  • In this paper, we study the feasibility of SoC (System on Chip) design using infrared image processing IP (Intellectual Property). We study NUC (Non-Uniformity Correction), BPR (Bad Pixel Recovery), and CEM (Contrast Enhancement), the infrared image processing algorithms implemented as IP. We show the logic and timing diagrams of the hardware blocks designed for each algorithm. Each algorithm was coded at RTL (Register Transfer Level) in Verilog HDL (Hardware Description Language), synthesized with ALTERA QUARTUS, and programmed into an FPGA (Field Programmable Gate Array). In addition, we integrated the infrared image processing algorithms and verified that image data is processed by each algorithm without problems. In particular, using a custom-built electronic board, the processor, SRAM, and FLASH were interconnected and tested, and the verification results are presented so that an SoC version can be realized later. The infrared image processing IP proposed and verified in this study is expected to be of high value in future SoC semiconductor fabrication, and this work lays a basis for future application in the camera SoC industry.
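
The following C sketch models two of the image processing steps named above in software form: a two-point non-uniformity correction (NUC) and a simple neighbor-averaging bad pixel recovery (BPR). The image size, fixed-point format, and correction scheme are assumptions for illustration, not the paper's RTL design.

```c
/* Toy software model of two of the IP blocks: two-point NUC and a simple
 * neighbor-averaging BPR. Image size, 14-bit range, and Q8.8 gain format
 * are illustrative assumptions, not the paper's RTL design. */
#include <stdint.h>

#define W 640
#define H 480

/* NUC: corrected = gain * raw + offset, with gain in Q8.8 fixed point. */
void nuc(uint16_t img[H][W], const uint16_t gain_q8_8[H][W],
         const int16_t offset[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int32_t v = (int32_t)(((uint32_t)img[y][x] * gain_q8_8[y][x]) >> 8)
                        + offset[y][x];
            img[y][x] = (uint16_t)(v < 0 ? 0 : (v > 0x3FFF ? 0x3FFF : v));
        }
}

/* BPR: replace each flagged pixel by the mean of its horizontal neighbors. */
void bpr(uint16_t img[H][W], const uint8_t bad[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 1; x < W - 1; x++)
            if (bad[y][x])
                img[y][x] = (uint16_t)((img[y][x - 1] + img[y][x + 1]) / 2);
}
```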

Optimization of Pipelined Discrete Wavelet Packet Transform Based on an Efficient Transpose Form and an Advanced Functional Sharing Technique

  • Nguyen, Hung-Ngoc;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of Information Processing Systems / v.15 no.2 / pp.374-385 / 2019
  • This paper presents an optimal implementation of a Daubechies-based pipelined discrete wavelet packet transform (DWPT) processor using finite impulse response (FIR) filter banks. The feed-forward pipelined (FFP) architecture is exploited to implement the DWPT on a field-programmable gate array (FPGA). The proposed DWPT is based on an efficient transpose form structure, thereby reducing the computational complexity of the system by half. Moreover, the efficiency of the design is further improved by using canonical-signed-digit-based binary expression (CSDBE) and advanced functional sharing (AFS) methods. In this work, the AFS technique is proposed to optimize the convolution of the FIR filter banks for DWPT decomposition, which reduces hardware resource utilization by not requiring any embedded digital signal processing (DSP) blocks. The proposed AFS- and CSDBE-based DWPT system is embedded on a Virtex-7 FPGA board for testing. The proposed design is implemented as an intellectual property (IP) logic core that can easily be integrated into DSP systems for sub-band analysis. The results show that the proposed method is very efficient in improving hardware resource utilization while maintaining the accuracy of the DWPT result.
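
The transpose form mentioned above refers to the transposed direct-form FIR structure, in which each input sample is multiplied by all taps at once and the products are accumulated into a chain of partial-sum registers. A minimal C model is sketched below; the tap values are placeholders, not the Daubechies coefficients used in the paper.

```c
/* Minimal transposed direct-form FIR: each input sample is multiplied by
 * all taps and accumulated into a chain of partial-sum registers.
 * Tap values are placeholders, not the paper's Daubechies coefficients. */
#include <stdio.h>

#define NTAPS 4

static const double h[NTAPS] = {0.25, 0.25, 0.25, 0.25};  /* placeholder taps */
static double z[NTAPS - 1];                  /* partial-sum delay registers */

double fir_transposed(double x) {
    double y = h[0] * x + z[0];              /* newest product + oldest partial sum */
    for (int i = 0; i < NTAPS - 2; i++)
        z[i] = h[i + 1] * x + z[i + 1];      /* advance the partial sums */
    z[NTAPS - 2] = h[NTAPS - 1] * x;
    return y;
}

int main(void) {
    /* Feeding an impulse reproduces the tap values on the output. */
    for (int n = 0; n < 2 * NTAPS; n++)
        printf("%g ", fir_transposed(n == 0 ? 1.0 : 0.0));
    printf("\n");
    return 0;
}
```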

Energy Efficient and Low-Cost Server Architecture for Hadoop Storage Appliance

  • Choi, Do Young;Oh, Jung Hwan;Kim, Ji Kwang;Lee, Seung Eun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4648-4663 / 2020
  • This paper proposes a Lempel-Ziv 4 (LZ4) compression accelerator optimized for scale-out servers in data centers. In order to reduce the CPU load caused by compression, we propose an accelerator solution and implement the accelerator on a Field Programmable Gate Array (FPGA) as a heterogeneous computing approach. The LZ4 compression hardware accelerator is a fully pipelined architecture and applies 16 dictionaries to enhance parallelism for a high-throughput compressor. Our hardware accelerator is based on a 20-stage pipeline and a dictionary architecture, highly customized to the LZ4 compression algorithm and to parallel hardware implementation. The proposed dictionary architecture achieves high throughput by comparing input sequences against multiple dictionaries simultaneously rather than against a single dictionary. The experimental results show high throughput from the intensively optimized FPGA implementation. Additionally, we compare our implementation to CPU implementation results of LZ4 to provide insights on FPGA-based data centers. The proposed accelerator achieves a compression throughput of 639 MB/s with fine-grained parallelism, suitable for deployment in scale-out servers. This approach enables a low-power Intel Atom processor to realize Hadoop storage together with the compression accelerator.
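
For context on what the 16 dictionaries parallelize, the following single-dictionary C model sketches LZ4-style match searching: a hash of the next four bytes indexes a table of earlier positions, and a verified hit yields a back-reference. This is a simplified software illustration, not the paper's RTL; the hash function and table size are assumptions.

```c
/* Simplified single-dictionary model of LZ4-style match searching; the
 * paper's accelerator probes 16 such dictionaries in parallel in hardware.
 * Hash function and table size are illustrative assumptions. */
#include <stdint.h>
#include <string.h>

#define DICT_BITS 12
#define DICT_SIZE (1u << DICT_BITS)

static uint32_t dict[DICT_SIZE];             /* hash -> most recent position */

static uint32_t hash4(const uint8_t *p) {    /* caller guarantees 4 readable bytes */
    uint32_t v;
    memcpy(&v, p, 4);
    return (v * 2654435761u) >> (32 - DICT_BITS);
}

/* Returns the match length at 'pos' against the dictionary candidate,
 * or 0 if there is no usable match; updates the dictionary either way. */
uint32_t find_match(const uint8_t *src, uint32_t pos, uint32_t end,
                    uint32_t *match_pos) {
    uint32_t h = hash4(src + pos);
    uint32_t cand = dict[h];
    dict[h] = pos;
    if (cand >= pos || memcmp(src + cand, src + pos, 4) != 0)
        return 0;
    uint32_t len = 4;
    while (pos + len < end && src[cand + len] == src[pos + len])
        len++;
    *match_pos = cand;                       /* offset = pos - cand in the output token */
    return len;
}
```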

An Edge AI Device based Intelligent Transportation System

  • Jeong, Youngwoo;Oh, Hyun Woo;Kim, Soohee;Lee, Seung Eun
    • Journal of information and communication convergence engineering / v.20 no.3 / pp.166-173 / 2022
  • Recently, studies have been conducted on intelligent transportation systems (ITS) that provide safety and convenience to humans. Systems composing an ITS typically adopt cloud computing architectures built on high-performance general-purpose processors or graphics processing units. However, an architecture that relies only on cloud computing requires high network bandwidth and consumes considerable power. Therefore, applying edge computing to ITS is essential for solving these problems. In this paper, we propose an edge artificial intelligence (AI) device based ITS. Edge AI, which is applicable to various systems in an ITS, is applied here to license plate recognition. We implemented the edge AI on a field-programmable gate array (FPGA). The accuracy of the edge AI for license plate recognition was 0.94. Finally, we synthesized the edge AI logic with Magnachip/Hynix 180 nm CMOS technology; the power consumption measured using the Synopsys Design Compiler tool was 482.583 mW.

Performance exploration on the number of register for Coarse grained reconfigurable array processor (재구성형 프로세서 성능과 레지스터와의 상관 관계 탐구)

  • Kim, Yongjoo;Heo, Ingoo;Yang, Seungjun;Lee, Jongwon;Choi, Youngkyu;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference / 2010.04a / pp.22-25 / 2010
  • A reconfigurable processor is a processor that can deliver high performance while consuming little power. It pursues low power consumption and high performance at the same time by packing as many compute resources as possible into the hardware while keeping the structure as simple as possible. However, in simplifying the structure, much of the hardware logic that manages program execution is removed; this is instead handled from a software perspective, with the compiler determining the scheduling and execution order when it generates code. To use such a processor, the compiler must analyze the input program and map the code onto each compute resource in a form that can execute on the reconfigurable processor. The registers of the reconfigurable processor are mainly used to transfer data during this mapping. Using a variety of multimedia applications, this paper presents the effect of the number of registers on performance when a reconfigurable processor is used in a multimedia environment.
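
As a rough illustration of why the register count matters to the compiler's mapping, the following C sketch (a toy model with assumed value lifetimes, not taken from the paper) computes the peak number of simultaneously live values routed through one processing element; a register file smaller than that peak forces the compiler to reschedule or spill.

```c
/* Toy model (not from the paper): peak register pressure for values routed
 * through one processing element. A value produced at cycle p and consumed
 * at cycle c occupies a register for c - p cycles; a PE with fewer registers
 * than the peak forces rescheduling or spilling. */
#include <stdio.h>

#define CYCLES 8

int main(void) {
    /* (produce cycle, consume cycle) pairs, chosen arbitrarily for illustration */
    int lifetime[][2] = { {0, 3}, {1, 4}, {2, 3}, {2, 6} };
    int n = (int)(sizeof lifetime / sizeof lifetime[0]);
    int live[CYCLES] = {0};

    for (int i = 0; i < n; i++)
        for (int t = lifetime[i][0]; t < lifetime[i][1]; t++)
            live[t]++;

    int peak = 0;
    for (int t = 0; t < CYCLES; t++)
        if (live[t] > peak)
            peak = live[t];

    printf("peak register pressure = %d\n", peak);
    return 0;
}
```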

Development and Field Test of the NEXTSat-2 Synthetic Aperture Radar (SAR) Antenna Onboard Vehicle (차세대소형위성 2호 영상 레이다 안테나 개발 및 차량 탑재 시험)

  • Shin, Goo-Hwan;Lee, Jung-Su;Jang, Tae Seong;Kim, Dong-Guk;Jung, Young-Bae
    • Journal of Space Technology and Applications / v.1 no.1 / pp.33-40 / 2021
  • Based on the requirement of a total weight of 42 kg or less, the NEXTSat-2 SAR (synthetic aperture radar) system was developed. As the NEXTSat-2 is a small satellite, the SAR system was designed so that the payload accounts for about 40% of the total dry mass. Among the major components of the SAR system (an antenna, an RF transceiver, a baseband signal processor, and a power unit), the part with a particularly large dry mass is the antenna, the core of the SAR system. Although various antenna types can be selected in consideration of gain and efficiency, a micro-strip patch array antenna was adopted to meet the dry mass, power, and resolution required by the NEXTSat-2 project. To meet the mission requirements of the NEXTSat-2, the antenna was developed with a frequency of 9.65 GHz, a gain of 42.7 dBi, and a return loss of -15 dB. The performance of the antenna was verified through a field test with the antenna mounted on a vehicle.
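
As a back-of-envelope cross-check of the reported figures, the standard relation G = 4πA_e/λ² gives the effective aperture implied by 42.7 dBi at 9.65 GHz. The short C program below performs this calculation; it is illustrative only, since the physical panel size also depends on aperture efficiency, which the abstract does not give.

```c
/* Illustrative check: effective aperture implied by 42.7 dBi at 9.65 GHz
 * via G = 4*pi*Ae/lambda^2 (standard antenna theory, not a paper value). */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double PI = 3.14159265358979;
    const double c = 299792458.0;            /* speed of light, m/s */
    double f = 9.65e9;                       /* operating frequency, Hz */
    double gain_dBi = 42.7;
    double lambda = c / f;                   /* wavelength, about 0.031 m */
    double g = pow(10.0, gain_dBi / 10.0);   /* linear gain */
    double ae = g * lambda * lambda / (4.0 * PI);
    printf("lambda = %.4f m, effective aperture = %.2f m^2\n", lambda, ae);
    return 0;
}
```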

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1993.06a / pp.975-976 / 1993
  • This talk presents an overview of the author's research and development activities on fuzzy inference hardware, which involve two distinct approaches. The first approach uses application-specific integrated circuit (ASIC) technology: the fuzzy inference method is directly implemented in silicon. The second approach, which is in its preliminary stage, uses a more conventional microprocessor architecture; here, we use a quantitative technique employed by designers of reduced instruction set computers (RISC) to modify a microprocessor architecture. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly in silicon. The mechanism is based on the max-min compositional rule of inference and Mamdani's method of fuzzy implication. Two VLSI fuzzy inference chips were designed, fabricated, and fully tested, both in full-custom CMOS technology. The second and more elaborate chip was designed at the University of North Carolina (UNC) in cooperation with MCNC. Both VLSI chips had multiple datapaths for rule evaluation and executed multiple fuzzy if-then rules in parallel. The AT&T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock and achieved approximately 80,000 Fuzzy Logical Inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof-of-concept prototype, it had a minimal amount of peripheral logic for system integration. The UNC/MCNC chip consists of 688,131 transistors, of which 476,160 are used for RAM. It ran with a 10 MHz clock, has a 3-stage pipeline, and initiates the computation of a new inference every 64 cycles, achieving approximately 160,000 FLIPS. The new architecture has the following important improvements over the AT&T chip: programmable rule set memory (RAM); on-chip fuzzification by table lookup; on-chip defuzzification by the centroid method; a reconfigurable architecture for processing two rule formats; and RAM/datapath redundancy for higher yield. It can store and execute 51 if-then rules of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the simpler format IF A and B Then Do E using the same datapath; with this format the chip takes two inputs and produces one output. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL, where researchers use it for experiments in autonomous robot navigation. The fuzzy logic system board places the fuzzy chip into a VMEbus environment. High-level C language functions hide the operational details of the board from the application programmer, who treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but it is limited in generality; many aspects of the design are fixed. We have therefore proposed designing a fuzzy information processor as an application-specific processor using a quantitative approach, the approach developed by RISC designers.
In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As a first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules from introducing specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union. We performed measurements using a MIPS R3000 as the base microprocessor. The initial result is encouraging: we could achieve as much as a 2.5-fold increase in inference speed if the R3000 had min and max instructions. These instructions are also useful for speeding up other fuzzy operations such as bounded product and bounded sum. An embedded processor's main task is to control some device or process, and it usually runs only a single program or a small set of programs, so specializing an embedded processor for fuzzy control is very effective. Table I shows the measured inference speed for a MIPS R3000 microprocessor, a fictitious MIPS R3000 with min and max instructions, and the UNC/MCNC ASIC fuzzy inference chip. The software used on the microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds for 6,000 inferences using 51 rules, where each fuzzy set is represented by an array of 64 elements; the second row is the time required for a single inference; the last row is the fuzzy logical inferences per second (FLIPS) measured for each device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes, and the ASIC approach is extremely expensive. It is therefore an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes in both run time and design time/cost. Table I (inference time with 51 rules): MIPS R3000 regular: 125 s for 6,000 inferences, 20.8 ms per inference, 48 FLIPS; MIPS R3000 with min/max instructions: 49 s, 8.2 ms, 122 FLIPS; ASIC: 0.0038 s, 6.4 µs, 156,250 FLIPS.
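
The inference kernel being timed above can be summarized compactly: with fuzzy sets stored as 64-element membership arrays (as stated in the abstract), rule evaluation takes the min of the antecedent memberships and outputs are aggregated with max, so min/max operations dominate the inner loop, which is why dedicated min/max instructions help. The following C sketch of one Mamdani-style rule is illustrative; the data layout is an assumption.

```c
/* One Mamdani-style rule with min implication and max aggregation over
 * 64-element membership arrays (the representation cited in the abstract).
 * The rule layout is illustrative; the point is that min/max dominate. */
#include <stdint.h>

#define SET_LEN 64

static inline uint8_t u8min(uint8_t a, uint8_t b) { return a < b ? a : b; }
static inline uint8_t u8max(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Rule: IF A and B THEN E.  mu_a, mu_b are the fuzzified input memberships;
 * 'out' accumulates the aggregated output fuzzy set across all rules. */
void apply_rule(uint8_t mu_a, uint8_t mu_b,
                const uint8_t consequent[SET_LEN], uint8_t out[SET_LEN]) {
    uint8_t strength = u8min(mu_a, mu_b);    /* rule firing strength */
    for (int i = 0; i < SET_LEN; i++)
        out[i] = u8max(out[i], u8min(strength, consequent[i]));
}
```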

Design and Implementation of a 128-bit Block Cypher Algorithm SEED Using Low-Cost FPGA for Embedded Systems (내장형 시스템을 위한 128-비트 블록 암호화 알고리즘 SEED의 저비용 FPGA를 이용한 설계 및 구현)

  • Yi, Kang;Park, Ye-Chul
    • Journal of KIISE:Computer Systems and Theory / v.31 no.7 / pp.402-413 / 2004
  • This paper presents an implementation of the Korean standard 128-bit block cipher SEED for small (8- or 16-bit) embedded systems using a low-cost FPGA (Field Programmable Gate Array) chip. Due to their limited computing and storage capacities, most small 8-bit/16-bit embedded systems require a separate, dedicated cryptography processor for the data encryption and decryption process, which is a relatively heavy computational job. Therefore, to integrate SEED with other logic circuit blocks on a single chip, we need a design that minimizes area demand while maintaining adequate performance. However, a straightforward mapping of the SEED specification into a hardware design results in a circuit area exceeding the capacity of a low-cost FPGA. In this paper we therefore present a design that maximizes resource sharing and utilizes modern FPGA features to reduce the area demand, resulting in a successful implementation of SEED plus interface logic on a single low-cost FPGA. Our SEED design occupies 66% of the area of an XC2S100 (a Spartan-II series FPGA from Xilinx) and achieves a data throughput of more than 66 Mbps. This performance is sufficient for small-scale embedded systems while meeting the tight area requirement.
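
For orientation, SEED is a 16-round Feistel cipher operating on two 64-bit halves, which is why a single shared round datapath reused 16 times (the resource-sharing idea above) keeps the area small. The C sketch below shows only that iterative skeleton; the round function is a stub, not the real SEED F function, and the key schedule is omitted.

```c
/* Iterative Feistel skeleton: SEED runs 16 rounds over two 64-bit halves,
 * so one shared round datapath can be reused 16 times in hardware.
 * F_stub is a placeholder, NOT the real SEED F function; the key schedule
 * is omitted. */
#include <stdint.h>

static uint64_t F_stub(uint64_t r, uint64_t round_key) {
    /* placeholder mixing; real SEED uses S-boxes and a G function here */
    return (r ^ round_key) * 0x9E3779B97F4A7C15ull;
}

void feistel16_encrypt(uint64_t *left, uint64_t *right,
                       const uint64_t round_key[16]) {
    uint64_t l = *left, r = *right;
    for (int i = 0; i < 16; i++) {           /* one shared round, iterated 16x */
        uint64_t t = l ^ F_stub(r, round_key[i]);
        l = r;
        r = t;
    }
    *left = l;
    *right = r;
}
```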

The Design of 32 Bit Microprocessor for Sequence Control Using FPGA (FPGA를 이용한 시퀀스 제어용 32비트 마이크로프로세서 설계)

  • Yang, Oh
    • Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.6 / pp.431-441 / 2003
  • This paper presents the design of a 32-bit microprocessor for sequence control using a field programmable gate array (FPGA). The microprocessor was designed in VHDL with a top-down method, and the program memory was separated from the data memory for high-speed execution of sequence instructions. As a result, sequence instructions could be executed concurrently with the instruction fetch cycle. To reduce the instruction decoding time and the data memory interface time, the instruction code size was set to 32 bits. Real-time debug operation was implemented for convenient debugging of the designed processor, with single-step run, PC breakpoint run, and data memory breakpoint run. In addition, pulse instructions, step controllers, master controllers, BM- and BCD-type arithmetic instructions, and barrel shift instructions were implemented for sequence logic control. The FPGA was synthesized with Xilinx Foundation 4.2i Project Manager, targeting a V600EHQ240 containing 600,000 gates. Finally, simulation and experiments were performed successfully. To demonstrate its performance, the designed microprocessor for sequence logic control was compared with the H8S/2148 microprocessor, which contains many bit instructions for sequence logic control; the designed processor showed good performance.
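
The separate program and data memories described above form a Harvard-style arrangement in which the next instruction fetch can overlap execution of the current one. The following C sketch is a toy behavioral model of that idea; the 8-bit opcode / 24-bit operand format and the opcodes are invented for illustration, not the paper's instruction set.

```c
/* Toy Harvard-style model: program memory is separate from data memory, so
 * the next 32-bit instruction can be fetched while the current one executes.
 * The 8-bit opcode / 24-bit operand format and opcodes are invented here. */
#include <stdint.h>

#define PROG_WORDS 256
#define DATA_WORDS 256

static uint32_t prog_mem[PROG_WORDS];        /* instruction memory */
static uint32_t data_mem[DATA_WORDS];        /* separate data memory */

enum { OP_HALT = 0, OP_LOAD = 1, OP_AND = 2, OP_STORE = 3 };

void run(void) {
    uint32_t pc = 0, acc = 0;
    uint32_t ir = prog_mem[pc++];            /* initial fetch */
    for (;;) {
        /* fetch of the next instruction overlaps execution of 'ir' */
        uint32_t next = (pc < PROG_WORDS) ? prog_mem[pc] : 0;
        uint8_t  op   = (uint8_t)(ir >> 24);
        uint32_t addr = (ir & 0xFFFFFFu) % DATA_WORDS;
        switch (op) {
        case OP_LOAD:  acc = data_mem[addr];  break;
        case OP_AND:   acc &= data_mem[addr]; break;
        case OP_STORE: data_mem[addr] = acc;  break;
        case OP_HALT:  return;
        }
        ir = next;
        pc++;
    }
}
```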