• Title/Summary/Keyword: Hardware Accelerator

Search Result 111, Processing Time 0.033 seconds

3D Game Rendering Engine Degine using Empty space BSP tree (Empty space BSP트리를 이용한 3D 게임 렌더링 엔진 설계)

  • Kim Hak-Ran;Park Hwa-Jin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.3 s.35
    • /
    • pp.345-352
    • /
    • 2005
  • This paper aims to design Game Rendering Engine for real-time 3D online games. Previous, in order to raise rendering speed, BSP tree was used to partitioned space in Quake Game Engine. A game engine is required to develop for rapidly escalating of 3D online games in Korea. too. Currently rendering time is saved with the hardware accelerator which is working on the high-level computer system. On the other hand, a game engine is needed to save rendering time for users with low-level computer system. Therefore, a game rendering engine is which reduces rendering time by PVS look-up table using Empty space BSP tree designed and implemented in this paper

  • PDF

Implementation of a 3D Graphics Simulator for GP-GPU (GP-GPU 개발을 위한 3차원 그래픽 시뮬레이터 구현)

  • Yeo, Dong-young;Kim, Woo-young;Jung, Hyung-Ki;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.337-340
    • /
    • 2009
  • Since a hardware accelerator for 3D graphics processing GPU(Graphics Processing Unit)'s performance has been improving constantly. This is the efficient way was introduced for complex graphics application, but it is rarely used to utilize 100% resources on GPU. GP-GPU(general-purpose GPU), including operations on the GPU and supporting common operations can be handled by the processor, is noted by depending on the distribution of resources that can be effectively controlled. In this paper, the simulator was implemented that supports virtual environment of GP-GPU and available for program design and debugging. Through this, the co-design development environment support simultaneous design fast and reliable verification that are available to build the interface of three-dimensional graphics display.

  • PDF

Cascade CNN with CPU-FPGA Architecture for Real-time Face Detection (실시간 얼굴 검출을 위한 Cascade CNN의 CPU-FPGA 구조 연구)

  • Nam, Kwang-Min;Jeong, Yong-Jin
    • Journal of IKEEE
    • /
    • v.21 no.4
    • /
    • pp.388-396
    • /
    • 2017
  • Since there are many variables such as various poses, illuminations and occlusions in a face detection problem, a high performance detection system is required. Although CNN is excellent in image classification, CNN operatioin requires high-performance hardware resources. But low cost low power environments are essential for small and mobile systems. So in this paper, the CPU-FPGA integrated system is designed based on 3-stage cascade CNN architecture using small size FPGA. Adaptive Region of Interest (ROI) is applied to reduce the number of CNN operations using face information of the previous frame. We use a Field Programmable Gate Array(FPGA) to accelerate the CNN computations. The accelerator reads multiple featuremap at once on the FPGA and performs a Multiply-Accumulate (MAC) operation in parallel for convolution operation. The system is implemented on Altera Cyclone V FPGA in which ARM Cortex A-9 and on-chip SRAM are embedded. The system runs at 30FPS with HD resolution input images. The CPU-FPGA integrated system showed 8.5 times of the power efficiency compared to systems using CPU only.

Design and Implementation of CW Radar-based Human Activity Recognition System (CW 레이다 기반 사람 행동 인식 시스템 설계 및 구현)

  • Nam, Jeonghee;Kang, Chaeyoung;Kook, Jeongyeon;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.5
    • /
    • pp.426-432
    • /
    • 2021
  • Continuous wave (CW) Doppler radar has the advantage of being able to solve the privacy problem unlike camera and obtains signals in a non-contact manner. Therefore, this paper proposes a human activity recognition (HAR) system using CW Doppler radar, and presents the hardware design and implementation results for acceleration. CW Doppler radar measures signals for continuous operation of human. In order to obtain a single motion spectrogram from continuous signals, an algorithm for counting the number of movements is proposed. In addition, in order to minimize the computational complexity and memory usage, binarized neural network (BNN) was used to classify human motions, and the accuracy of 94% was shown. To accelerate the complex operations of BNN, the FPGA-based BNN accelerator was designed and implemented. The proposed HAR system was implemented using 7,673 logics, 12,105 registers, 10,211 combinational ALUTs, and 18.7 Kb of block memory. As a result of performance evaluation, the operation speed was improved by 99.97% compared to the software implementation.

Design and Implementation of BNN-based Gait Pattern Analysis System Using IMU Sensor (관성 측정 센서를 활용한 이진 신경망 기반 걸음걸이 패턴 분석 시스템 설계 및 구현)

  • Na, Jinho;Ji, Gisan;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.26 no.5
    • /
    • pp.365-372
    • /
    • 2022
  • Compared to sensors mainly used in human activity recognition (HAR) systems, inertial measurement unit (IMU) sensors are small and light, so can achieve lightweight system at low cost. Therefore, in this paper, we propose a binary neural network (BNN) based gait pattern analysis system using IMU sensor, and present the design and implementation results of an FPGA-based accelerator for computational acceleration. Six signals for gait are measured through IMU sensor, and a spectrogram is extracted using a short-time Fourier transform. In order to have a lightweight system with high accuracy, a BNN-based structure was used for gait pattern classification. It is designed as a hardware accelerator structure using FPGA for computation acceleration of binary neural network. The proposed gait pattern analysis system was implemented using 24,158 logics, 14,669 registers, and 13.687 KB of block memory, and it was confirmed that the operation was completed within 1.5 ms at the maximum operating frequency of 62.35 MHz and real-time operation was possible.

The Exploratory study of Capacity Building for Creative Incubation Center: Focus on the University Business Incubator (창조적 보육센터 역량강화 방안에 관한 탐색적 연구: 대학 보육센터를 중심으로)

  • Choi, Jong-in;Byun, YoungJo
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.11 no.2
    • /
    • pp.135-144
    • /
    • 2016
  • Korean government has invested about 400 million dollars to the business incubator using the facilities and human resources of universities and research institutes and successfully operated to contribute the economic development(1.6 billion dollars sales, 5,500 companies) and job creation(16,000 employee) in the end of 2013. Although incubators have grown rapidly, there is limited performance, like a hardware centered support, limited exploitation of resource in the university, less collaboration with the community, less star companies. This research provide a alternative capacity building direction based on the creativity and resource dependence theory. Specifically this paper suggest a building creative incubating center, traversing the valley of death, accelerator for the CPM(Capability, Product, Market) linkage, organic implementation of university resources.

  • PDF

Hardware Design of Special-Purpose Arithmetic Unit for 3-Dimensional Graphics Processor (3차원 그래픽프로세서용 특수 목적 연산장치의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.140-142
    • /
    • 2011
  • In this paper, special purpose arithmetic unit for mobile graphics accelerator is designed. The designed processor supports six operations, such as $1/{\chi}$, $\frac{1}{{\sqrt{x}}$, $log_2x$, $2^x$, $sin(x)$, $cos(x)$. The processor adopts 2nd-order polynomial minimax approximation scheme based on IEEE floating point data format to satisfy accuracy conditions and has 5-stage pipeline structure to meet high operational rates. The SFAU processor consists of 23,000 gates and its estimated operating frequency is about 400 Mhz at operating condition of 65nm CMOS technology. Because the processor can execute all operations with 5-stage pipeline scheme, it has about 400 MOPS(million operations per second) execution rate. Thus, it can be applicable to the 3D mobile graphics processors.

  • PDF

An Efficient VLSI Architecture of Deblocking Filter in H.264 Advanced Video Coding (H.264/AVC를 위한 디블록킹 필터의 효율적인 VLSI 구조)

  • Lee, Sung-Man;Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.7
    • /
    • pp.52-60
    • /
    • 2008
  • The deblocking filter in the H.264/AVC video coding standard helps to reduce the blocking artifacts produced in the decoding process. But it consumes one third of the computational complexity in H.624/AVC decoder, which advocates an efficient design of a hardware accelerator for filtering. This paper proposes an architecture of deblocking filter using two filters and some registers for data reuse. Our architecture improves the throughput and minimize the number of external memory access by increasing data reuse. After initialization, two filters are able to perform filtering operation simultaneously. It takes only 96 clocks to complete filtering for one macroblock. We design and synthesis our architecture using Dongbuanam $0.18{\mu}m$ standard cell library and the maximum clock frequency is 200MHz.

Efficient DSP Architecture For High- Quality Audio Algorithms (고음질 오디오 알고리즘을 위한 효율적인 DSP 설계)

  • Moon, Jong-Ha;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.5
    • /
    • pp.112-117
    • /
    • 2007
  • This paper presents specialized DSP instructions and their hardware architecture for audio coding algorithms, such as the MPEG-2/4 Advanced Audio Coding(AAC), Dolby AC-3, MPEG-2 Backward Compatible(BC), etc. The proposed architecture is specially designed and optimized for the MDCT/IMDCT(Inverse Modified Discrete Cosine Transform), and Huffman decoding of the AAC decoding algorithm. Performance comparisons show a significant improvement compared with TMS320C62x and ASDSP21060 for the MDCT/IMDCT computation. In addition, the dedicated Huffman decoding accelerator performs decoding and preparing operand in only one cycle. The proposed DPU(Data Processing Unit) consists of 107,860 gates and achieves 150 MIPS.

Implementation of Uncompressed Video Transmission System for Wireless Video Mirroring Service in Portable Multimedia Devices (휴대용 멀티미디어 기기에서의 무선 영상 미러링 서비스를 위한 비압축 영상 전송 시스템의 구현)

  • Lee, Sangjae;Jeon, Youngae;Choi, Sangsung;Cho, Kyoung-Rok
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.4
    • /
    • pp.381-391
    • /
    • 2013
  • Wireless transmission of uncompressed video guarantees higher quality with lower latency than compressed video transmission. Although current wireless technologies cannot fully cover required data rates of about a few Gb/s for full high definition resolution, some wireless technologies such as ultra-wideband (UWB) provide 1 Gb/s data rate which is adequate for uncompressed video transmission in portable devices. In this paper, we propose an uncompressed video transmission system for wireless mirroring services in portable devices. We firstly simulated the performance of uncompressed video transmission using single or multiple 1 Gb/s UWB technology. Then we implemented hardware-based uncompressed video processing block and Gb/s wireless MAC accelerator. Finally, we show the implementation result and the demonstration of uncompressed HD video transmission using multiple 1 Gb/s UWB PHYs.