• Title/Summary/Keyword: Parallel Processing Structure

Search Result 303, Processing Time 0.024 seconds

On the Design Technique and VLSI Structure for a Multiplierless Quincuncial Interpolation Filter (무곱셈 대각 보간 필터의 설계 및 VLSI 구현에 관한 연구)

  • 최진우;이상욱
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.29B no.8
    • /
    • pp.54-65
    • /
    • 1992
  • A huge amount of multiplications is required for 2-D filtering on the image data, making it difficult to implement a real-time quincuncial interpolator. In this paper, efficient design technique and VLSI structures for 2-D multipleierless filter are presented. In the filter design, by introducing an efficient scheme for discretizing the frequency response of the prototype filter, it is shown that a significant amount of the computational burden required in the conventional techniques, such as local search, branch and bound techniques, could be saved. In the case of 5$\times$5 filter, it is found that the design technique described in this paper could save about 80% of the computation time, compared to the conventional methods, while providing a comparable performance. For a hardware implementation, two different VLSI structures for 2-D multiplierless filter are also introduced in the paper : One is for block parallel processing and the other for scan-line parallel processing. In both structure, the AP(area-period) figure improves over Wu's structure[4].

  • PDF

Design of a SIMT architecture GP-GPU Using Tile based on Graphic Pipeline Structure (타일 기반 그래픽 파이프라인 구조를 사용한 SIMT 구조 GP-GPU 설계)

  • Kim, Do-Hyun;Kim, Chi-Yong
    • Journal of IKEEE
    • /
    • v.20 no.1
    • /
    • pp.75-81
    • /
    • 2016
  • This paper proposes a design of the tile based on graphic pipeline to improve the graphic application performance in SIMT based GP-GPU. The proposed Tile based on graphics pipeline avoids unnecessary graphic processing operation, and processes the rasterization step in parallel. The massive data processing in parallel through SIMT architecture improve the computational performance, thereby improving the 3D graphic pipeline performance. The more vertex data of 3D model, the higher performance. The proposed structure was confirmed to improve processing performance of up to 3 times from about 1.18 times as compared to 'RAMP' and previous studies.

Current-Mode Serial-to-Parallel and Parallel-to-Serial Converter for Current-Mode OFDM FFT LSI (전류모드 OFDM FFT LSI를 위한 전류모드 직병렬/병직렬 변환기)

  • Park, Yong-Woon;Min, Jun-Gi;Hwang, Sung-Ho
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.1
    • /
    • pp.39-45
    • /
    • 2009
  • OFDM is used for achieving a high-speed data transmission in mobile wireless communication systems. Conventionally, fast Fourier transform that is the main signal processing of OFDM is implemented using digital signal processing. The DSP FFT LSI requires large power consumption. Current-mode FFT LSI with analog signal processing is one of the best solutions for high speed and low power consumption. However, for the operation of current-mode FFT LSI that has the structure of parallel-input and parallel-output, current-mode serial-to-parallel and parallel-to-serial converter are indispensable. We propose a novel current-mode SPC and PSC and full chip simulation results agree with experimental data. The proposed current-mode SPC and PSC promise the wide application of the current-mode analog signal processing in the field of low power wireless communication LSI.

  • PDF

Implementation of Reed-Solomon Decoder Using the efficient Modified Euclid Module (효율적 구조의 수정 유클리드 구조를 이용한 Reed-Solomon 복호기의 설계)

  • Kim, Dong-Sun;Chung, Duck-Jin
    • Proceedings of the KIEE Conference
    • /
    • 1998.11b
    • /
    • pp.575-578
    • /
    • 1998
  • In this paper, we propose a VLSI architecture of Reed-Solomon decoder. Our goal is the development of an architecture featuring parallel and pipelined processing to improve the speed and low power design. To achieve the this goal, we analyze the RS decoding algorithm to be used parallel and pipelined processing efficiently, and modified the Euclid's algorithm arithmetic part to apply the parallel structure in RS decoder. The overall RS decoder are compared to Shao's, and we show the 10% area efficiency than Shao's time domain decoder and three times faster, in addition, we approve the proposed RS decoders with Altera FPGA Flex 10K-50, and Implemeted with LG 0.6{\mu}$ processing.

  • PDF

FPGA Design of a Parallel Canny Edge Detector with Optimized Local Buffers (로컬 버퍼 최적화를 통한 병렬 처리 캐니 경계선 검출기의 FPGA 설계)

  • Ingi Min;Suhyun Sim;Seungwon Hwang;Sunhee Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.59-65
    • /
    • 2023
  • Edge detection in image processing and computer vision is one of the most fundamental operations. Canny edge detection algorithm has excellent performance and is currently widely used. However, it is difficult to process the algorithm in real-time because the algorithm is complex. In this study, the equations required in the algorithm were simplified to facilitate hardware implementation, and the calculation speed was increased by using a parallel structure. In particular, the size and management of local buffers were selected in consideration of parallel processing and filter size so that data could be processed without bottlenecks. It was designed in verilog and implemented in FPGA to verify operation and performance.

  • PDF

Implememtation of Fast Rasterizer processing using GPGPU based on SIMT structure (SIMT 구조 기반 GPGPU를 이용한 고속 Rasterizer 구현)

  • Kim, Chiyong
    • Journal of IKEEE
    • /
    • v.21 no.3
    • /
    • pp.276-279
    • /
    • 2017
  • In this paper, SIMT structure based GPGPU (General Purpose Computing on Graphics Processing Units) is used for accelerating the Rasterizer which constitutes the screen of the display device in pixel unit. The GPU has a large number of ALUs, and the processing is very fast because of parallel processing. Therefore, in this paper, we implemented a rasterizer that generates a 3D graphics model using a CPU that performs operations sequentially and a GPU that performs operations in parallel. We confirmed that proposed rasterizer in this paper is 1.45 times better than rasterizer using Intel CPU when generating one frame.

Optimization of a Systolic Array BCH encoder with Tree-Type Structure

  • Lim, Duk-Gyu;Shakya, Sharad;Lee, Je-Hoon
    • International Journal of Contents
    • /
    • v.9 no.1
    • /
    • pp.33-37
    • /
    • 2013
  • BCH code is one of the most widely used error correcting code for the detection and correction of random errors in the modern digital communication systems. The conventional BCH encoder that is operated in bit-serial manner cannot adequate with the recent high speed appliances. Therefore, parallel encoding algorithms are always a necessity. In this paper, we introduced a new systolic array type BCH parallel encoder. To study the area and speed, several parallel factors of the systolic array encoder is compared. Furthermore, to prove the efficiency of the proposed algorithm using tree-type structure, the throughput and the area overhead was compared with its counterparts also. The proposed BCH encoder has a great flexibility in parallelization and the speed was increased by 40% than the original one. The results were implemented on synthesis and simulation on FPGA using VHDL.

A Study on the Parallel Processing of the Object Generator in a Suface Combat System LBTS (수상함 전투체계 육상시험체계용 개체생성기 구현에 적합한 병렬처리기법에 관한 연구)

  • Kim, Chang-Jin;Oh, Gwang-Baek;Jung, Young-Hwan
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.13 no.5
    • /
    • pp.734-738
    • /
    • 2010
  • Object Generator is a software to provide simulation object data(aircraft, ship, submarine, missile, torpedo) for sumulators in LBTS(Land Based Test System). but there is a burden to the system, because Object generator needs to send many object's data, display objects in a tactical screen, show object's information in a list in 1 second. This paper suggests a parallel software structure taking a few factors(deadlock, dependency) into consideration. At last, the paper shows the performance of the parallel structure's software compared with the former structure's software.

GPU-Based Parallel Collision Detection for Deformable Objects (변형 물체를 위한 GPU 기반 병렬 충돌 감지)

  • Sung, Nak-Jun;Kim, Min Sang;Hong, Min;Choi, Yoo-Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.1
    • /
    • pp.25-32
    • /
    • 2018
  • Due to heavy computational cost, deformable object simulation requires more effective collision detection method than rigid body simulation. However, when the CPU-based collision detection algorithm is purely applied to the GPU environment, the collision detection algorithm and the data structure optimized for the GPU environment are essential because the performance of the GPU can not be used properly. Therefore, we propose a GPU-based parallel collision detection algorithm for mass-spring system which is widely used for deformable object representation in this paper. The proposed method uses a parallel algorithm and data structure to reduce collision detection cost through GPU-based curling algorithm using AABB-Octree structure. In this paper, we prove the effectiveness of the proposed method by comparing the intersection test of all triangle pairs in parallel. The results of experimental tests show that the proposed method improves the performance by about 24% on average. Therefore, it is expected that the proposed method can improve the performance of real-time simulation for deformable objects.

Parallel Structure Design Method for Mass Spring Simulation (질량스프링 시뮬레이션을 위한 병렬 구조 설계 방법)

  • Sung, Nak-Jun;Choi, Yoo-Joo;Hong, Min
    • Journal of the Korea Computer Graphics Society
    • /
    • v.25 no.3
    • /
    • pp.55-63
    • /
    • 2019
  • Recently, the GPU computing method has been utilized to improve the performance of the physics simulation field. In particular, in the case of a deformed object simulation requiring a large amount of computation, a GPU-based parallel processing algorithm is required to guarantee real-time performance. We have studied the parallel structure design method to improve the performance of the mass spring simulation method which is one of the methods of implementing the deformation object simulation. We used OpenGL's GLSL, a graphics library that allows direct access to the GPU, and implemented the GPGPU environment using an independent pipeline, the compute shader. In order to verify the effectiveness of the parallel structure design method, the mass - spring system was implemented based on CPU and GPU. Experimental results show that the proposed method improves computation speed by about 6,000% compared to the CPU Environment. It is expected that the lightweight simulation technology can be effectively applied to the augmented reality and the virtual reality field by using the design method proposed later in this research.