• Title/Summary/Keyword: 데이터 레벨 병렬성

Search Result 13, Processing Time 0.019 seconds

High-Performance Line-Based Filtering Architecture Using Multi-Filter Lifting Method (다중필터 리프팅 방식을 이용한 고성능 라인기반 필터링 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.8
    • /
    • pp.75-84
    • /
    • 2004
  • In this paper, we proposed an efficient hardware architecture of line-based lifting algorithm for Motion JPEG2000. We proposed a new architecture of a lifting-based filtering cell which has an optimized and simplified structure. It was implemented in a hardware accommodating both (9,7) and (5,4) filter. Since the output rate is linearly proportional to the input rate, one can obtain the high throughput through parallel operation simply by adding the hardware units. It was implemented into both of ASIC and FPGA The 0.35${\mu}{\textrm}{m}$ CMOS library from Samsung was used for ASIC and Altera was the target for FRGA. In ASIC, the proposed architecture used 41,592 gates for the lifting arithmetic and 128 Kbit memory. For FPGA it used 6,520 LEs(Logic Elements) and 128 ESBs(Embedded System Blocks). The implementations were stably operated in the clock frequency of 128MHz and 52MHz, respectively.

A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization (GPGPU 자원 활용 개선을 위한 블록 지연시간 기반 워프 스케줄링 기법)

  • Thuan, Do Cong;Choi, Yong;Kim, Jong Myon;Kim, Cheol Hong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.5
    • /
    • pp.219-230
    • /
    • 2017
  • General-Purpose Graphics Processing Units (GPGPUs) build massively parallel architecture and apply multithreading technology to explore parallelism. By using programming models like CUDA, and OpenCL, GPGPUs are becoming the best in exploiting plentiful thread-level parallelism caused by parallel applications. Unfortunately, modern GPGPU cannot efficiently utilize its available hardware resources for numerous general-purpose applications. One of the primary reasons is the inefficiency of existing warp/thread block schedulers in hiding long latency instructions, resulting in lost opportunity to improve the performance. This paper studies the effects of hardware thread scheduling policy on GPGPU performance. We propose a novel warp scheduling policy that can alleviate the drawbacks of the traditional round-robin policy. The proposed warp scheduler first classifies the warps of a thread block into two groups, warps with long latency and warps with short latency and then schedules the warps with long latency before the warps with short latency. Furthermore, to support the proposed warp scheduler, we also propose a supplemental technique that can dynamically reduce the number of streaming multiprocessors to which will be assigned thread blocks when encountering a high contention degree at the memory and interconnection network. Based on our experiments on a 15-streaming multiprocessor GPGPU platform, the proposed warp scheduling policy provides an average IPC improvement of 7.5% over the baseline round-robin warp scheduling policy. This paper also shows that the GPGPU performance can be improved by approximately 8.9% on average when the two proposed techniques are combined.

Real-time 2-D Separable Median Filter (실시간 2차원 Separable 메디안 필터)

  • Jae Gil Jeong
    • Journal of the Korea Computer Industry Society
    • /
    • v.3 no.3
    • /
    • pp.321-330
    • /
    • 2002
  • A 2-D median filter has many applications in various image and video signal processing areas. The rapid development in VLSI technology makes it possible to implement a real-time or near real-time 2-D median filter with reasonable cost. For the efficient VLSI implementation, the algorithm should have characteristics such as small memory requirements, regular computations, and local data transfers. This paper presents an architecture of the real-time two-dimensional separable median filter which has appropriate characteristics for the VLSI implementation. For the efficient two-dimensional median filter, a separable two-dimensional median filtering structure and a bit-sliced pipelined median searching algorithm are used. A behavioral simulator is implemented with C language and used for the analysis of the presented architecture.

  • PDF