• Title/Summary/Keyword: pipelined

Search Result 377, Processing Time 0.029 seconds

A Design of Pipelined-parallel CABAC Decoder Adaptive to HEVC Syntax Elements (HEVC 구문요소에 적응적인 파이프라인-병렬 CABAC 복호화기 설계)

  • Bae, Bong-Hee;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.5
    • /
    • pp.155-164
    • /
    • 2015
  • This paper describes a design and implementation of CABAC decoder, which would handle HEVC syntax elements in adaptively pipelined-parallel computation manner. Even though CABAC offers the high compression rate, it is limited in decoding performance due to context-based sequential computation, and strong data dependency between context models, as well as decoding procedure bin by bin. In order to enhance the decoding computation of HEVC CABAC, the flag-type syntax elements are adaptively pipelined by precomputing consecutive flag-type ones; and multi-bin syntax elements are decoded by processing bins in parallel up to three. Further, in order to accelerate Binary Arithmetic Decoder by reducing the critical path delay, the update and renormalization of context modeling are precomputed parallel for the cases of LPS as well as MPS, and then the context modeling renewal is selected by the precedent decoding result. It is simulated that the new HEVC CABAC architecture could achieve the max. performance of 1.01 bins/cycle, which is two times faster with respect to the conventional approach. In ASIC design with 65nm library, the CABAC architecture would handle 224 Mbins/sec, which could decode QFHD HEVC video data in real time.

A Pipelined Hash Join Method for Load Balancing (부하 균형 유지를 고려한 파이프라인 해시 조인 방법)

  • Moon, Jin-Gue;Park, No-Sang;Kim, Pyeong-Jung;Jin, Seong-Il
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.755-768
    • /
    • 2002
  • We investigate the effect of the data skew of join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods with load balancing capabilities. The first proposed method allocates buckets statically by round-robin fashion, and the second one allocates buckets adaptively via a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join processing without staying on disks. Unless the pipelining execution of multiple hash joins includes some load balancing mechanisms, the skew effect can severely deteriorate system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation with a wide range of parameters, join selectivities and sizes of relations deteriorate the system performance as the degree of data skew is larger. But the proposed method using a large number of buckets and a tuning technique can offer substantial robustness against a wide range of skew conditions.

Design and Implementation of An I/O System for Irregular Application under Parallel System Environments (병렬 시스템 환경하에서 비정형 응용 프로그램을 위한 입출력 시스템의 설계 및 구현)

  • No, Jae-Chun;Park, Seong-Sun;;Gwon, O-Yeong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.11
    • /
    • pp.1318-1332
    • /
    • 1999
  • 본 논문에서는 입출력 응용을 위해 collective I/O 기법을 기반으로 한 실행시간 시스템의 설계, 구현 그리고 그 성능평가를 기술한다. 여기서는 모든 프로세서가 동시에 I/O 요구에 따라 스케쥴링하며 I/O를 수행하는 collective I/O 방안과 프로세서들이 여러 그룹으로 묶이어, 다음 그룹이 데이터를 재배열하는 통신을 수행하는 동안 오직 한 그룹만이 동시에 I/O를 수행하는 pipelined collective I/O 등의 두 가지 설계방안을 살펴본다. Pipelined collective I/O의 전체 과정은 I/O 노드 충돌을 동적으로 줄이기 위해 파이프라인된다. 이상의 설계 부분에서는 동적으로 충돌 관리를 위한 지원을 제공한다. 본 논문에서는 다른 노드의 메모리 영역에 이미 존재하는 데이터를 재 사용하여 I/O 비용을 줄이기 위해 collective I/O 방안에서의 소프트웨어 캐슁 방안과 두 가지 모형에서의 chunking과 온라인 압축방안을 기술한다. 그리고 이상에서 기술한 방안들이 입출력을 위해 높은 성능을 보임을 기술하는데, 이 성능결과는 Intel Paragon과 ASCI/Red teraflops 기계 상에서 실험한 것이다. 그 결과 응용 레벨에서의 bandwidth는 peak point가 55%까지 측정되었다.Abstract In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two designs, namely, "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O simultaneously, making scheduling of I/O requests simpler but creating a possibility of contention at the I/O nodes. In the second approach, processors are grouped into several groups, so that only one group performs I/O simultaneously, while the next group performs communication to rearrange data, and this entire process is pipelined to reduce I/O node contention dynamically. In other words, the design provides support for dynamic contention management. Then we present a software caching method using collective I/O to reduce I/O cost by reusing data already present in the memory of other nodes. Finally, chunking and on-line compression mechanisms are included in both models. We demonstrate that we can obtain significantly high-performance for I/O above what has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine. Application level I/O bandwidth up to 55% of the peak is observed.he peak is observed.

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.180-192
    • /
    • 2002
  • We investigate the effect of the data skew of join attributes on the performance of a pipelined multi-way hash join method, and propose two new harsh join methods in the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically by round-robin fashion, and the second one allocates buckets dynamically via a frequency distribution. Using harsh-based joins, multiple joins can be pipelined to that the early results from a join, before the whole join is completed, are sent to the next join processing without staying in disks. Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. However, this hardware structure is very sensitive to the data skew. Unless the pipelining execution of multiple hash joins includes some dynamic load balancing mechanism, the skew effect can severely deteriorate the system performance. In this parer, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation with a wide range of parameters, join selectivities and sizes of relations deteriorate the system performance as the degree of data skew is larger. But the proposed method using a large number of buckets and a tuning technique can offer substantial robustness against a wide range of skew conditions.

A Pipelined Architecture for Maze Routing

  • Won Young Ju;Sahni Sartaj K.
    • Journal of the military operations research society of Korea
    • /
    • v.14 no.1
    • /
    • pp.1-17
    • /
    • 1988
  • This paper presents a hardware accelerator for the maze routing problem. This accelerator consists of three 3 stage pipelines. Banked memory is used to avoid memory read/write conflicts and obtain maximum efficiency.

  • PDF

A Pipelined Architecture for Maze Routing

  • Won Young Ju;Sahni Sartaj K.
    • Journal of the military operations research society of Korea
    • /
    • v.13 no.2
    • /
    • pp.1-17
    • /
    • 1987
  • This paper presents a hardware accelerator for the maze routing problem. This accelerator consists of three 3 stage pipelines. Banked memory is used to avoid memory read/write conflicts and obtain maximum efficiency.

  • PDF

Low-noise VLSI Implementation of Pipelined IIR Filters (파이프라인된 IIR 필터의 저잡음 VLSI구현)

  • 태기철;최정필;신승철;정진균
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.4B
    • /
    • pp.788-795
    • /
    • 2000
  • Scattered look-ahead pipelining method can be efficiently used for high sample rate or low-power applications of digital recursive filters. Although the pipelined filters are guaranteed to be stable by this method, these filters suffer from large round off noise when the poles are crowed within some critical regions. To avoid this problem, a low-noise implementation technique was proposed using constrained Remez exchange algorithm. By the constrained filter design approach, the desired filter spectrum is satisfied while some of the pole angles are constrained to avoid pole crowding within critical regions. In the proposed approach, to obtain improved spectrum characteristics or better round off noise properties, the radius of the angle-constrained pole is optimized depending on the direction of the pole movement.

  • PDF

Wafer-Level Three-Dimensional Monolithic Integration for Intelligent Wireless Terminals

  • Gutmann, R.J.;Zeng, A.Y.;Devarajan, S.;Lu, J.Q.;Rose, K.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.4 no.3
    • /
    • pp.196-203
    • /
    • 2004
  • A three-dimensional (3D) IC technology platform is presented for high-performance, low-cost heterogeneous integration of silicon ICs. The platform uses dielectric adhesive bonding of fully-processed wafer-to-wafer aligned ICs, followed by a three-step thinning process and copper damascene patterning to form inter-wafer interconnects. Daisy-chain inter-wafer via test structures and compatibility of the process steps with 130 nm CMOS sal devices and circuits indicate the viability of the process flow. Such 3D integration with through-die vias enables high functionality in intelligent wireless terminals, as vertical integration of processor, large memory, image sensors and RF/microwave transceivers can be achieved with silicon-based ICs (Si CMOS and/or SiGe BiCMOS). Two examples of such capability are highlighted: memory-intensive Si CMOS digital processors with large L2 caches and SiGe BiCMOS pipelined A/D converters. A comparison of wafer-level 3D integration 'lith system-on-a-chip (SoC) and system-in-a-package (SiP) implementations is presented.

Design of low-noise II R filter with high-density and low-power properties (고집적, 저전력 특성을 갖는 저잡음 IIR 필터 설계)

  • Bae Sung-hwan;Kim Dae-ik
    • The KIPS Transactions:PartA
    • /
    • v.12A no.1 s.91
    • /
    • pp.7-12
    • /
    • 2005
  • Scattered look-ahead(SLA) pipelining method can be efficiently used for high-speed or low-power applications of digital II R filters. Although the pipelined filters are guaranteed to be stable by this method, these filters suffer from large roundoff noise when the poles are crowded within some critical regions. An angle and radius constrained II R fille. design approach using modified Remez exchange algorithm and least squares algorithm is proposed to avoid tight pole-crowding in pipelined filters, resulting in improved frequency responses and reduced coefficient sensitivities. Experimental results demonstrate that our proposed method leads to chip area reduction by $33{\%}$ and low power by $45{\%}$ against the conventional method.

A Design of High-speed Phase Calculator for 3D Depth Image Extraction from TOF Sensor Data (TOF 센서용 3차원 Depth Image 추출을 위한 고속 위상 연산기 설계)

  • Koo, Jung-Youn;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.2
    • /
    • pp.355-362
    • /
    • 2013
  • A hardware implementation of phase calculator for extracting 3D depth image from TOF(Time-Of-Flight) sensor is described. The designed phase calculator, which adopts a pipelined architecture to improve throughput, performs arctangent operation using vectoring mode of CORDIC algorithm. Fixed-point MATLAB modeling and simulations are carried out to determine the optimized bit-widths and number of iteration. The designed phase calculator is verified by FPGA-in-the-loop verification using MATLAB/Simulink, and synthesized with a TSMC 0.18-${\mu}m$ CMOS cell library. It has 16,000 gates and the estimated throughput is about 9.6 Gbps at 200Mhz@1.8V.