• Title/Summary/Keyword: Pipeline Structure

Search Result 273, Processing Time 0.026 seconds

Design of Asynchronous 16-Bit Divider Using NST Algorithm (NST알고리즘을 이용한 비동기식 16비트 제산기 설계)

  • 이우석;박석재;최호용
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.3
    • /
    • pp.33-42
    • /
    • 2003
  • This paper describes an efficient design of an asynchronous 16-bit divider using the NST (new Svoboda-Tung) algorithm. The divider is designed to reduce power consumption by using the asynchronous design scheme in which the division operation is performed only when it is requested. The divider consists of three blocks, i.e. pre-scale block, iteration step block, and on-the-fly converter block using asynchronous pipeline structure. The pre-scale block is designed using a new subtracter to have small area and high performance. The iteration step block consists of an asynchronous ring structure with 4 division steps for area reduction. In other to reduce hardware overhead, the part related to critical path is designed by a dual-rail circuit, and the other part is done by a single-rail circuit in the ring structure. The on-the-fly converter block is designed for high performance using the on-the-fly algorithm that enables parallel operation with iteration step block. The design results with 0.6${\mu}{\textrm}{m}$ CMOS process show that the divider consists of 12,956 transistors with 1,480 $\times$1,200${\mu}{\textrm}{m}$$^2$area and average-case delay is 41.7㎱.

Design of a Block-Based 2D Discrete Wavelet Transform Filter with 100% Hardware Efficiency (100% 하드웨어 효율을 갖는 블록기반의 이차원 이산 웨이블렛 변환 필터 설계)

  • Kim, Ju-Young;Park, Tae-Guen
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.12
    • /
    • pp.39-47
    • /
    • 2010
  • This paper proposes a fully-utilized block-based 2D DWT architecture, which consists of four 1D DWT filters with two-channel QMF PR Lattice structure. For 100% hardware utilization, we propose a new method which processes four input values at the same time. On the contrary to the image-based 2D DWT which requires large memories, we propose a block-based 2D DWT so that we only need 2MN-3N of storages, where M and N stand for filter lengths and width of the image respectively. Furthermore, the proposed architecture processes in horizontal and vertical directions simultaneously so that it computes the DWT for an $N{\times}N$ image within a period of $N^2(1-2^{-2J})/3$. Compared to existing approaches, the proposed architecture shows 100% of hardware utilization and high throughput rate. However, the proposed architecture may suffer from the long critical path delay due to the cascaded lattices in 1D DWT filters. This problem can be mitigated by applying the pipeline technique with maximum four level. The proposed architecture has been designed with VerilogHDL and synthesized using DongbuAnam $0.18{\mu}m$ standard cell.

A Design of Pipelined Adaptive Decision-Feedback Equalized using Delayed LMS and Redundant Binary Complex Filter Structure (Delayed LMS와 Redundant Binary 복소수 필터구조를 이용한 파이프라인 적응 결정귀환 등화기 설계)

  • An, Byung-Gyu;Lee, Jong-Nam;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.12
    • /
    • pp.60-69
    • /
    • 2000
  • This paper describes a single-chip full-custom implementation of pipelined adaptive decision-feedback equalizer(PADFE) using a 0.25-${\mu}m$ CMOS technology for wide-band wireless digital communication systems. To enhance the throughput rate of ADFE, two pipeline stages are inserted into the critical path of the ADFE by using delayed least-mean-square(DLMS) algorithm. Redundant binary (RB) arithmetic is applied to all the data processing of the PADFE including filter taps and coefficient update blocks. When compared with conventional methods based on two's complement arithmetic, the proposed approach reduces arithmetic complexity, as well as results in a very simple complex-valued filter structure, thus suitable for VLSI implementation. The design parameters including pipeline stage, filter tap, coefficient and internal bit-width, and equalization performance such as bit error rate (BER) and convergence speed are analyzed by algorithm-level simulation using COSSAP. The single-chip PADFE contains about 205,000 transistors on an area of about $1.96\times1.35-mm^2$. Simulation results show that it can safely operate with 200-MHz clock frequency at 2.5-V supply, and its estimated power dissipation is about 890-mW. Test results show that the fabricated chip works functionally well.

  • PDF

A Study on the Optimum Generation Condition of Ultrasonic Guided Waves for Insulation Pipelines (단열된 배관의 유도초음파 최적 발생조건 선정에 관한 연구)

  • Lee, Dong-Hoon;Cho, Hyun-Joon;Kang, To;Park, Dong-Jun;Kim, Byung-Duk;Huh, Yun-Sil;Lee, Yeon-Jae
    • Journal of the Korean Institute of Gas
    • /
    • v.20 no.6
    • /
    • pp.50-57
    • /
    • 2016
  • Pipeline is one of the most abundant components in petrochemical plant. It plays a critical role in transporting fluids. Some pipelines are thermally insulated by wrapping them with insulating materials to prevent the loss of energy. However, when corrosion begins under insulation, it cannot be easily seen without unwrapping the cover, and thus corrossion should be detected using a non-destructive ways such as ultrasound guided wave. In this paper, the piping where the CUI (Corrosion Under Insulation) which occurs in the insulation parts guided waves effectively the optimum condition which is theoretical for selected guided waves phase velocity dispersion curve and wave-structure. The results of this study are expected to be directly utilized for onsite inspection of pipeline's CUI in many petrochemical plants.

Efficient Parallel IP Address Lookup Architecture with Smart Distributor (스마트 분배기를 이용한 효율적인 병렬 IP 주소 검색 구조)

  • Kim, Junghwan;Kim, Jinsoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.2
    • /
    • pp.44-51
    • /
    • 2013
  • Routers should perform fast IP address lookup for Internet to provide high-speed service. In this paper, we present a hybrid parallel IP address lookup structure composed of four-stage pipeline. It achieves parallelism at low cost by using multiple SRAMs in stage 2 and partitioned TCAMs in stage 3, and improves the performance through pipelining. The smart distributor in stage 1 does not transfer any IP address identical to previous one toward the next stage, but only uses the result of the previous lookup. So it improves throughput of lookup by caching effects, and decreases the access conflict to TCAM bank in stage 3 as well. In the last stage, the reorder buffer rearranges the completed IP addresses according to the input order. We evaluate the performance of our parallel pipelined IP lookup structure comparing with previous hybrid structure, using the real routing table and traffic distributions generated by Zipf's law.

Design and Implementation of Multi-View 3D Video Player (다시점 3차원 비디오 재생 시스템 설계 및 구현)

  • Heo, Young-Su;Park, Gwang-Hoon
    • Journal of Broadcast Engineering
    • /
    • v.16 no.2
    • /
    • pp.258-273
    • /
    • 2011
  • This paper designs and implements a multi-view 3D video player system which is operated faster than existing video player systems. The structure for obtaining the near optimum speed in a multi-processor environment by parallelizing the component modules is proposed to process large volumes of multi-view image data at high speed. In order to use the concurrency of bottleneck, we designed image decoding, synthesis and rendering modules in a pipeline structure. For load balancing, the decoder module is divided into the unit of viewpoint, and the image synthesis module is geometrically divided based on synthesized images. As a result of this experiment, multi-view images were correctly synthesized and the 3D sense could be felt when watching the images on the multi-view autostereoscopic display. The proposed application processing structure could be used to process large volumes of multi-view image data at high speed, using the multi-processors to their maximum capacity.

Hardware Design of Efficient SAO for High Performance In-loop filters (고성능 루프내 필터를 위한 효율적인 SAO 하드웨어 설계)

  • Park, Seungyong;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.543-545
    • /
    • 2017
  • This paper describes the SAO hardware architecture design for high performance in-loop filters. SAO is an inner module of in-loop filter, which compensates for information loss caused by block-based image compression and quantization. However, HEVC's SAO requires a high computation time because it performs pixel-unit operations. Therefore, the SAO hardware architecture proposed in this paper is based on a $4{\times}4$ block operation and a 2-stage pipeline structure for high-speed operation. The information generation and offset computation structure for SAO computation is designed in a parallel structure to minimize computation time. The proposed hardware architecture was designed with Verilog HDL and synthesized with TSMC chip process 130nm and 65nm cell library. The proposed hardware design achieved a maximum frequency of 476MHz yielding 163k gates and 312.5MHz yielding 193.6k gates on the 130nm and 65nm processes respectively.

  • PDF

Design of a High-Performance Mobile GPGPU with SIMT Architecture based on a Small-size Warp Scheduler (작은 크기의 Warp 스케쥴러 기반 SIMT구조 고성능 모바일 GPGPU 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.25 no.3
    • /
    • pp.479-484
    • /
    • 2021
  • This paper proposed and designed a structure to achieve high performance with a small number of cores in GPGPU with SIMT structure. GPGPU for application to mobile devices requires a structure to increase performance compared to power consumption. In order to reduce power consumption, the number of cores decreased, but to improve performance, the size of the warp scheduler for managing threads was set to 4, which was greatly reduced than 32 of general GPGPU. Reducing warp size can reduce the number of idle cycles in pipelines and efficiently apply memory latency to reduce miss penalty when accessing cache memory. The designed GPGPU measured computational performance using a test program that includes floating point operations and measured power consumption through a 28nm CMOS process to obtain 104.5GFlops/Watt as a performance per power. The results of this paper showed about four times better performance per power compared to Tegra K1 of Nvidia

Parallel Structure Design Method for Mass Spring Simulation (질량스프링 시뮬레이션을 위한 병렬 구조 설계 방법)

  • Sung, Nak-Jun;Choi, Yoo-Joo;Hong, Min
    • Journal of the Korea Computer Graphics Society
    • /
    • v.25 no.3
    • /
    • pp.55-63
    • /
    • 2019
  • Recently, the GPU computing method has been utilized to improve the performance of the physics simulation field. In particular, in the case of a deformed object simulation requiring a large amount of computation, a GPU-based parallel processing algorithm is required to guarantee real-time performance. We have studied the parallel structure design method to improve the performance of the mass spring simulation method which is one of the methods of implementing the deformation object simulation. We used OpenGL's GLSL, a graphics library that allows direct access to the GPU, and implemented the GPGPU environment using an independent pipeline, the compute shader. In order to verify the effectiveness of the parallel structure design method, the mass - spring system was implemented based on CPU and GPU. Experimental results show that the proposed method improves computation speed by about 6,000% compared to the CPU Environment. It is expected that the lightweight simulation technology can be effectively applied to the augmented reality and the virtual reality field by using the design method proposed later in this research.

A Study on the Determination of Design Load for Buried Hume Pipeline (매설흄관의 설계하중 결정에 관한 연구)

  • O, Chi-Nam;Jeong, Seong-Gyo;Jang, Gi-Tae
    • Geotechnical Engineering
    • /
    • v.5 no.2
    • /
    • pp.19-32
    • /
    • 1989
  • The vertical loads of buried Hume pipes were calculated using the finite element method, in which the hyperbolic soil model, the nonlinear hysteretic stress path model and soil-structure interface model were used. The obtained results were compared and discussed with those from the classic methods such as Marston-Spangler's theory and so on. The effects of excavation width and depth to the top of pipe along with soil parameters and type of excavation, which have not been included in the classic methods, were investigated. In addition, a calculation method of the vertical load for buried Hume pipes was proposed and it is presumed to be easily applied in the practical fields.

  • PDF