• Title/Summary/Keyword: pipelined

Search Result 377, Processing Time 0.034 seconds

Parallel Distributed Implementation of GHT on MPI-based PC Cluster (MPI 기반 PC 클러스터에서 GHT의 병렬 분산 구현)

  • Kim, Yeong-Soo;Kim, Jeong-Sahm;Choi, Heung-Moon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.44 no.3
    • /
    • pp.81-89
    • /
    • 2007
  • This paper presents a parallel distributed implementation of the GHT (generalized Hough transform) for the fast processing on the MPI-based PC cluster. We tried to achieve the higher speedup mainly by alleviating the communication overhead through the pipelined broadcast and accumulator array partition strategy and by time overlapping of the communication and the computation over entire process. Experimental results show that nearly linear speedup is reachable by the proposed method on the MPI-based PC clusters connected through 100Mbps Ethernet switch.

A Lock Mechanism for HiPi-bus Based Multiprocessor Systems (HiPi-bus 구조의 다중 프로세서 시스템에서의 잠금장치)

  • 윤용호;임인칠
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.30B no.2
    • /
    • pp.33-43
    • /
    • 1993
  • Lock mechanism is essential for synchronization on the multiprocessor systems. Lock mechanism needs to reduce the time for lock operation in low lock contention. Lock mechanism must consider the case of the high lock contention. The conventional lock control scheme in memory results in the increase of bus traffic and memory utilization in lock operation. This paper suggests a lock scheme which stores the lock data in cache and manages it efficiently to reduce the time spent in lock operation when the lock contention is low on a multiprocessor system built on HiPi-bus(Highly Pipelined bus). This paper also presents the design of the HIPi-CLOCK (Highly Pipelined bus Cache LOCK mechanism) which transfere the data from on cache to another when the lock contention is high. The designed simulator compares the conventional lock scheme which controls the lock in memory with the suggested HiPi-CLOCK scheme in terms of the RMW(Read-Modify-Write) operation time using simulated trace. It is shown that the suggested lock control scheme performance is over twice than that of the conventional method in low lock contention. When the lock contention is high, the performance of the suggested scheme increases as the number of the shared lock data increases.

  • PDF

Design of Pipelined Parallel CRC Circuits (파이프라인 구조를 적용한 병렬 CRC 회로 설계)

  • Yi, Hyun-Bean;Kim, Ki-Tae;Kwon, Young-Min;Park, Sung-Ju
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.43 no.6 s.312
    • /
    • pp.40-47
    • /
    • 2006
  • This paper introduces an efficient CRC logic partitioning algorithm to design pipelined parallel CRC circuits aimed at improving speed performance. Focusing on the cases that the input data width is greater than the polynomial degree, equations are derived to divide the parallel CRC logic and decide the length of the pipeline stage. Through design experiments on different types of parallel CRC circuits, we have found a significant reduction in delay by adopting our approach.

A Low-Voltage Low-Power Opamp-Less 8-bit 1-MS/s Pipelined ADC in 90-nm CMOS Technology

  • Abbasizadeh, Hamed;Rikan, Behnam Samadpoor;Lee, Dong-Soo;Hayder, Abbas Syed;Lee, Kang-Yoon
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.6
    • /
    • pp.416-424
    • /
    • 2014
  • This paper presents an 8-bit pipelined analog-to-digital converter. The supply voltage applied for comparators and other sub-blocks of the ADC were 0.7V and 0.5V, respectively. This low power ADC utilizes the capacitive charge pump technique combined with a source-follower and calibration to resolve the need for the opamp. The differential charge pump technique does not require any common mode feedback circuit. The entire structure of the ADC is based on fully dynamic circuits that enable the design of a very low power ADC. The ADC was designed to operate at 1MS/s in 90nm CMOS process, where simulated results using ADS2011 show the peak SNDR and SFDR of the ADC to be 47.8 dB (7.64 ENOB) and 59 dB respectively. The ADC consumes less than 1mW for all active dynamic and digital circuitries.

An Efficient Hardware Implementation of Whirlpool Hash Function (Whirlpool 해쉬 함수의 효율적인 하드웨어 구현)

  • Park, Jin-Chul;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.263-266
    • /
    • 2012
  • This paper describes an efficient hardware implementation of Whirlpool hash function as ISO/IEC 10118-3 standard. Optimized timing is achieved by using pipelined small LUTs, and Whirlpool block cipher and key schedule have been implemented in parallel for improving throughput. In key schedule, key addition is area-optimized by using inverters and muxes instead of using rom and xor gates. This hardware has been implemented on Virtex5-XC5VSX50T FPGA device. Its maximum operating frequency is about 151MHz, and throughput is about 950Mbps.

  • PDF

Study of the Superconductive Pipelined Multi-Bit ALU (초전도 Pipelined Multi-Bit ALU에 대한 연구)

  • Kim, Jin-Young;Ko, Ji-Hoon;Kang, Joon-Hee
    • Progress in Superconductivity
    • /
    • v.7 no.2
    • /
    • pp.109-113
    • /
    • 2006
  • The Arithmetic Logic Unit (ALU) is a core element of a computer processor that performs arithmetic and logic operations on the operands in computer instruction words. We have developed and tested an RSFQ multi-bit ALU constructed with half adder unit cells. To reduce the complexity of the ALU, We used half adder unit cells. The unit cells were constructed of one half adder and three de switches. The timing problem in the complex circuits has been a very important issue. We have calculated the delay time of all components in the circuit by using Josephson circuit simulation tools of XIC, $WRspice^{TM}$, and Julia. To make the circuit work faster, we used a forward clocking scheme. This required a careful design of timing between clock and data pulses in ALU. The designed ALU had limited operation functions of OR, AND, XOR, and ADD. It had a pipeline structure. The fabricated 1-bit, 2-bit, and 4-bit ALU circuits were tested at a few kilo-hertz clock frequency as well as a few tens giga-hertz clock frequency, respectively. For high-speed tests, we used an eye-diagram technique. Our 4-bit ALU operated correctly at up to 5 GHz clock frequency.

  • PDF

Reduced-Pipelined Duty Cycle MAC Protocol (RP-MAC) for Wireless Sensor Network

  • Nguyen, Ngoc Minh;Kim, Myung Kyun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.5
    • /
    • pp.2433-2452
    • /
    • 2017
  • Recently, the pipeline-forwarding has been proposed as a new technique to resolve the end-to-end latency problem of the duty-cycle MAC protocols in Wireless Sensor Networks (WSNs). Some protocols based on this technique such as PMAC and PRI-MAC have shown an improvement not only in terms of reducing end-to-end latency but also in terms of reducing power consumption. In these protocols, however, the sensor nodes still waste a significant amount of energy for unnecessary idle listening during contention period of upstream nodes to check the channel activity. This paper proposes a new pipeline-forwarding duty-cycle MAC protocol, named RP-MAC (Reduced Pipelined duty-cycle MAC), which tries to reduce the waste of energy. By taking advantage of ACK mechanism and shortening the handshaking procedure, RP-MAC minimizes the time for checking the channel and therefore reduces the energy consumption due to unnecessary idle listening. When comparing RP-MAC with the existing solution PRI-MAC and RMAC, our QualNet-based simulation results show a significant improvement in term of energy consumption.

Edge-Preserving Algorithm for Block Artifact Reduction and Its Pipelined Architecture

  • Vinh, Truong Quang;Kim, Young-Chul
    • ETRI Journal
    • /
    • v.32 no.3
    • /
    • pp.380-389
    • /
    • 2010
  • This paper presents a new edge-protection algorithm and its very large scale integration (VLSI) architecture for block artifact reduction. Unlike previous approaches using block classification, our algorithm utilizes pixel classification to categorize each pixel into one of two classes, namely smooth region and edge region, which are described by the edge-protection maps. Based on these maps, a two-step adaptive filter which includes offset filtering and edge-preserving filtering is used to remove block artifacts. A pipelined VLSI architecture of the proposed deblocking algorithm for HD video processing is also presented in this paper. A memory-reduced architecture for a block buffer is used to optimize memory usage. The architecture of the proposed deblocking filter is verified on FPGA Cyclone II and implemented using the ANAM 0.25 ${\mu}m$ CMOS cell library. Our experimental results show that our proposed algorithm effectively reduces block artifacts while preserving the details. The PSNR performance of our algorithm using pixel classification is better than that of previous algorithms using block classification.

A High Speed IP Address Lookup using Pipelined CAM Architecture(PICAM) (파이프라인 CAM 구조를 이용한 고속 IP주소룩업)

  • Ahn, Hee-Il;Cho, Tae-Won
    • Journal of IKEEE
    • /
    • v.5 no.1 s.8
    • /
    • pp.24-34
    • /
    • 2001
  • IP address lookup is a major bottleneck of IP packet processing in high speed router. Existing IP lookup methods are focused only on lookup throughput without considering lookup table update. So their slow update can lead to lookup blocking or wrong routing decision based on obsolete routes. Especially existing IP lookup methods based on CAM(content addressable memory) have slow update of O(n) cycles in spite of their high throughput and low area complexity In this paper we proposes a new IP address lookup method based on pipelined CAM architecture(PICAM) with fast update of O(1) cycle of lookup table and high throughput and low area complexity.

  • PDF

High Performance Routing Engine for an Advanced Input-Queued Switch Fabric (고속 입력 큐 스위치를 위한 고성능 라우팅엔진)

  • Jeong, Gab-Joong;Lee, Bhum-Cheol
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2002.05a
    • /
    • pp.264-267
    • /
    • 2002
  • This paper presents the design of a pipelined virtual output queue routing engine for an advanced input-queued ATM switch, which has a serial cross bar structure. The proposed routing engine has been designed for wire-speed routing with a pipelined buffer management. It provides the tolerance of requests and grants data transmission latency between the routing engine and central arbiter using a new request control method that is based on a high-speed shifter. The designed routing engine has been implemented in a field programmable gate array (FPGA) chip with a 77MHz operating frequency, 16$\times$16 switch size, and 2.5Gbps/port speed.

  • PDF