• Title/Summary/Keyword: pipelined structure

Search Result 80, Processing Time 0.023 seconds

Efficient Parallel IP Address Lookup Architecture with Smart Distributor (스마트 분배기를 이용한 효율적인 병렬 IP 주소 검색 구조)

  • Kim, Junghwan;Kim, Jinsoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.2
    • /
    • pp.44-51
    • /
    • 2013
  • Routers should perform fast IP address lookup for Internet to provide high-speed service. In this paper, we present a hybrid parallel IP address lookup structure composed of four-stage pipeline. It achieves parallelism at low cost by using multiple SRAMs in stage 2 and partitioned TCAMs in stage 3, and improves the performance through pipelining. The smart distributor in stage 1 does not transfer any IP address identical to previous one toward the next stage, but only uses the result of the previous lookup. So it improves throughput of lookup by caching effects, and decreases the access conflict to TCAM bank in stage 3 as well. In the last stage, the reorder buffer rearranges the completed IP addresses according to the input order. We evaluate the performance of our parallel pipelined IP lookup structure comparing with previous hybrid structure, using the real routing table and traffic distributions generated by Zipf's law.

An Implementation of Real-time Image Warping Using FPGA (FPGA를 이용한 실시간 영상 워핑 구현)

  • Ryoo, Jung Rae;Lee, Eun Sang;Doh, Tae-Yong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.6
    • /
    • pp.335-344
    • /
    • 2014
  • As a kind of 2D spatial coordinate transform, image warping is a basic image processing technique utilized in various applications. Though image warping algorithm is composed of relatively simple operations such as memory accesses and computations of weighted average, real-time implementations on embedded vision systems suffer from limited computational power because the simple operations are iterated as many times as the number of pixels. This paper presents a real-time implementation of a look-up table(LUT)-based image warping using an FPGA. In order to ensure sufficient data transfer rate from memories storing mapping LUT and image data, appropriate memory devices are selected by analyzing memory access patterns in an LUT-based image warping using backward mapping. In addition, hardware structure of a parallel and pipelined architecture is proposed for fast computation of bilinear interpolation using fixed-point operations. Accuracy of the implemented hardware is verified using a synthesized test image, and an application to real-time lens distortion correction is exemplified.

Development of Superconductive Arithmetic and Logic Devices (초전도 논리연산자의 개발)

  • Kang J. H
    • Progress in Superconductivity
    • /
    • v.6 no.1
    • /
    • pp.7-12
    • /
    • 2004
  • Due to the very fast switching speed of Josephson junctions, superconductive digital circuit has been a very good candidate fur future electronic devices. High-speed and Low-power microprocessor can be developed with Josephson junctions. As a part of an effort to develop superconductive microprocessor, we have designed an RSFQ 4-bit ALU (Arithmetic Logic Unit) in a pipelined structure. To make the circuit work faster, we used a forward clocking scheme. This required a careful design of timing between clock and data pulses in ALU. The RSFQ 1-bit block of ALU used in this work consisted of three DC current driven SFQ switches and a half-adder. We successfully tested the half adder cell at clock frequency up to 20 GHz. The switches were commutating output ports of the half adder to produce AND, OR, XOR, or ADD functions. For a high-speed test, we attached switches at the input ports to control the high-speed input data by low-frequency pattern generators. The output in this measurement was an eye-diagram. Using this setup, 1-bit block of ALU was successfully tested up to 40 GHz. An RSFQ 4-bit ALU was fabricated and tested. The circuit worked at 5 GHz. The circuit size of the 4-bit ALU was 3 mm ${\times}$ 1.5 mm, fitting in a 5 mm ${\times}$ 5 mm chip.

  • PDF

Fixed-complexity Sphere Encoder for Multi-user MIMO Systems (다중 사용자 MIMO 시스템을 위한 고정 복잡도를 갖는 스피어 인코더)

  • Mohaisen, Manar;Han, Dong-Keol;Chang, Kyung-Hi
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.7A
    • /
    • pp.632-638
    • /
    • 2010
  • In this paper, we propose a fixed-complexity sphere encoder (FSE) for multi-user MIMO (MU-MIMO) systems. The Proposed FSE accomplishes a scalable tradeoff between performance and complexity. Also, because it has a parallel tree-search structure, the proposed encoder can be easily pipelined, leading to a tremendous reduction in the precoding latency. The complexity of the proposed encoder is also analyzed, and we propose two techniques that reduce it. Simulation and analytical results demonstrate that in a $4\times4$ MU-MIMO system, the complexity of the proposed FSE is 16% that of the conventional QRD-M encoder (QRDM-E). Also, the encoding throughput of the proposed endoder is 7.5 times that of the QRDM-E with tolerable degradation in the BER performance, while achieving the optimum diversity order.

Timing analysis of RSFQ ALU circuit for the development of superconductive microprocessor (초전도 마이크로 프로세서개발을 위한 RSFQ ALU 회로의 타이밍 분석)

  • Kim J. Y;Baek S. H.;Kim S. H.;Kang J. H.
    • Progress in Superconductivity and Cryogenics
    • /
    • v.7 no.1
    • /
    • pp.9-12
    • /
    • 2005
  • We have constructed an RSFQ 4-bit Arithmetic Logic Unit (ALU) in a pipelined structure. An ALU is a core element of a computer processor that performs arithmetic and logic operation on the operands in computer instruction words. We have simulated the circuit by using Josephson circuit simulation tools. We used simulation tools of XIC, $WRspice^{TM}$, and Julia. To make the circuit work faster, we used a forward clocking scheme. This required a careful design of timing between clock and data pulses in ALU. The RSFQ 1-bit block of ALU used in constructing the 4-bit ALU was consisted of three DC current driven SFQ switches and a half-adder. By commutating output ports of the half adder, we could produce AND, OR, XOR, or ADD functions. The circuit size of the 4-bit ALU when fabricated was 3 mm x 1.5 mm, fitting in a 5 mm x 5mm chip. The fabricated 4-bit ALU operated correctly at 5 GHz clock frequency. The chip was tested at the liquid-helium temperature.

High-throughput and low-area implementation of orthogonal matching pursuit algorithm for compressive sensing reconstruction

  • Nguyen, Vu Quan;Son, Woo Hyun;Parfieniuk, Marek;Trung, Luong Tran Nhat;Park, Sang Yoon
    • ETRI Journal
    • /
    • v.42 no.3
    • /
    • pp.376-387
    • /
    • 2020
  • Massive computation of the reconstruction algorithm for compressive sensing (CS) has been a major concern for its real-time application. In this paper, we propose a novel high-speed architecture for the orthogonal matching pursuit (OMP) algorithm, which is the most frequently used to reconstruct compressively sensed signals. The proposed design offers a very high throughput and includes an innovative pipeline architecture and scheduling algorithm. Least-squares problem solving, which requires a huge amount of computations in the OMP, is implemented by using systolic arrays with four new processing elements. In addition, a distributed-arithmetic-based circuit for matrix multiplication is proposed to counterbalance the area overhead caused by the multi-stage pipelining. The results of logic synthesis show that the proposed design reconstructs signals nearly 19 times faster while occupying an only 1.06 times larger area than the existing designs for N = 256, M = 64, and m = 16, where N is the number of the original samples, M is the length of the measurement vector, and m is the sparsity level of the signal.

The Design and Implementation of AES Rijndael Cipher Algorithm (AES Rijndael 암호.복호 알고리듬의 설계 및 구현)

  • 신성호;이재흥
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2003.10a
    • /
    • pp.196-198
    • /
    • 2003
  • In this paper, Rijndal cipher algorithm is implemented by a hardware. It is selected as the AES(Advanced Encryption Standard) by NIST. The processor has structure that round operation divided into 2 subrounds and subrounds are pipelined to calculate efficiently. It takes 5 clocks for one-round. The AES-128 cipher algorithm is implemented for hardware by ALTERA FPGA, and then, analyzed the performance. The AES-128 cipher algorithm has approximately 424 Mbps encryption rate for 166Mhz max clerk frequency. In case of decryption, it has 363 Mbps decryption rate for 142Mhz max clock frequency.

  • PDF

A Parallel Video Encoding Technique for U-HDTV (U-HDTV를 위한 향상된 병렬 비디오 부호화 기법)

  • Jung, Seung-Won;Ko, Sung-Jea
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.1
    • /
    • pp.132-140
    • /
    • 2011
  • Ultra-High Definition Television (U-HDTV) is a promising candidate for the next generation television. Since the U-HDTV video signal requires a huge amount of data, parallel implementation of the U-HDTV compression system is highly demanding. In the conventional parallel video codec, a video is divided into sub-sequences and the sub-sequences are independently encoded. In this paper, for efficient parallel processing, we propose a pipelined encoding structure which exploits cross-correlation among the sub-sequences. The experimental results demonstrate that the proposed technique improves the coding efficiency and provides the sub-sequences of the balanced visual quality.

Hardware Design of VLIW coprocessor for Computer Vision Application (컴퓨터 비전 응용을 위한 VLIW 보조프로세서의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.9
    • /
    • pp.2189-2196
    • /
    • 2014
  • In this paper, a VLIW(Very Long Instruction Word) vision coprocessor which can efficiently accelerate computer vision algorithm for automotive is designed. The VLIW coprocessor executes four instructions per clock cycle via 8-stage pipelined structure and has 36 integer and floating-point instructions to accelerate computer vision algorithm for pedestrian detection. The processor has about 300-MHz operating frequency and about 210,900 gates under 45nm CMOS technology and its estimated performance is 1.2 GOPS(Giga Operations Per Second). The vision system composed of vision primitive engine and eight VLIW coprocessors can execute pedestrian detection at 25~29 frames per second(FPS). Because the VLIW coprocessor has high detection rate and loosely coupled interface with host processor, it can be efficiently applicable to a wide range of vision applications.

Hardware Design with Efficient Pipelining for High-throughput AES (높은 처리량을 가지는 AES를 위한 효율적인 파이프라인을 적용한 하드웨어 설계)

  • Antwi, Alexander O.A;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.578-580
    • /
    • 2017
  • IoT technology poses a lot of security threats. Various algorithms are thus employed in ensuring security of transactions between IoT devices. Advanced Encryption Standard (AES) has gained huge popularity among many other symmetric key algorithms due to its robustness till date. This paper presents a hardware based implementation of the AES algorithm. We present a four-stage pipelined architecture of the encryption and key generation. This method allowed a total plain text size of 512 bits to be encrypted in 46 cycles. The proposed hardware design achieved a maximum frequency of 1.18GHz yielding a throughput of 13Gbps and 800MHz yielding a throughput of 8.9Gbps on the 65nm and 180nm processes respectively.

  • PDF