• Title/Summary/Keyword: pipelined memory

Search Result 79, Processing Time 0.034 seconds

A Parallel Processing System for Visual Media Applications (시각매체를 위한 병렬처리 시스템)

  • Lee, Hyung;Pakr, Jong-Won
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.1A
    • /
    • pp.80-88
    • /
    • 2002
  • Visual media(image, graphic, and video) processing poses challenge from several perpectives, specifically from the point of view of real-time implementation and scalability. There have been several approaches to obtain speedups to meet the computing demands in multimedia processing ranging from media processors to special purpose implementations. A variety of parallel processing strategies are adopted in these implementations in order to achieve the required speedups. We have investigated a parallel processing system for improving the processing speed o f visual media related applications. The parallel processing system we proposed is similar to a pipelined memory stystem(MAMS). The multi-access memory system is made up of m memory modules and a memory controller to perform parallel memory access with a variety of combinations of 1${\times}$pq, pq${\times}$1, and p${\times}$q subarray, which improves both cost and complexity of control. Facial recognition, Phong shading, and automatic segmentation of moving object in image sequences are some that have been applied to the parallel processing system and resulted in faithful processing speed. This paper describes the parallel processing systems for the speedup and its utilization to three time-consuming applications.

Design of a Scalable Systolic Synchronous Memory

  • Jeong, Gab-Joong;Kwon, Kyoung-Hwan;Lee, Moon-Key
    • Journal of Electrical Engineering and information Science
    • /
    • v.2 no.4
    • /
    • pp.8-13
    • /
    • 1997
  • This paper describes a scalable systolic synchronous memory for digital signal processing and packet switching. The systolic synchronous memory consists of the 2-D array of small memory blocks which are fully pipelined and communicated in three directions with adjacent blocks. The maximum delay of a small memory block becomes the operation speed of the chip. The array configuration is scalable for the entire memory size requested by an application. it has the initial latency of N+3 cycles with NxN array configuration. We designed an experimental 200 MHz 4Kb static RAM chip with the 4x4 array configuration of 256 SRAM blocks. It was fabricated is 0.8$\mu\textrm{m}$ twin-well single-poly double-metal CMOS technology.

  • PDF

A New Pipelined Binary Search Architecture for IP Address Lookup (IP 어드레스 검색을 위한 새로운 pipelined binary 검색 구조)

  • Lim Hye-Sook;Lee Bo-Mi;Jung Yeo-Jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1B
    • /
    • pp.18-28
    • /
    • 2004
  • Efficient hardware implementation of address lookup is one of the most important design issues of internet routers. Address lookup significantly impacts router performance since routers need to process tens-to-hundred millions of packets per second in real time. In this paper, we propose a practical IP address lookup structure based on the binary tree of prefixes of different lengths. The proposed structure produces multiple balanced trees, and hence it solve the issues due to the unbalanced binary prefix tree of the existing scheme. The proposed structure is implemented using pipelined binary search combined with a small size TCAM. Performance evaluation results show that the proposed architecture requires a 2000-entry TCAM and total 245 kbyte SRAMs to store about 30,000 prefix samples from MAE-WEST router, and an address lookup is achieved by a single memory access. The proposed scheme scales very well with both of large databases and longer addresses as in IPv6.

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System (MPI 브로드캐스트 통신을 위한 서킷 스위칭 기반의 파이프라인 체인 알고리즘 설계)

  • Yun, Heejun;Chung, Wonyoung;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37B no.9
    • /
    • pp.795-805
    • /
    • 2012
  • This paper proposes an algorithm and a hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional system, The pipelined broadcast algorithm is an algorithm which takes advantage of maximum bandwidth of communication bus. But unnecessary synchronization process are repeated, because the pipelined broadcast sends the data divided into many parts. In this paper, the MPI unit for pipeline chain algorithm based on circuit switching removing the redundancy of synchronization process was designed, the proposed architecture was evaluated by modeling it with systemC. Consequently, the performance of the proposed architecture was highly improved for broadcast communication up to 3.3 times that of systems using conventional pipelined broadcast algorithm, it can almost take advantage of the maximum bandwidth of transmission bus. Then, it was implemented with VerilogHDL, synthesized with TSMC 0.18um library and implemented into a chip. The area of synthesis results occupied 4,700 gates(2 input NAND gate) and utilization of total area is 2.4%. The proposed architecture achieves improvement in total performance of MPSoC occupying relatively small area.

Edge-Preserving Algorithm for Block Artifact Reduction and Its Pipelined Architecture

  • Vinh, Truong Quang;Kim, Young-Chul
    • ETRI Journal
    • /
    • v.32 no.3
    • /
    • pp.380-389
    • /
    • 2010
  • This paper presents a new edge-protection algorithm and its very large scale integration (VLSI) architecture for block artifact reduction. Unlike previous approaches using block classification, our algorithm utilizes pixel classification to categorize each pixel into one of two classes, namely smooth region and edge region, which are described by the edge-protection maps. Based on these maps, a two-step adaptive filter which includes offset filtering and edge-preserving filtering is used to remove block artifacts. A pipelined VLSI architecture of the proposed deblocking algorithm for HD video processing is also presented in this paper. A memory-reduced architecture for a block buffer is used to optimize memory usage. The architecture of the proposed deblocking filter is verified on FPGA Cyclone II and implemented using the ANAM 0.25 ${\mu}m$ CMOS cell library. Our experimental results show that our proposed algorithm effectively reduces block artifacts while preserving the details. The PSNR performance of our algorithm using pixel classification is better than that of previous algorithms using block classification.

FPGA Design and Implementation of A Pipelined Out-of-Order Superscalar Processor (파이프라인식 비순차실행 수퍼스칼라 프로세서의 FPGA 설계 및 구현)

  • Jongbok Lee
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.153-158
    • /
    • 2023
  • Domestically, the importance of system semiconductor design is increasing, and the balanced development with the high-end memory semiconductors should be promoted. Using Xilinx Vivado as a development enivronment tool, it reduces time and cost dramatically in implementing the processor on FPGA. In this paper, the VHDL language which provides record data structure for an efficient digital system design is used for designing a pipelined out-of-order superscalar processor. It has been simulated extensively, synthesized and implemented on FPGA and verified by Integrated Logic Analyzer. As a result, the pipelined out-of-order superscalar processor could be executed successfully.

Design of a Pipelined Deblocking Filter with efficient memory management for high performance H.264 decoders (효율적인 메모리 관리 구조를 갖는 H.264용 고성능 디블록킹 필터 설계)

  • Yu, Yong-Hoon;Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.1
    • /
    • pp.64-70
    • /
    • 2008
  • The H.264 standard is widely used due to the high compression rate and quality. The deblocking filter of the H.264 standard improves the quality of images by eliminating blocking artifacts of pictures, and it requires a lot of computation. We propose a new hardware architecture for the deblocking filter with pipelined architecture, 1-D filters which support both horizontal and vertical filtering and efficient memory management. Four memory blocks are configured for the efficient storage and access of the current macroblock and adjacent referenced sub-macroblocks, and the pixel data from the motion compensation unit can be transferred without waiting during the computation cycles of the deblocking filter. The number of computation cycles and the hardware area are reduced using the proposed architecture, and the performance of the H.264 decoder is improved. We design the deblocking filter using Verilog-HDL and implement using an FPGA. The designed deblocking filter can be used for decoding HD quality images at 77 MHz.

An area-efficient 256-point FFT design for WiMAX systems

  • Yu, Jian;Cho, Kyung-Ju
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.3
    • /
    • pp.270-276
    • /
    • 2018
  • This paper presents a low area 256-point pipelined FFT architecture, especially for IEEE 802.16a WiMAX systems. Radix-24 algorithm and single-path delay feedback (SDF) architecture are adopted in the design to reduce the complexity of twiddle factor multiplication. A new cascade canonical signed digit (CSD) complex multipliers are proposed for twiddle factor multiplication, which has lower area and less power consumption than conventional complex multipliers composed of 4 multipliers and 2 adders. Also, the proposed cascade CSD multipliers can remove look-up table for storing coefficient of twiddle factors. In hardware implementation with Cyclone 10LP FPGA, it is shown that the proposed FFT design method achieves about 62% reduction in gate count and 64% memory reduction compared with the previous schemes.

Adaptive Frequency Scaling for Efficient Power Management in Pipelined Deep Packet Inspection Systems (파이프라인형 DPI 시스템에서 효율적인 소비전력 감소를 위한 동작주파수 설계방법)

  • Kim, Han-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.12
    • /
    • pp.133-141
    • /
    • 2014
  • An efficient method for reducing power consumption in pipelined deep packet inspection systems is proposed. It is based on the observation that the number of memory accesses is dominant for the power consumption and the number of accesses drops drastically as the input goes through stages of the pipelined AC-DFA. A DPI system is implemented where the operating frequency of the stages that are not frequently used in the pipeline is reduced to eliminate the waste of power consumption. The power consumption of the proposed DPI system is measured upon various input character set and up to 25% of reduction of total power consumption is obtained, compared to those of the recent DPI systems. The method can be easily applied to other pipelined architecture and string searching applications.

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.