• Title/Summary/Keyword: pipelining

High Throughput Implementation of RLS Algorithm Using Fewer Processing Elements

  • Niki, Takeo;Yamada, Rikita;Nishikawa, Kiyoshi;Kiya, Hitoshi
    • Proceedings of the IEEK Conference / 2000.07a / pp.406-409 / 2000
  • This paper proposes a method for implementing the recursive least squares (RLS) algorithm at a high throughput rate using fewer processing elements (PEs). Pipelined processing is known to provide a high throughput rate, but pipelining is effective only when enough PEs are available. The proposed method achieves a high throughput rate with only a few PEs. Its effectiveness is verified through simulations on programmable digital signal processors (DSP processors).

A High-Speed Thinning Processor for Character Recognition System (문자인식 시스템을 위한 고속 세선화 장치)

  • 김용섭;김민석;주양성;김수원
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.2 / pp.153-158 / 1992
  • In this paper, we propose a new thinning algorithm and demonstrate its effectiveness with concrete experimental results. The new thinning process solves the problems of disconnectivity and end point reduction found in the one-pass algorithm. Furthermore, the algorithm proves particularly effective in high-speed operation. A processor for this algorithm, which can handle an input image width (between 25 and 4t bits) and operates in a pipelined manner, is implemented and tested. The flexibility and high-speed operation of this thinning processor should make it applicable in various areas.

The Optimal pipelining architecture for PICAM (PICAM에서의 최적 파이프라인 구조)

  • 안희일;조태원
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.6A / pp.1107-1116 / 2001
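  • High-speed IP address lookup is a major factor determining the performance of high-speed Internet routers. Longest prefix matching (LPM) search is the most time-consuming part of IP address lookup. PICAM is a pipelined CAM architecture for high-speed LPM search that updates the lookup table faster than conventional methods based on content addressable memory (CAM) while also achieving a higher LPM search rate. PICAM consists of a three-stage pipeline. Pipeline performance depends on the number of key-field partitions in stages 1 and 2 and on the distribution of matching points, so the LPM search rate can vary. This paper presents a pipeline performance model for PICAM and derives the optimal PICAM architecture through discrete event simulation. For IP version 4, the optimal architecture uses 8 key-field partitions and duplicates the heavily loaded key-field blocks; for IP version 6, the optimal architecture uses 16 key-field blocks.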
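
For readers unfamiliar with the LPM operation this abstract refers to, the following is a minimal software sketch of longest prefix matching. It is a plain linear scan over a routing table, not the PICAM pipelined CAM architecture; the function name `longest_prefix_match`, the `ROUTES` table, and the next-hop labels are hypothetical and chosen only for illustration.

```python
import ipaddress

def longest_prefix_match(table, address):
    """Return the next hop of the longest prefix in `table` containing `address`.

    `table` maps CIDR prefix strings to next-hop labels (illustrative only).
    Returns None when no prefix matches.
    """
    addr = ipaddress.ip_address(address)
    best = None  # (prefix length, next hop) of the best match so far
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Only compare prefixes of the same IP version as the address.
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, next_hop)
    return None if best is None else best[1]

# Hypothetical routing table for the example.
ROUTES = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "B",
}

print(longest_prefix_match(ROUTES, "10.1.2.3"))
print(longest_prefix_match(ROUTES, "10.2.0.1"))
```

Both 10.0.0.0/8 and 10.1.0.0/16 contain 10.1.2.3, and the lookup prefers the /16 entry because it is the longer prefix. A hardware CAM performs the comparisons in parallel rather than by scanning.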

An Improved Implementation of Block Matching Algorithm on a VLIW-based DSP (VLIW 기반 DSP에서의 개선된 블록매칭 알고리즘 구현)

  • You, Hui-Jae;Chung, Sun-Tae;Jung, Sou-Hwan
    • Proceedings of the IEEK Conference / 2007.07a / pp.225-226 / 2007
  • In this paper, we present a study on optimizing the block matching algorithm on a VLIW-based DSP. The block matching algorithm is well known for its computational burden in motion picture encoding. In contrast to previous research, in which optimization is achieved by optimizing SAD, the heaviest routine in block matching, we optimize the block matching algorithm by applying a software pipelining technique to the whole routine. Experiments verify the efficiency of the proposed optimization.
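
As background for the abstract above, here is a minimal sketch of baseline full-search block matching with the sum of absolute differences (SAD) as the cost. It shows the unoptimized algorithm only, not the authors' software-pipelined DSP implementation; the function names `sad` and `full_search`, the frame layout (lists of rows), and the test frames are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[by + dy][bx + dx] - ref[ry + dy][rx + dx])
    return total

def full_search(cur, ref, bx, by, n, search_range):
    """Try every displacement within +/- search_range and return
    (motion vector, cost) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = bx + mx, by + my
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost

# Hypothetical 8x8 frames: the 2x2 block of `cur` at (2, 2) is a copy
# of the block of `ref` at (4, 3), i.e. displaced by (+2, +1).
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for dy in range(2):
    for dx in range(2):
        cur[2 + dy][2 + dx] = ref[3 + dy][4 + dx]

print(full_search(cur, ref, 2, 2, 2, 2))
```

The inner SAD loop dominates the running time, which is why earlier work targeted it; the abstract's approach instead pipelines the whole search routine.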

Implementation of Efficient Channel Decoder for WiBro System (WiBro 시스템을 위한 효율적인 구조의 채널 복호화기 구현)

  • Kim, Jang-Hun;Han, Chul-Hee
    • Proceedings of the IEEK Conference / 2007.07a / pp.177-178 / 2007
  • The WiBro system provides reliable broadband communication services for mobile and portable subscribers. It allows interference-free reception under conditions of multipath propagation and transmission errors, so a powerful channel error correction capability is required. The CC/CTC decoder, which is mandatory for the WiBro system, requires a large amount of computation for real-time operation. It is therefore desirable to design a CC/CTC decoder with a highly optimized hardware scheme for low-latency operation at high data rates. This paper proposes an efficient CC/CTC decoder structure for a high-data-rate WiBro system. In particular, the proposed CTC decoder architecture reduces decoding delay by applying pipelining and multiple decoding blocks. Simulation results show that the proposed CC/CTC decoder reduces processing time by about 80%, despite an increase in area.

A Fuzzy Microprocessor for Real-time Control Applications

  • Katashiro, Takeshi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1993.06a / pp.1394-1397 / 1993
  • A Fuzzy Microprocessor (FMP) suitable for real-time control applications is presented. Its features include high-speed inference of up to 114K FLIPS at a 20 MHz system clock, support for up to 128 rules, and handling of 8 input variables with 8-bit resolution. To realize these features, the fuzzifier circuit and the processing element (PE) are optimized for LSI implementation. The chip, fabricated in 1.2 $\mu\textrm{m}$ CMOS technology, contains 71K transistors in an 82.8 $\textrm{mm}^2$ die and is packaged in a 100-pin plastic QFP.

Delayed Scheduling under Resource Constrains (자원제약하에서의 지연 스케쥴링)

  • Shin, In-Soo;Lee, Keun-Man
    • The Transactions of the Korea Information Processing Society / v.4 no.10 / pp.2571-2580 / 1997
  • In this paper, we address resource-constrained scheduling for executing a behavioral algorithm under resource limits. In particular, we propose a scheduling algorithm, called delayed scheduling, which finds the lower-bound control step at which an operation can be assigned under the resource limits. We take into account practical scheduling problems, including multicycle operations and functional pipelining. Integer linear programming formulations are applied to the scheduling problems to obtain optimal scheduling results. An experiment on the DFG model of a fifth-order digital wave filter demonstrates the algorithm's effectiveness.

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.6 no.2 / pp.100-107 / 2011
  • As the use of mobile multimedia devices has increased in recent years, so has the need for high-performance multimedia processors. We therefore propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates with a 3-stage pipeline. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in both performance and energy efficiency. In addition, the proposed architecture was developed and verified with Verilog HDL and an FPGA prototype system.

Improving Software Pipelining Performance Using a Register Renaming Technique (소프트웨어 파이프라이닝에서 레지스터 변경을 통한 성능 개선)

  • Cho, Doosan
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.1642-1643 / 2010
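  • Because applications in the multimedia domain contain abundant inherent parallelism, VLIW (Very Long Instruction Word) signal processors are widely used. The utilization of the multiple processing units (PUs) in a VLIW processor is determined by the performance of the compiler's instruction scheduler, which analyzes the parallelism among instructions and optimizes the program code so that instructions that can run concurrently execute on different PUs. However, conventional instruction schedulers build a complex data dependence graph (DDG) and consequently fail to make full use of the multiple PUs. This is because the scheduler does not separately consider how long each register is in use, so it builds a dependence graph more complex than the true data dependences and cannot produce a properly optimized schedule. This study shows the potential for improving system performance by using other registers to split register lifetimes appropriately, thereby reducing the complexity of the data dependences.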

Monitoring of Parallel Transfer Performance for MPTCP-based Globus Service (MPTCP기반 Globus 서비스 적용을 위한 병렬 전송성능 모니터링)

  • Hong, Wontaek
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.354-356 / 2021
  • For science applications that require rapid transfer and sharing of large volumes of data, many efforts have been made to improve data transfer performance through concurrency, parallelism, and pipelining in data transfer applications such as Globus/GridFTP. As a similar attempt, this paper reports experiments in an MPTCP emulation environment that examine the expected gain in transfer throughput from increasing the number of network interfaces and the degree of parallelism, and presents the results.