• Title/Summary/Keyword: pipelined architecture

Search Result 176, Processing Time 0.02 seconds

Computer Application to ECG Signal Processing

  • Okajima, Mitsuharu
    • Journal of Biomedical Engineering Research
    • /
    • v.6 no.2
    • /
    • pp.13-14
    • /
    • 1985
  • We have developed a microprogramir!able signal processor for real-time ultrasonic signal processing. Processing speed was increased by the parallelism in horizontal microprogram using 104bits microcode and the Pipelined architecture. Control unit of the signal processor was designed by microprogrammed architec- ture and writable control store (WCS) which was interfaced with host computer, APPLE- ll . This enables the processor to develop and simulate various digital signal processing algorithms. The performance of the processor was evaluated by the Fast Fourier Transform (FFT) program. The execution time to perform 16 bit 1024 points complex FF7, radix-2 DIT algorithm, was about 175 msec with IMHz master Clock. We can use this processor to Bevelop more efficient signal processing algorithms on the biological signal processing.

  • PDF

Implementation of DCT using Bit Slice Signal Processor (BIT SLICE SIGNAL PROCESSOR를 이용한 DCT의 구현)

  • Kim, Dong-L.;Go, Seok-B.;Paek, Seung-K.;Lee, Tae-S.;Min, Byong-G.
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.1449-1453
    • /
    • 1987
  • A microprogrammable Bit Slice Sinal Processor for image processing is implemented. Processing speed is increased by the parallelism in horizontal microprogram using 120bits microcode, pipelined architecture, 2 bank memory switching that interfaces with the Host through DMA, a variable clock control, overflow checking H/W,look-up table method and cache memory. With this processor, a DCT algorithm which uses 2-D FFT is performed. The execution time for $512{\times}512{\times}8$ image is 12 sec when 16 bit operation is runned, and the recovered image has acceptable quality with MSE 0.276%.

  • PDF

Approximate-SAD Circuit for Power-efficient H.264 Video Encoding under Maintaining Output Quality and Compression Efficiency

  • Le, Dinh Trang Dang;Nguyen, Thi My Kieu;Chang, Ik Joon;Kim, Jinsang
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.5
    • /
    • pp.605-614
    • /
    • 2016
  • We develop a novel SAD circuit for power-efficient H.264 encoding, namely a-SAD. Here, some highest-order MSB's are approximated to single MSB. Our theoretical estimations show that our proposed design simultaneously improves performance and power of SAD circuit, achieving good power efficiency. We decide that the optimal number of approximated MSB's is four under 8-bit YUV-420 format, the largest number not to affect video quality and compression-rate in our video experiments. In logic simulations, our a-SAD circuit shows at least 9.3% smaller critical-path delay compared to existing SAD circuits. We compare power dissipation under iso-throughput scenario, where our a-SAD circuit obtains at least 11.6% power saving compared to other designs. We perform same simulations under two- and three-stage pipelined architecture. Here, our a-SAD circuit delivers significant performance (by 13%) and power (by 17% and 15.8% for two and three stages respectively) improvements.

An Efficient Scheduling for Input Queued Switch (입력큐 교환기를 위한 스케줄링기법)

  • Lee, Sang-Ho;Shin, Dong-Ryeol
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.38 no.12
    • /
    • pp.58-66
    • /
    • 2001
  • Input queueing is useful for high bandwidth switches and routers because of lower complexity and fewer circuits than output queueing. The input queueing switch, however, suffers HOL-Blocking, which limits the throughput to 58%. To get around this low throughput, many input queueing switches have centralized scheduler, which centralized scheduler restrict the design of the switch architecture. To overcome this problem, we propose a simple scheduler called PRR(Pipelined Round Robin), which is intrinsically distributed and presents to show the effectiveness of the proposed scheduling.

  • PDF

A VLSI implementation of image processor for facsimile and digital copier (팩시밀리 및 디지털 복사기를 위한 고속 영상 처리기의 VLSI구현)

  • 박창대;정영훈;김형수;김진수;권오준;홍기상;장동구;박기용;김윤수
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.1
    • /
    • pp.105-113
    • /
    • 1998
  • A new image processor is implemented for high-speed digital copiers and facsimiles. The imgage processor performs CCD and CIS interface, pre-processing, enlargement andreduction of gray level image, and various halftoning algorithms. Implemented halftoning algorithms are simple thresholding, fuzzy based mixed mode thresholding, dithering, and edge enhanced error diffusion. The result of binarization is transferred to a printer with serial or paralel output ports. Line by line pipelined data prodessing architecture is employed with time sharing access of the external memory. In receiving mode, it converts the resolution of received binary image for compatibility with conventional facsimile. In copy mode, a line of A3 paper with 400 dpi is processed with in 2.5 ms. The prototype of image processor was implemented usig Laser Programmable Gate Array (LPGA) with 0.8.mu.m technology.

  • PDF

An Adaptive Fast Motion Estimation Based on Directional Correlation and Predictive Values in H.264 (움직임 방향 연관 및 예측치 적용 기반 적응적 고속 H.264 움직임 추정 알고리즘의 설계)

  • Kim, Cheong-Ghil
    • Journal of The Institute of Information and Telecommunication Facilities Engineering
    • /
    • v.10 no.2
    • /
    • pp.53-61
    • /
    • 2011
  • This research presents an adaptive fast motion estimation (ME) computation on the stage of uneven multi-hexagon grid search (UMHGS) algorithm included in an unsymmetrical-cross multi-hexagon-grid search (UMHexagonS) in H.264 standard. The proposed adaptive method is based on statistical analysis and previously obtained motion vectors to reduce the computational complexity of ME. For this purpose, the algorithm is decomposed into three processes: skipping, terminating, and reducing search areas. Skipping and terminating are determined by the statistical analysis of the collected minimum SAD (sum of absolute difference) and the search area is constrained by the slope of previously obtained motion vectors. Simulation results show that 13%-23% of ME time can be reduced compared with UMHexagonS, while still maintaining a reasonable PSNR (peak signal-to-noise ratio) and average bitrates.

  • PDF

A Study on the Bit-slice Signal Processor for the Biological Signal Processing (생체 신호처리용 Bit-slice Signal Processor에 관한 연구)

  • Kim, Yeong-Ho;Kim, Dong-Rok;Min, Byeong-Gu
    • Journal of Biomedical Engineering Research
    • /
    • v.6 no.2
    • /
    • pp.15-22
    • /
    • 1985
  • We have developed a microprogramir!able signal processor for real-time ultrasonic signal processing. Processing speed was increased by the parallelism in horizontal microprogram using 104bits microcode and the Pipelined architecture. Control unit of the signal processor was designed by microprogrammed architec- ture and writable control store (WCS) which was interfaced with host computer, APPLE- ll . This enables the processor to develop and simulate various digital signal processing algorithms. The performance of the processor was evaluated by the Fast Fourier Transform (FFT) program. The execution time to perform 16 bit 1024 points complex FF7, radix-2 DIT algorithm, was about 175 msec with IMHz master Clock. We can use this processor to Bevelop more efficient signal processing algorithms on the biological signal processing.

  • PDF

The Design of A Program Counter Unit for RISC Processors (RISC 프로세서의 프로그램 카운터 부(PCU)의 설계)

  • 홍인식;임인칠
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.7
    • /
    • pp.1015-1024
    • /
    • 1990
  • This paper proposes a program counter unit(PCU) on the pipelined architecture of RISC (Reduced Instruction Set Computer) type high performance processors, PCU is used for supplying instruction addresses to memory units(Instruction Cache) efficiently. A RISC processor's PCU has to compute the instruction address within required intervals continnously. So, using the method of self-generated incrementor, is more efficient than the conventional one's using ALU or private adder. The proposed PCU is designed to have the fast +4(Byte Address) operation incrementor that has no carry propagation delay. Design specifications are taken by analyzing the whole data path operation of target processor's default and exceptional mode instructions. CMOS and wired logic circuit technologic are used in PCU for the fast operation which has small layout area and power dissipation. The schematic capture and logic, timing simulation of proposed PCU are performed on Apollo W/S using Mentor Graphics CAD tooks.

  • PDF

Performance Optimization of Parallel Algorithms

  • Hudik, Martin;Hodon, Michal
    • Journal of Communications and Networks
    • /
    • v.16 no.4
    • /
    • pp.436-446
    • /
    • 2014
  • The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and costly. The most efficient way to increase efficiency is to adopt parallel principles. Purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems, the impact of communication delays on their efficiency and on overall execution time. Paper focuses is on finite algorithms for solving systems of linear equations, namely the matrix manipulation (Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, openMP), distributed-memory (message passing interface, MPI) and for their combination (MPI + openMP). The properties of the algorithms were analytically determined and they were experimentally verified. The conclusions are drawn for theory and practice.

DESIGN AND IMPLEMENTATION OF THE ALL DIGITAL QPSK TRANSMITTER FOR MPEG-2 PACKETS SUPPORTING THE DAVIC STANDARD

  • Park, Sungsoo;Lee, Youngkou;Kim, Kiseon
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.914-918
    • /
    • 2000
  • In this paper, a next generation high speed QPSK transmitter is designed based on 1.8${\mu}$m design rule. The designed transmitter supports the MPEG2-TS coded packed data for the DAVIC standard. Transmitter is composed of the convolutional coder, the shortened Reed-Solomon coder, and QPSK modulator. The coded packets are modulated in APSK with an RC filter. Especially, Galois Field multiplier with a standard basis is designed with the pipelined parallel architecture. Also, in the QPSK modulator, the RC filter and mixer are simplified into the ROM table, which can improve the performance of the transmitter. The total number of gates for the implemented baseband transmitter is 26,875.

  • PDF