• Title/Summary/Keyword: Pipelined Processor

Search Result 75, Processing Time 0.025 seconds

A Pipelined Hadamard Transform Processor (파이프라인 방식에 의한 아다마르 변환 프로세서)

  • 황영수;윤대희;차일환
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.10
    • /
    • pp.1617-1623
    • /
    • 1989
  • The introduction of the fast Fourier transform(FFT),an efficient computational algorithm for the discrete Fourier transform(DFT) by Cooley and Tukey(1965), has brought to the limelight various other discrete transforms. Some of the analog functions from which these transforms have been derived date back to the early 1920's, for example, Walsh functions (Walsh, 1923) and Hadamard Transform(Enomoto et al, 1965). Fast algorithms developed for the forward transform are equally applicable, exept for minor changes, to the inverse transform. In this paper, we present a simple pipelined Hadamard matrix(HM) which is used to develop a fast algorithm for the Hadamard Processor (HP). The Fast Hadamard Transform(FHT) can be derived using matrix partitioning techniques. The HP system is incorporated through a modular design which permits tailoring to meet a wide range of video data link applications. Emphasis has been placed on a low cost, a low power design suitable for airbone system and video codec.

  • PDF

Design of Pipelined Floating-Point Arithmetic Unit for Mobile 3D Graphics Applications

  • Choi, Byeong-Yoon;Ha, Chang-Soo;Lee, Jong-Hyoung;Salclc, Zoran;Lee, Duck-Myung
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.6
    • /
    • pp.816-827
    • /
    • 2008
  • In this paper, two-stage pipelined floating-point arithmetic unit (FP-AU) is designed. The FP-AU processor supports seventeen operations to apply 3D graphics processor and has area-efficient and low-latency architecture that makes use of modified dual-path computation scheme, new normalization circuit, and modified compound adder based on flagged prefix adder. The FP-AU has about 4-ns delay time at logic synthesis condition using $0.18{\mu}m$ CMOS standard cell library and consists of about 5,930 gates. Because it has 250 MFLOPS execution rate and supports saturated arithmetic including a number of graphics-oriented operations, it is applicable to mobile 3D graphics accelerator efficiently.

  • PDF

Implementation of Digital Filters on Pipelined Processor with Multiple Accumulators and Internal Datapaths

  • Hong, Chun-Pyo
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.4 no.2
    • /
    • pp.44-50
    • /
    • 1999
  • This paper presents a set of techniques to automatically find rate optimal or near rate optimal implementation of shift-invariant flow graphs on pipelined processor, in which pipeline processor has multiple accumulators and internal datapaths. In such case, the problem to be addressed is the scheduling of multiple instruction streams which control all of the pipeline stages. The goal of an automatic scheduler in this context is to rearrange the order of instructions such that they are executed with minimum iteration period between successive iteration of defining flow graphs. The scheduling algorithm described in this paper also focuses on the problem of removing the hazards due to inter-instruction dependencies.

  • PDF

A Modular On-the-fly Round Key Generator for AES Cryptographic Processor (AES 암호 프로세서용 모듈화된 라운드 키 생성기)

  • Choi Byeong-Yoon;Lee Jong-Hyoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.5
    • /
    • pp.1082-1088
    • /
    • 2005
  • Generating fast round key in AES Rijndael algorithm using three key sizes, such as 128, 192, and 256-bit keys is a critical factor to develop high throughput AES processors. In this paper, we propose on-the-fly round key generator which is applicable to the pipelined and non-pipelined AES processor in which cipher and decipher nodes must be implemented on a chip. The proposed round key generator has modular and area-and-time efficient structure implemented with simple connection of two key expander modules, such as key_exp_m and key_exp_s module. The round key generator for non-pipelined AES processor with support of three key lengths and cipher/decipher modes has about 7.8-ns delay time under 0.25um 2.5V CMOS standard cell library and consists of about 17,700 gates.

A Study on the Bit-slice Signal Processor for the Biological Signal Processing (생체 신호처리용 Bit-slice Signal Processor에 관한 연구)

  • Kim, Yeong-Ho;Kim, Dong-Rok;Min, Byeong-Gu
    • Journal of Biomedical Engineering Research
    • /
    • v.6 no.2
    • /
    • pp.15-22
    • /
    • 1985
  • We have developed a microprogramir!able signal processor for real-time ultrasonic signal processing. Processing speed was increased by the parallelism in horizontal microprogram using 104bits microcode and the Pipelined architecture. Control unit of the signal processor was designed by microprogrammed architec- ture and writable control store (WCS) which was interfaced with host computer, APPLE- ll . This enables the processor to develop and simulate various digital signal processing algorithms. The performance of the processor was evaluated by the Fast Fourier Transform (FFT) program. The execution time to perform 16 bit 1024 points complex FF7, radix-2 DIT algorithm, was about 175 msec with IMHz master Clock. We can use this processor to Bevelop more efficient signal processing algorithms on the biological signal processing.

  • PDF

A VLSI Design for Digital Pre-distortion with Pipelined CORDIC Processors

  • Park, Jong Kang;Moon, Jun Young;Kim, Kyunghoon;Yang, Youngoo;Kim, Jong Tae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.6
    • /
    • pp.718-727
    • /
    • 2014
  • In a wireless communications system, a predistorter is often used to compensate for the nonlinear distortions that result from operating a power amplifier near the saturation region, thereby improving system performance and increasing the spectral efficiency for the communication channels. This paper presents a new VLSI design for the polynomial digital predistorter (DPD). The proposed DPD uses a Coordinate Rotation Digital Computing (CORDIC) processor and a PD process with a fully-pipelined architecture. Due to its simple and regular structure, it can be a competitive design when compared to existing polynomial-type and approximated DPDs. Implementing a fifth-order distorter with the proposed design requires only 43,000 logic gates in a $0.35{\mu}m$ CMOS standard cell library.

Computer Application to ECG Signal Processing

  • Okajima, Mitsuharu
    • Journal of Biomedical Engineering Research
    • /
    • v.6 no.2
    • /
    • pp.13-14
    • /
    • 1985
  • We have developed a microprogramir!able signal processor for real-time ultrasonic signal processing. Processing speed was increased by the parallelism in horizontal microprogram using 104bits microcode and the Pipelined architecture. Control unit of the signal processor was designed by microprogrammed architec- ture and writable control store (WCS) which was interfaced with host computer, APPLE- ll . This enables the processor to develop and simulate various digital signal processing algorithms. The performance of the processor was evaluated by the Fast Fourier Transform (FFT) program. The execution time to perform 16 bit 1024 points complex FF7, radix-2 DIT algorithm, was about 175 msec with IMHz master Clock. We can use this processor to Bevelop more efficient signal processing algorithms on the biological signal processing.

  • PDF

Optimization of Pipelined Discrete Wavelet Packet Transform Based on an Efficient Transpose Form and an Advanced Functional Sharing Technique

  • Nguyen, Hung-Ngoc;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of Information Processing Systems
    • /
    • v.15 no.2
    • /
    • pp.374-385
    • /
    • 2019
  • This paper presents an optimal implementation of a Daubechies-based pipelined discrete wavelet packet transform (DWPT) processor using finite impulse response (FIR) filter banks. The feed-forward pipelined (FFP) architecture is exploited for implementation of the DWPT on the field-programmable gate array (FPGA). The proposed DWPT is based on an efficient transpose form structure, thereby reducing its computational complexity by half of the system. Moreover, the efficiency of the design is further improved by using a canonical-signed digit-based binary expression (CSDBE) and advanced functional sharing (AFS) methods. In this work, the AFS technique is proposed to optimize the convolution of FIR filter banks for DWPT decomposition, which reduces the hardware resource utilization by not requiring any embedded digital signal processing (DSP) blocks. The proposed AFS and CSDBE-based DWPT system is embedded on the Virtex-7 FPGA board for testing. The proposed design is implemented as an intellectual property (IP) logic core that can easily be integrated into DSP systems for sub-band analysis. The achieved results conclude that the proposed method is very efficient in improving hardware resource utilization while maintaining accuracy of the result of DWPT.

Design of AMBA AX I Slave Unit for Pipelined Arithmetic Unit (파이프라인 구조 연산회로를 위한 AMBA AXI Slave 설계)

  • Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.712-713
    • /
    • 2011
  • In this paper, the AMBA AXI slave unit that can verify the pipelined arithmetic unit is proposed and the 2-stage 16-bit pipelined multiplier is introduced as design example. The proposed AXI slave unit consists of input buffer block memory, control registers, pipelined arithmetic unit, control unit, output buffer block memory, and AXI slave interface unit. The main operational procedures are divided into the following steps, such as burst-mode input data loading for the input buffer memory, programming of control registers, arithmetic operations for block data in the input buffer memory, and burst-mode output data unloading from output buffer memory to host processor. Because the proposed AXI slave unit is general structure, it can be efficiently applicable to AMBA AXI and AHB slave unit with pipelined arithmetic unit.

  • PDF

Implementation of DCT using Bit Slice Signal Processor (BIT SLICE SIGNAL PROCESSOR를 이용한 DCT의 구현)

  • Kim, Dong-L.;Go, Seok-B.;Paek, Seung-K.;Lee, Tae-S.;Min, Byong-G.
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.1449-1453
    • /
    • 1987
  • A microprogrammable Bit Slice Sinal Processor for image processing is implemented. Processing speed is increased by the parallelism in horizontal microprogram using 120bits microcode, pipelined architecture, 2 bank memory switching that interfaces with the Host through DMA, a variable clock control, overflow checking H/W,look-up table method and cache memory. With this processor, a DCT algorithm which uses 2-D FFT is performed. The execution time for $512{\times}512{\times}8$ image is 12 sec when 16 bit operation is runned, and the recovered image has acceptable quality with MSE 0.276%.

  • PDF