• Title/Summary/Keyword: pipelined

Search Result 377, Processing Time 0.033 seconds

FPGA Implementation of CORDIC-based Phase Calculator for Depth Image Extraction (Depth Image 추출용 CORDIC 기반 위상 연산기의 FPGA 구현)

  • Koo, Jung-youn;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.279-282
    • /
    • 2012
  • In this paper, a hardware architecture of phase calculator for 3D image processing is proposed. The designed phase calculator, which adopts a pipelined architecture to improve throughput, performs arctangent operation using vectoring mode of CORDIC algorithm. Fixed-point MATLAB modeling and simulations are carried out to determine the optimized bit-widths and number of iteration. Phase calculator designed in Verilog HDL is verified by emulating the restoration of virtual 3D data using MATLAB/Simulink and FPGA-in-the-loop verification.

  • PDF

A Study on the ADC for High Speed Data Conversion (고속 데이터 변환을 위한 ADC에 관한 연구)

  • Kim, Sun-Youb;Park, Hyoung-Keun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.8 no.3
    • /
    • pp.460-465
    • /
    • 2007
  • In this paper, the pipelined A/D converter with multi S/H stage structure is proposed for high resolution and high-speed data conversion rate. In order to improve a resolution and operational speed, the proposed structure increased the sampling time that is sampled input signal. In order to verify the operation characteristics 20MS/s pipelined A/D converter is designed with two S/H stage. The simulation result shows that INL and DNL are $0.52LSB{\sim}-0.63LSB$ and $0.53LSB{\sim}-0.56LSB$, respectively. Also, the designed Analog-to-Digital converter has the SNR of 43dB and power consumption is 18.5mW.

  • PDF

2048-point Low-Complexity Pipelined FFT Processor based on Dynamic Scaling (동적 스케일링에 기반한 낮은 복잡도의 2048 포인트 파이프라인 FFT 프로세서)

  • Kim, Ji-Hoon
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.697-702
    • /
    • 2021
  • Fast Fourier Transform (FFT) is a major signal processing block being widely used. For long-point FFT processing, usually more than 1024 points, its low-complexity implementation becomes very important while retaining high SQNR (Signal-to-Quantization Noise Ratio). In this paper, we present a low-complexity FFT algorithm with a simple dynamic scaling scheme. For the 2048-point pipelined FFT processing, we can reduce the number of general multipliers by half compared to the well-known radix-2 algorithm. Also, the table size for twiddle factors is reduced to 35% and 53% compared to the radix-2 and radix-22 algorithms respectively, while achieving SQNR of more than 55dB without increasing the internal wordlength progressively.

FPGA Design and Implementation of A Pipelined Out-of-Order Superscalar Processor (파이프라인식 비순차실행 수퍼스칼라 프로세서의 FPGA 설계 및 구현)

  • Jongbok Lee
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.153-158
    • /
    • 2023
  • Domestically, the importance of system semiconductor design is increasing, and the balanced development with the high-end memory semiconductors should be promoted. Using Xilinx Vivado as a development enivronment tool, it reduces time and cost dramatically in implementing the processor on FPGA. In this paper, the VHDL language which provides record data structure for an efficient digital system design is used for designing a pipelined out-of-order superscalar processor. It has been simulated extensively, synthesized and implemented on FPGA and verified by Integrated Logic Analyzer. As a result, the pipelined out-of-order superscalar processor could be executed successfully.

A Design and Implementation of 32-bit RISC-V RV32IM Pipelined Processor in Embedded Systems (임베디드 환경에서의 32-bit RISC-V RV32IM 파이프라인 프로세서 설계 및 구현)

  • Subin Park;Yongwoo Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.81-86
    • /
    • 2023
  • Recently, demand for embedded systems requiring low power and high specifications has been increasing, and RISC-V processors are being widely applied. RISC-V, a RISC-based open instruction set architecture (ISA), has been developed and researched by UC Berkeley and other researchers since 2010. RV32I ISA is sufficient to support integer operations such as addition and subtraction instructions, but M-extension should be defined for multiplication and division instructions. This paper proposes an RV32I, RV32IM processor, and indicates benchmark performance scores compared to an existing processor. Additionally, A non-stalling method was proposed to support a 2-stage pipelined DSP multiplier to the 5-stage pipelined RV32IM processor. Proposed RV32I and RV32IM processors satisfied a maximum operating frequency of 50 MHz on Artix-7 FPGA. The performance of the proposed processors was verified using benchmark programs from Dhrystone and Coremark. As a result, the Coremark benchmark results of the proposed processor showed that it outperformed the existing RV32IM processor by 23.91%.

  • PDF

A Pipelined Parallel Optimized Design for Convolution-based Non-Cascaded Architecture of JPEG2000 DWT (JPEG2000 이산웨이블릿변환의 컨볼루션기반 non-cascaded 아키텍처를 위한 pipelined parallel 최적화 설계)

  • Lee, Seung-Kwon;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.7
    • /
    • pp.29-38
    • /
    • 2009
  • In this paper, a high performance pipelined computing design of parallel multiplier-temporal buffer-parallel accumulator is present for the convolution-based non-cascaded architecture aiming at the real time Discrete Wavelet Transform(DWT) processing. The convolved multiplication of DWT would be reduced upto 1/4 by utilizing the filter coefficients symmetry and the up/down sampling; and it could be dealt with 3-5 times faster computation by LUT-based DA multiplication of multiple filter coefficients parallelized for product terms with an image data. Further, the reutilization of computed product terms could be achieved by storing in the temporal buffer, which yields the saving of computation as well as dynamic power by 50%. The convolved product terms of image data and filter coefficients are realigned and stored in the temporal buffer for the accumulated addition. Then, the buffer management of parallel aligned storage is carried out for the high speed sequential retrieval of parallel accumulations. The convolved computation is pipelined with parallel multiplier-temporal buffer-parallel accumulation in which the parallelization of temporal buffer and accumulator is optimize, with respect to the performance of parallel DA multiplier, to improve the pipelining performance. The proposed architecture is back-end designed with 0.18um library, which verifies the 30fps throughput of SVGA(800$\times$600) images at 90MHz.

High Performance Hardware Implementation of the 128-bit SEED Cryptography Algorithm (128비트 SEED 암호 알고리즘의 고속처리를 위한 하드웨어 구현)

  • 전신우;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.1
    • /
    • pp.13-23
    • /
    • 2001
  • This paper implemented into hardware SEED which is the KOREA standard 128-bit block cipher. First, at the respect of hardware implementation, we compared and analyzed SEED with AES finalist algorithms - MARS, RC6, RIJNDAEL, SERPENT, TWOFISH, which are secret key block encryption algorithms. The encryption of SEED is faster than MARS, RC6, TWOFISH, but is as five times slow as RIJNDAEL which is the fastest. We propose a SEED hardware architecture which improves the encryption speed. We divided one round into three parts, J1 function block, J2 function block J3 function block including key mixing block, because SEED repeatedly executes the same operation 16 times, then we pipelined one round into three parts, J1 function block, J2 function block, J3 function block including key mixing block, because SEED repeatedly executes the same operation 16 times, then we pipelined it to make it more faster. G-function is implemented more easily by xoring four extended 4 byte SS-boxes. We tested it using ALTERA FPGA with Verilog HDL. If the design is synthesized with 0.5 um Samsung standard cell library, encryption of ECB and decryption of ECB, CBC, CFB, which can be pipelined would take 50 clock cycles to encrypt 384-bit plaintext, and hence we have 745.6 Mbps assuming 97.1 MHz clock frequency. Encryption of CBC, OFB, CFB and decryption of OFB, which cannot be pipelined have 258.9 Mbps under same condition.

An Implementation of Pipelined Prallel Processing System for Multi-Access Memory System

  • Lee, Hyung;Cho, Hyeon-Koo;You, Dae-Sang;Park, Jong-Won
    • Proceedings of the IEEK Conference
    • /
    • 2002.07a
    • /
    • pp.149-151
    • /
    • 2002
  • We had been developing the variety of parallel processing systems in order to improve the processing speed of visual media applications. These systems were using multi-access memory system(MAMS) as a parallel memory system, which provides the capability of the simultaneous accesses of image points in a line-segment with an arbitrary degree, which is required in many low-level image processing operations such as edge or line detection in a particular direction, and so on. But, the performance of these systems did not give a faithful speed because of asynchronous feature between MAMS and processing elements. To improve the processing speed of these systems, we have been investigated a pipelined parallel processing system using MAMS. Although the system is considered as being the single instruction multiple data(SIMD) type like the early developed systems, the performance of the system yielded about 2.5 times faster speed.

  • PDF

A study on the Cycle-Accurate Retargetable Micro-Architecture Simulation Framework (사이클 정확도의 재목적화 가능한 마이크로아키텍쳐 시뮬레이션 프레임워크에 관한 연구)

  • Yang, Hoon-Mo;Lee, Moon-Key
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.643-646
    • /
    • 2005
  • This paper presents CARMA (Cycle-Accurate Retargetable Micro-Architecture) as efficient framework for SoC-centric pipelined instruction-set architectures. It is based on ADL (Architecture Description Language) and provides more concise and manifest semantics to describe behavior of instruction set by mixing efficiency of instruction-set simulators and flexibility of RTL simulators. It exploits new timing model method based on process scheduling so it can support general timing model with cycle accuracy for large-scaled architectures usually used in SoC multimedia chip-set. According to experiments, the proposed framework was shown to be 5.5 times faster than HDL and 2.5 times faster than System-C in simulation speed so it is applicable for complex instruction-set pipelined architectures.

  • PDF

High-Speed Hardware Architectures for ARIA with Composite Field Arithmetic and Area-Throughput Trade-Offs

  • Lee, Sang-Woo;Moon, Sang-Jae;Kim, Jeong-Nyeo
    • ETRI Journal
    • /
    • v.30 no.5
    • /
    • pp.707-717
    • /
    • 2008
  • This paper presents two types of high-speed hardware architectures for the block cipher ARIA. First, the loop architectures for feedback modes are presented. Area-throughput trade-offs are evaluated depending on the S-box implementation by using look-up tables or combinational logic which involves composite field arithmetic. The sub-pipelined architectures for non-feedback modes are also described. With loop unrolling, inner and outer round pipelining techniques, and S-box implementation using composite field arithmetic over $GF(2^4)^2$, throughputs of 16 Gbps to 43 Gbps are achievable in a 0.25 ${\mu}m$ CMOS technology. This is the first sub-pipelined architecture of ARIA for high throughput to date.

  • PDF