• Title/Summary/Keyword: 파이프라인 구조

Search Result 474, Processing Time 0.024 seconds

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.

Fast Fourier Transform Processor based on Low-power and Area-efficient Algorithm (저 전력 및 면적 효율적인 알고리즘 기반 고속 퓨리어 변환 프로세서)

  • Oh Jung-yeol;Lim Myoung-seob
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.2 s.302
    • /
    • pp.143-150
    • /
    • 2005
  • This paper proposes a new $radix-2^4$ FFT algorithm and an efficient pipeline architecture based on this new algorithm for OFDM systems. The pipeline architecture based on the new algorithm has the same number of multipliers as that of the $radix-2^2$ algorithm. However, the multiplier complexity could be reduced by more than $30\%$ by replacing one half of the programmable complex multipliers by the newly proposed CSD constant complex multipliers. From synthesis simulations of a standard 0.35um CMOS Samsung process, a proposed CSD constant complex multiplier achieved more than $60\%$ area efficiency when compared with the conventional programmable complex multiplier. This promoted efficiency can be used for the design of a long length FFT processor in wireless OFDM applications which needs more power and area efficiency.

Efficient Parallel IP Address Lookup Architecture with Smart Distributor (스마트 분배기를 이용한 효율적인 병렬 IP 주소 검색 구조)

  • Kim, Junghwan;Kim, Jinsoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.2
    • /
    • pp.44-51
    • /
    • 2013
  • Routers should perform fast IP address lookup for Internet to provide high-speed service. In this paper, we present a hybrid parallel IP address lookup structure composed of four-stage pipeline. It achieves parallelism at low cost by using multiple SRAMs in stage 2 and partitioned TCAMs in stage 3, and improves the performance through pipelining. The smart distributor in stage 1 does not transfer any IP address identical to previous one toward the next stage, but only uses the result of the previous lookup. So it improves throughput of lookup by caching effects, and decreases the access conflict to TCAM bank in stage 3 as well. In the last stage, the reorder buffer rearranges the completed IP addresses according to the input order. We evaluate the performance of our parallel pipelined IP lookup structure comparing with previous hybrid structure, using the real routing table and traffic distributions generated by Zipf's law.

Open-Loop Pipeline ADC Design Techniques for High Speed & Low Power Consumption (고속 저전력 동작을 위한 개방형 파이프라인 ADC 설계 기법)

  • Kim Shinhoo;Kim Yunjeong;Youn Jaeyoun;Lim Shin-ll;Kang Sung-Mo;Kim Suki
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.1A
    • /
    • pp.104-112
    • /
    • 2005
  • Some design techniques for high speed and low power pipelined 8-bit ADC are described. To perform high-speed operation with relatively low power consumption, open loop architecture is adopted, while closed loop architecture (with MDAC) is used in conventional pipeline ADC. A distributed track and hold amplifier and a cascading structure are also adopted to increase the sampling rate. To reduce the power consumption and the die area, the number of amplifiers in each stage are optimized and reduced with proposed zero-crossing point generation method. At 500-MHz sampling rate, simulation results show that the power consumption is 210mW including digital logic with 1.8V power supply. And the targeted ADC achieves ENOB of about 8-bit with input frequency up to 200-MHz and input range of 1.2Vpp (Differential). The ADC is designed using a $0.18{\mu}m$ 6-Metal 1-Poly CMOS process and occupies an area of $900{\mu}m{\times}500{\mu}m$

A Design of ADC with Multi SHA Structure which for High Data Communication (고속 데이터 통신을 위한 다중Multi SHA구조를 갖는 ADC설계)

  • Kim, Sun-Youb
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.9
    • /
    • pp.1709-1716
    • /
    • 2007
  • In this paper, ADC with multi SHA structure is proposed for high speed operation. The proposed structure incorporates a multi SHA block that consists of multiple SHAs of identical characteristics in parallel to improve the conversion speed. The designed multi SHA is operated by non-overlapping clocks and the sampling speed can be improved by increasing the number of multiplexed SHAs. Pipelined A/D converter, applying the proposed structure, is designed to satisfy requirement of analog front-end of VDSL modem. The measured INL and DNL of designed A/D converter are $0.52LSB{\sim}-0.50LSB$ and $0.80LSB{\sim}-0.76LSB$, respectively. It satisfies the design specifications for VDSL modems. The simulated SNR is about 66dB which corresponds to a 10.7 bit resolution. The power consumption is 24.32mW.

A 12b 10MS/s CMOS Pipelined ADC Using a Reference Scaling Technique (기준 전압 스케일링을 이용한 12비트 10MS/s CMOS 파이프라인 ADC)

  • Ahn, Gil-Cho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.16-23
    • /
    • 2009
  • A 12b 10MS/s pipelined ADC with low DC gain amplifiers is presented. The pipelined ADC using a reference scaling technique is proposed to compensate the gain error in MDACs due to a low DC gain amplifier. To minimize the performance degradation of the ADC due to amplifier offset, the proposed offset trimming circuit is employed m the first-stage MDAC amplifier. Additional reset switches are used in all MDACs to reduce the memory effect caused by the low DC gain amplifier. The measured differential and integral non-linearities of the prototype ADC with 45dB DC gain amplifiers are less than 0.7LSB and 3.1LSB, respectively. The prototype ADC is fabricated in a $0.35{\mu}m$ CMOS process and achieves 62dB SNDR and 72dB SFDR with 2.4V supply and 10MHz sampling frequency while consuming 19mW power.

Design and Simulation for Out-of-Order Execution Processor of a Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.5
    • /
    • pp.143-149
    • /
    • 2020
  • Currently, a multi-core processor is mainly used as a central processing unit of a computer system, and a high-performance out-of-order processor is adopted as each core to maximize system performance. The early out-of-order execution processor with Tomasulo algorithm aimed at floating-point instructions, and it took several cycles to execute by the use of complex structures such as reorder buffer and reservation station. However, in order for the processor to properly utilize out-of-order execution and increase the throughput of instructions, it must operate in a fully pipelined manner. In this paper, a fully pipelined out-of-order processor with speculative execution is designed with VHDL and verified with GHDL. As a result of the simulation, a program composed of ARM instructions is successfully performed.

Web Server based Hologram Image Production Pipeline System Implementation (웹 서버 기반의 홀로그램 영상 제작 파이프라인 시스템 구현)

  • Kim, Yongjung;Park, Chansoo;Shin, Seokyong;Kim, Jungho;Gentet, Philippe;Lee, Jiyoon;Kwon, Soonchul;Lee, Seunghyun
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.4
    • /
    • pp.751-757
    • /
    • 2021
  • In this paper, we proposed a pipeline system for holographic image production in a web server-based environment. There are time and spatial constraints for the existing holographic image production. The purpose of the proposed system is to obtain high-quality holographic images by reducing accessibility to users. It is a structure in which a video captured by a user in a web environment is transmitted to a server and converted into a frame for holographic image production through post-production. For high-quality holographic image acquisition, post-processing uses a deep learning-based algorithm. The proposed system provides various service tools in the web environment for user convenience. Through this method, the user's accessibility is improved when producing holographic images because images are taken in a web environment rather than in a limited space.

A Prefetch Architecture with Efficient Branch Prediction for a 64-bit 4-way Superscalar Microprocessor (64비트 4-way 수퍼스칼라 마이크로프로세서의 효율적인 분기 예측을 수행하는 프리페치 구조)

  • 문상국;문병인;이용환;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.11B
    • /
    • pp.1939-1947
    • /
    • 2000
  • 본 논문에서는 명령어의 효율적인 페치를 위해 분기 타겟 주소 전체를 사용하지 않고 캐쉬 메모리(cache memory) 내의 적은 비트 수로 인덱싱 하여 한 클럭 사이클 안에 최대 4개의 명령어를 다음 파이프라인으로 보내줄 수 있는 방법을 제시한다. 본 프리페치 유닛은 크게 나누어 3개의 영역으로 나눌 수 있는데, 분기에 관련하여 미리 부분적으로 명령어를 디코드 하는 프리디코드(predecode) 블록, 타겟 주소(NTA : Next Target Address) 테이블 영역을 추가시킨 명령어 캐쉬(instruction cache) 블록, 전체 유닛을 제어하고 가상 주소를 관리하는 프리페치(prefetch) 블록으로 나누어진다. 사용된 명령어들은 SPARC(Scalable Processor ARChitecture) V9에 기준 하였고 구현은 Verilog-HDL(Hardwave Description Language)을 사용하여 기능 수준으로 기술되고 검증되었다. 구현된 프리페치 유닛은 명령어 흐름에 분기가 존재하더라도 단일 사이클 안에 4개까지의 명령어들을 정확한 예측 하에 다음 파이프라인으로 보내줄 수 있다. 또한 NTA를 사용한 방법은 같은 수의 레지스터 비트를 사용하였을 때 BTB(Branch Target Buffer)를 사용하는 방법과 비교하여 2배정도 많은 개수의 분기 명령 주소를 저장할 수 있는 장점이 있다.

  • PDF

EIS Processor Architecture for Enhanced Instruction Processing (빠른 명령어 처리가 가능한 EIS 프로세서 구조)

  • 지승현;전중남;김석일
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.12B
    • /
    • pp.1967-1978
    • /
    • 2000
  • 본 논문에서는 실행 시에 긴명령어를 구성하는 각 단위 명령어를 독립적으로 스케줄링할 수 있는 EIS 프로세서 구조를 제안하였다. 단위 명령어별 독립적인 수행을 위해서, EIS 프로세서 구조는 여러 개의 연산처리기와 스케줄러의 쌍으로 구성된다. EIS 프로세서 구조내의 모든 스케줄러는 독립적으로 자료종속성이나 자원충돌 여부를 검사하여 단위 명령어를 실행할지 혹은 다음 파이프라인 사이클동안 실행을 지연시킬지를 결정한다. 또한 EIS프로세서용 목적코드는 단위 명령어들간 동기화를 위해서 모든 단위 명령어에 종속성정보를 삽입하는 특징을 지닌다. 즉, EIS 프로세서 구조는 긴명령어내의 각 단위 명령어를 독립적으로 실행시킬 수 있으므로 기존의 VLIW 프로세서 구조나 SVLIW 프로세서 구조에서의 실행지연 시간을 제거할 수 있다. 시뮬레이션을 통해서도 EIS 프로세서 구조의 실행사이클이 VLIW 프로세서 구조나 SVLIW 프로세서 구조에서의 경우보다 더 빠름을 입증할 수 있었다. 특히 실수 명령어 분포가 높은 프로그램에서 EIS 프로세서에서의 실행사이클이 다른 프로세서 구조의 경우에 비하여 현저하게 줄어드는 것을 확인할 수 있었다.

  • PDF