• Title/Summary/Keyword: Pipelined Processor

Search Result 75, Processing Time 0.032 seconds

Bounding Worst-Case DRAM Performance on Multicore Processors

  • Ding, Yiqiang;Wu, Lan;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.1
    • /
    • pp.53-66
    • /
    • 2013
  • Bounding the worst-case DRAM performance for a real-time application is a challenging problem that is critical for computing worst-case execution time (WCET), especially for multicore processors, where the DRAM memory is usually shared by all of the cores. Typically, DRAM commands from consecutive DRAM accesses can be pipelined on DRAM devices according to the spatial locality of the data fetched by them. By considering the effect of DRAM command pipelining, we propose a basic approach to bounding the worst-case DRAM performance. An enhanced approach is proposed to reduce the overestimation from the invalid DRAM access sequences by checking the timing order of the co-running applications on a dual-core processor. Compared with the conservative approach, which assumes that no DRAM command pipelining exists, our experimental results show that the basic approach can bound the WCET more tightly, by 15.73% on average. The experimental results also indicate that the enhanced approach can further improve the tightness of WCET by 4.23% on average as compared to the basic approach.

Design of Pipeline Bus and the Performance Evaluation in Multiprocessor System (다중프로세서 시스템에서 파이프라인 전송 버스의 설계 및 성능 평가)

  • 윤용호;임인칠
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.2
    • /
    • pp.288-299
    • /
    • 1993
  • This paper proposes the new bus protocol in the tightly coupled multiprocessor system. The bus protocol uses the pipelined data transfer and block transfer scheme to increase the bus bandwidth, The bus also has the independent transfer lines for the address and data respectively, and it can transfer the data up to maximum 264 Mbytes /sec. This paper also models the multiprocessor system where each processor boards have the private cache. Simulation evaluates the bus and system performance according to hit ratio of the reference data in cache memory, In the case of using this bus, the bus is evaluated not to be saturated when up to 10 processor boards are connected to the bus. As for up to 4 memory interleavng, the performance increases linearly.

  • PDF

Hardware Design of VLIW coprocessor for Computer Vision Application (컴퓨터 비전 응용을 위한 VLIW 보조프로세서의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.9
    • /
    • pp.2189-2196
    • /
    • 2014
  • In this paper, a VLIW(Very Long Instruction Word) vision coprocessor which can efficiently accelerate computer vision algorithm for automotive is designed. The VLIW coprocessor executes four instructions per clock cycle via 8-stage pipelined structure and has 36 integer and floating-point instructions to accelerate computer vision algorithm for pedestrian detection. The processor has about 300-MHz operating frequency and about 210,900 gates under 45nm CMOS technology and its estimated performance is 1.2 GOPS(Giga Operations Per Second). The vision system composed of vision primitive engine and eight VLIW coprocessors can execute pedestrian detection at 25~29 frames per second(FPS). Because the VLIW coprocessor has high detection rate and loosely coupled interface with host processor, it can be efficiently applicable to a wide range of vision applications.

Efficient FFT Algorithm and Hardware Implementation for High Speed Multimedia Communication Systems (고속 멀티미디어 통신시스템을 위한 효율적인 FFT 알고리즘 및 하드웨어 구현)

  • 정윤호;김재석
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.3
    • /
    • pp.55-64
    • /
    • 2004
  • In this paper, we propose an efficient FFT algorithm for high speed multimedia communication systems, and present its pipeline implementation results. Since the proposed algorithm is based on the radix-4 butterfly unit, the processing rate can be twice as fast as that based on the radix-2$^3$ algorithm. Also, its implementation is more area-efficient than the implementation from conventional radix-4 algorithm due to reduced number of nontrivial multipliers like using the radix-23 algorithm. In order to compare the proposed algorithm with the conventional radix-4 algorithm, the 64-point MDC pipelined FFT processor based on the proposed algorithm was implemented. After the logic synthesis using 0.6${\mu}{\textrm}{m}$ technology, the logic gate count for the processor with the proposed algorithm is only about 70% of that for the processor with the conventional radix-4 algorithm. Since the proposed algorithm can be achieve higher processing rate and better efficiency than the conventional algorithm, it is very suitable for the high speed multimedia communication systems such as WLAN, DAB, DVB, and ADSL/VDSL systems.

Verification Platform with ARM- and DSP-Based Multiprocessor Architecture for DVB-T Baseband Receivers

  • Cho, Koon-Shik;Chang, June-Young;Cho, Han-Jin;Cho, Jun-Dong
    • ETRI Journal
    • /
    • v.30 no.1
    • /
    • pp.141-151
    • /
    • 2008
  • In this paper, we introduce a new verification platform with ARM- and DSP-based multiprocessor architecture. Its simple communication interface with a crossbar switch architecture is suitable for a heterogeneous multiprocessor platform. The platform is used to verify the function and performance of a DVB-T baseband receiver using hardware and software partitioning techniques with a seamless hardware/software co-verification tool. We present a dual-processor platform with an ARM926 and a Teak DSP, but it cannot satisfy the standard specification of EN 300 744 of DVB-T ETSI. Therefore, we propose a new multiprocessor strategy with an ARM926 and three Teak DSPs synchronized at 166 MHz to satisfy the required specification of DVB-T.

  • PDF

A 7.8Gbps pipelined LEA crypto-processor (7.8Gbps 파이프라인 LEA 크립토 프로세서)

  • Sung, Mi-ji;Shin, Kyung-wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2016.05a
    • /
    • pp.157-159
    • /
    • 2016
  • 3가지 마스터키 길이 128/192/256 비트를 지원하는 파이프라인 LEA(Lightweight Encryption Algorithm) 크립토 프로세서를 설계하였다. 높은 처리율을 얻기 위해 16개의 라운드 스테이지가 파이프라인 방식으로 동작하며, 각 라운드 스테이지는 128비트 데이터패스를 갖도록 설계하였다. 설계된 LEA 프로세서는 FPGA 구현을 통해 하드웨어 동작을 검증하였다. Xilinx ISE로 합성한 결과, 최대 동작주파수 122MHz로 동작하여 7.8Gbps의 성능을 갖는 것으로 평가되었다.

  • PDF

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.

Efficient Hardware Support: The Lock Mechanism without Retry (하드웨어 지원의 재시도 없는 잠금기법)

  • Kim Mee-Kyung;Hong Chul-Eui
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.9
    • /
    • pp.1582-1589
    • /
    • 2006
  • A lock mechanism is essential for synchronization on the multiprocessor systems. The conventional queuing lock has two bus traffics that are the initial and retry of the lock-read. %is paper proposes the new locking protocol, called WPV (Waiting Processor Variable) lock mechanism, which has only one lock-read bus traffic command. The WPV mechanism accesses the shared data in the initial lock-read phase that is held in the pipelined protocol until the shared data is transferred. The nv mechanism also uses the cache state lock mechanism to reduce the locking overhead and guarantees the FIFO lock operations in the multiple lock contentions. In this paper, we also derive the analytical model of WPV lock mechanism as well as conventional memory and cache queuing lock mechanisms. The simulation results on the WPV lock mechanism show that about 50% of access time is reduced comparing with the conventional queuing lock mechanism.

Single chip multi-function peripheral image processor with unified binarization architecture (통합된 이진화 구조를 가진 복합기용 1-Chip 영상처리 프로세서의 개발)

  • Park, Chang-Dae;Lee, Eul-Hwan;Kim, Jae-Ho
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.11
    • /
    • pp.34-43
    • /
    • 1999
  • A high-speed image processor (HIP) is implemented for a high-speed multi-function peripheral. HIP has a binarization architecture with unified data path. It has the pixel-by-pixel pipelined processing to minimize size of the external memory. It performs pre-processing such as shading correction, automatic gain control (AGC), and gamma correction, and also drives external CCD or CIS modules. The pre-processed data can be enlarged or reduced. Various binarizatin algorithms can be processed in the unified archiecture. The embedded binarization algorithms are simple thresholding, high pass filtering, dithering, error diffusion, and thershold modulated error diffusion. These binarization algorithms are unified based on th threshold modulated error diffusion. The data path is designed to share the common functional block of the binarization algorithms. The complexity of the controls and the gate counts is greatly reduced with this novel architecture.

  • PDF

VLSI Design of H.263 Video Codec Based on Modular Architecture (모듈화된 구조에 기반한 H.263 비디오 코덱 VLSI의 설계)

  • Kim, Myung-Jin;Lee, Sang-Hee;Kim, Keun-Bae
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.39 no.5
    • /
    • pp.477-485
    • /
    • 2002
  • In this paper, we present an efficient hardware architecture for the H.263 video codec and its VLSI implementation. This architecture is based on the unified interface by which internal hardware engines and an internal RISC processor are connected one another. The unified interface enables the modular design of internal blocks, efficient hardware/software partitioning, and pipelined paralled operations. The developed VLSI supports the H.263 version 2 profile 3 @ level 10, and moreover, both the control protocol H.245 and the multiplexing protocol H.223. Therefore, it can be used for the complete ITU-T H.324 or 3GPP 3G 324M multimedia processor with the help of an external audio codec. Simultaneous encoding and decoding of QCIF format images at a rate greater than 15 frames per second is achieved at 40 MHz clock frequency.