• Title/Summary/Keyword: Pipeline computing

Search Result 57, Processing Time 0.023 seconds

Design of Pipeline-based Failure Recovery Method for VOD Server (파이프라인 개념을 이용한 VOD 서버의 장애 복구 방법 연구)

  • Lee, Joa-Hyoung;Park, Chong-Myoung;Jung, In-Bum
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.5
    • /
    • pp.942-947
    • /
    • 2008
  • A cluster server usually consists of a front end node and multiple backend nodes. Though increasing the number of bookend nodes can result in the more QoS(Quality of Service) streams for clients, the possibility of failures in backend nodes is proportionally increased. The failure causes not only the stop of all streaming service but also the loss of the current playing positions. In this paper, when a backend node becomes a failed state, the recovery mechanisms are studied to support the unceasing streaming service. The basic techniques are hewn as providing very high speed data transfer rates suitable for the video streaming. However, without considering the architecture of cluster-based VOD server, the application of these basic techniques causes the performance bottleneck of the internal network for recovery and also results in the inefficiency CPU usage of backend nodes. To resolve these problems, we propose a new failure recovery mechanism based on the pipeline computing concept.

Pipeline Structured-Degree Computationless Modified Euclidean Algorithm for RS(23,17) Decoder (RS(23,17) 복호기를 위한 PS-DCME 알고리즘)

  • Kang, Sung-Jin;Hong, Dae-Ki
    • Journal of Internet Computing and Services
    • /
    • v.10 no.1
    • /
    • pp.1-9
    • /
    • 2009
  • In this paper, A pipeline structured-degree computationless modified Euclidean (PS-DCME) algorithm is proposed, which can be used for a RS(23,17) decoder for MB-OFDM system. PS-DCME algorithm requires a state machine instead of the degree computation and comparison circuits, so that the hardware complexity of the decoder can be reduced and high-speed decoder can be implemented. We have implemented a RS(23,17) decoder with PS-DCME using Verilog HDL and synthesized with Samsung 65nm library. From synthesis results, it can operate at clock frequency of 250MHz, and gate count is 19,827.

  • PDF

Enhanced Pipeline Scheduling for IA-64 (IA-64를 위한 향상된 소프트웨어 파이프라인 명령어 스케줄링)

  • Lee Jae-Mok;Moon Soo-Mook
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.11a
    • /
    • pp.826-828
    • /
    • 2005
  • 인텔의 IA-64 프로세서는 명령어 수준의 병렬수행을 지원하는 EPIC (Explicitly Parallel Instruction Computing) 구조를 채택하고 있으며 컴파일러가 순차적 코드에서 병렬 수행이 가능한 독립적인 명령어들을 스케줄링 하도록 되어있다. 본 논문에서는 IA-64 스케줄링을 위해 향상된 파이프라인 스케줄링 (Enhanced Pipeline Scheduling, EPS) 기법[1]을 적용한 결과를 소개한다. EPS는 루프수준의 병렬화를 위한 소프트웨어 파이프라이닝 (software pipelining)기법으로 전역 스케줄링 (global Scheduling) 기법을 기반으로 하고 있다. 우리는 IA-64 프로세서를 위한 공개소스 컴파일러인 ORC (Open Research Compiler)에 EPS를 구현하고 실제 프로세서인 Itanium에서 실험을 수행하였다. 상용 프로세서와 컴파일러에 구현과 튜닝을 하는 과정에서 얻은 경험을 소개하고 기존의 ORC 컴파일러와 비교하여 얻은 성능 향상을 보고하고 분석한다.

  • PDF

Numerical Computing on Graphics Hardware

  • 임인성
    • 한국가시화정보학회:학술대회논문집
    • /
    • 2004.04a
    • /
    • pp.57-63
    • /
    • 2004
  • 최근 일반 범용 PC 에 장착되고 있는 ATI 나 NVIDIA 등의 그래픽스 가속기의 성능은 수년전과 비교할 때 비교가 안 될 정도의 빠른 속도를 자랑하고 있다. 이러한 속도 향상과 함께 급격하게 일어나고 있는 변화 중의 하나는 바로 기존의 고정된 기능의 그래픽스 파이프라인(fixed-function graphics pipeline)과는 달리 프로그래머가 가속기의 기능을 자유자재로 프로그래밍할 수 있도록 해주는 프로그래밍이 가능한 파이프라인(programmable graphics pipeline)의 출현이라 할 수 있다. 이러한 가속기에 장착되고 있는 GPU (Graphics Processing Unit)는 간단한 형태의 SIMD 프로세서라 할 수 있는데, 특히 GPU 의 한 부분인 픽셀 쉐이더는 그 처리 속도가 매우 높기 때문에 이를 통하여 기존의 수치 알고리즘을 병렬화 하려는 시도가 활발히 일어나고 있다. 본 강연에서는 다양한 수치 계산을 그래픽스 가속기를 사용하여 해결하려는 시도에 대하여 간단히 살펴본다.

  • PDF

Performance Enhancement of Parallel Prime Sieving with Hybrid Programming and Pipeline Scheduling (혼합형 병렬처리 및 파이프라이닝을 활용한 소수 연산 알고리즘)

  • Ryu, Seung-yo;Kim, Dongseung
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.4 no.10
    • /
    • pp.337-342
    • /
    • 2015
  • We develop a new parallelization method for Sieve of Eratosthenes algorithm, which enhances both computation speed and energy efficiency. A pipeline scheduling is included for better load balancing after proper workload partitioning. They run on multicore CPUs with hybrid parallel programming model which uses both message passing and multithreading computation. Experimental results performed on both small scale clusters and a PC with a mobile processor show significant improvement in execution time and energy consumptions.

Hierarchical Multiplexing Interconnection Structure for Fault-Tolerant Reconfigurable Chip Multiprocessor

  • Kim, Yoon-Jin
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.11 no.4
    • /
    • pp.318-328
    • /
    • 2011
  • Stage-level reconfigurable chip multiprocessor (CMP) aims to achieve highly reliable and fault tolerant computing by using interwoven pipeline stages and on-chip interconnect for communicating with each other. The existing crossbar-switch based stage-level reconfigurable CMPs offer high reliability at the cost of significant area/power overheads. These overheads make realizing large CMPs prohibitive due to the area and power consumed by heavy interconnection networks. On other hand, area/power-efficient architectures offer less reliability and inefficient stage-level resource utilization. In this paper, I propose a hierarchical multiplexing interconnection structure in lieu of crossbar interconnect to design area/power-efficient stage-level reconfigurable CMP. The proposed approach is able to keep the reliability offered by the crossbar-switch while reducing the area and power overheads. Experimental results show that the proposed approach reduces area by up to 21% and power by up to 32% when compared with the crossbar-switch based interconnection network.

QoS Guarantee in Partial Failure of Clustered VOD Server (클러스터 VOD 서버의 부분적 장애에서 QoS 보장)

  • Lee, Joa-Hyoung;Jung, In-Bum
    • The KIPS Transactions:PartC
    • /
    • v.16C no.3
    • /
    • pp.363-372
    • /
    • 2009
  • For large scale VOD service, cluster servers are spotlighted to their high performance and low cost. A cluster server usually consists of a front-end node and multiple back-end nodes. Though increasing the number of back-end nodes can result in the more QoS streams for clients, the possibility of failures in back-end nodes is proportionally increased. The failure causes not only the stop of all streaming service but also the loss of the current playing positions. In this paper, when a back-end node becomes a failed state, the recovery mechanisms are studied to support the unceasing streaming service. For the actual VOD service environment, we implement a cluster-based VOD servers composed of general PCs and adopt the parallel processing for MPEG movies. From the implemented VOD server, a video block recovery mechanism is designed on parity algorithms. However, without considering the architecture of cluster-based VOD server, the application of the basic technique causes the performance bottleneck of the internal network for recovery and also results in the inefficiency CPU usage of back-end nodes. To address these problems, we propose a new failure recovery mechanism based on the pipeline computing concept.

A Fully Programmable Shader Processor for Low Power Mobile Devices (저전력 모바일 장치를 위한 완전 프로그램 가능형 쉐이더 프로세서)

  • Jeong, Hyung-Ki;Lee, Joo-Sock;Park, Tae-Ryong;Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.13 no.2
    • /
    • pp.253-259
    • /
    • 2009
  • In this paper, we propose a novel architecture of a general graphics shader processor without a dedicated hardware. Recently, mobile devices require the high performance graphics processor as well as the small size, low power. The proposed shader processor is a GP-GPU(General-Purpose computing on Graphics Processing Units) to execute the whole OpenGL ES 2.0 graphics pipeline by using shader instructions. It does not require the separate dedicate H/W such as rasterization on this fully programmable capability. The fully programmable 3D graphics shader processor can reduce much of the graphics hardware. The chip size of the designed shader processor is reduced 60% less than the sizes of previous processors.

  • PDF

Computation-Communication Overlapping in AES-CCM Using Thread-Level Parallelism on a Multi-Core Processor (멀티코어 프로세서의 쓰레드-수준 병렬성을 활용한 AES-CCM 계산-통신 중첩화)

  • Lee, Eun-Ji;Lee, Sung-Ju;Chung, Yong-Wha;Lee, Myung-Ho;Min, Byoung-Ki
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.8
    • /
    • pp.863-867
    • /
    • 2010
  • Multi-core processors are becoming increasingly popular. As they are widely adopted in embedded systems as well as desktop PC's, many multimedia applications are being parallelized on multi-core platforms. However, it is difficult to parallelize applications with inherent data dependencies such as encryption algorithms for multimedia data. In order to overcome this limit, we propose a technique to overlap computation and communication using an otherwise idle core in this paper. In particular, we interpret the problem of multimedia computation and communication as a pipeline design problem at the application program level, and derive an optimal number of stages in the pipeline.

An Efficient Hardware Implementation of AES Rijndael Block Cipher Algorithm (AES Rijndael 블록 암호 알고리듬의 효율적인 하드웨어 구현)

  • 안하기;신경욱
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.53-64
    • /
    • 2002
  • This paper describes a design of cryptographic processor that implements the AES (Advanced Encryption Standard) block cipher algorithm, "Rijndael". An iterative looping architecture using a single round block is adopted to minimize the hardware required. To achieve high throughput rate, a sub-pipeline stage is added by dividing the round function into two blocks, resulting that the second half of current round function and the first half of next round function are being simultaneously operated. The round block is implemented using 32-bit data path, so each sub-pipeline stage is executed for four clock cycles. The S-box, which is the dominant element of the round block in terms of required hardware resources, is designed using arithmetic circuit computing multiplicative inverse in GF($2^8$) rather than look-up table method, so that encryption and decryption can share the S-boxes. The round keys are generated by on-the-fly key scheduler. The crypto-processor designed in Verilog-HDL and synthesized using 0.25-$\mu\textrm{m}$ CMOS cell library consists of about 23,000 gates. Simulation results show that the critical path delay is about 8-ns and it can operate up to 120-MHz clock Sequency at 2.5-V supply. The designed core was verified using Xilinx FPGA board and test system.