• Title/Summary/Keyword: VLIW

Compiler Processor Trade-offs for Dynamic Scheduling of VLIW Instructions (VLIW명령어의 동적 스케줄링을 위한 컴파일러와 프로세서간 상호보완)

  • Sunghyun Jee
    • Journal of KIISE:Computer Systems and Theory / v.31 no.5_6 / pp.279-287 / 2004
  • This paper describes a processor architecture named Dynamically Instruction Scheduled VLIW (DISVLIW). The DISVLIW processor architecture is designed for dynamic scheduling of VLIW instructions using dependency information. The DISVLIW instruction format is augmented so that dependency bit vectors can be placed in the same VLIW word. The DISVLIW processor dynamically schedules each instruction within a long instruction using functional unit and dynamic scheduler pairs. Features such as explicit parallelism, balanced scheduling effort, and dynamic scheduling of VLIW instructions can provide a sound infrastructure for supercomputing. We simulate the DISVLIW processor architecture and show that the DISVLIW processor performs significantly better than the VLIW processor for a wide range of cache sizes and across numerical benchmark applications.

Soft Error Detection for VLIW Architectures with a Variable Length Execution Set (Variable Length Execution Set을 지원하는 VLIW 아키텍처를 위한 소프트 에러 검출 기법)

  • Lee, Jongwon;Cho, Doosan;Paek, Yunheung
    • KIPS Transactions on Computer and Communication Systems / v.2 no.3 / pp.111-116 / 2013
  • With technology scaling, the soft error rate in embedded systems has greatly increased. Because of their high performance and low power consumption, VLIW (Very Long Instruction Word) architectures are widely used in embedded systems, and many studies have therefore sought to improve system reliability by duplicating instructions in VLIW architectures. However, existing studies have ignored VLES (Variable Length Execution Set), a feature adopted in most modern VLIW architectures to reduce code size. In this paper, we propose a way to support instruction duplication in a VLIW architecture with VLES. Our experimental results demonstrate that, when instruction duplication is applied to both architectures, a VLIW architecture with VLES reduces code size by 64% on average at the cost of about 4% additional cell area compared with a VLIW architecture without VLES. The results also show that the case with VLES incurs no extra execution time compared with the case without VLES.

Hardware Design of VLIW coprocessor for Computer Vision Application (컴퓨터 비전 응용을 위한 VLIW 보조프로세서의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.9 / pp.2189-2196 / 2014
  • In this paper, a VLIW (Very Long Instruction Word) vision coprocessor that efficiently accelerates computer vision algorithms for automotive applications is designed. The VLIW coprocessor executes four instructions per clock cycle through an 8-stage pipeline and has 36 integer and floating-point instructions to accelerate computer vision algorithms for pedestrian detection. The processor has an operating frequency of about 300 MHz and about 210,900 gates in 45 nm CMOS technology, and its estimated performance is 1.2 GOPS (Giga Operations Per Second). A vision system composed of a vision primitive engine and eight VLIW coprocessors can execute pedestrian detection at 25 to 29 frames per second (FPS). Because the VLIW coprocessor has a high detection rate and a loosely coupled interface with the host processor, it is applicable to a wide range of vision applications.
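
The 1.2 GOPS estimate is consistent with the issue width and clock frequency quoted in the abstract. A quick check, assuming peak throughput is simply issue width times clock rate (the variable names are illustrative and not taken from the paper):

```python
# Figures quoted in the abstract; names are illustrative only.
issue_width = 4      # instructions issued per clock cycle
clock_hz = 300e6     # "about 300 MHz" operating frequency

# Assumption: peak throughput = issue width x clock rate, one operation per instruction.
peak_gops = issue_width * clock_hz / 1e9
print(f"peak throughput: {peak_gops:.1f} GOPS")
```

This is a peak figure only; sustained throughput depends on how many of the four slots the compiler can fill.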

Operation Rearrangement for Low-Power VLIW Instruction Fetches (저전력 VLIW 명령어 추출을 위한 연산재배치 기법)

  • Sin, Dong-Gun;Kim, Ji-Hong
    • Journal of KIISE:Computer Systems and Theory / v.28 no.10 / pp.530-540 / 2001
  • As mobile applications are required to handle more computing-intensive tasks, many mobile devices are designed using VLIW processors for high performance. In VLIW machines, where a single instruction contains multiple operations, the power consumed during instruction fetches varies significantly depending on how the operations are arranged within the instruction. In this paper, we describe a post-pass optimal operation rearrangement method for low-power VLIW instruction fetch. The proposed method modifies the placement order of operations within VLIW instructions so that the switching activity between successive instruction fetches is minimized. Our experiment shows that the switching activity can be reduced by 34% on average for benchmark programs.
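
The idea of rearranging operations to reduce switching activity can be sketched as follows. This is not the paper's optimal method: it is a greedy stand-in that, for each instruction, tries every slot ordering and keeps the one with the fewest bit toggles relative to the previously fetched instruction. The integer encodings, the function names, and the assumption that any operation may occupy any slot are all hypothetical.

```python
from itertools import permutations

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two encoded operations differ."""
    return bin(a ^ b).count("1")

def switching(prev, curr) -> int:
    """Bit toggles between two instructions, compared slot by slot."""
    return sum(hamming(p, c) for p, c in zip(prev, curr))

def rearrange(instructions):
    """Greedy reordering of operations inside each instruction.

    Assumes every operation may be placed in any slot, which real
    VLIW machines often do not allow.
    """
    result = [tuple(instructions[0])]
    for ops in instructions[1:]:
        best = min(permutations(ops), key=lambda cand: switching(result[-1], cand))
        result.append(best)
    return result

def total_switching(instructions) -> int:
    """Sum of toggles over all pairs of successive instructions."""
    return sum(switching(a, b) for a, b in zip(instructions, instructions[1:]))

# Three two-slot instructions with made-up 4-bit operation encodings.
program = [
    (0b1111, 0b0000),
    (0b0000, 0b1111),
    (0b1111, 0b0001),
]
optimized = rearrange(program)
print(total_switching(program), "->", total_switching(optimized))  # 15 -> 1
```

Because each instruction is fixed before the next one is considered, this greedy pass can miss the global optimum that the abstract's method targets.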

Performance Improvement Through Aggressive Instruction Packing (적극적인 명령어 압축을 통한 성능향상)

  • Ji, Seung-Hyeon;Kim, Seok-Il
    • The KIPS Transactions:PartA / v.9A no.2 / pp.231-240 / 2002
  • This paper proposes balancing scheduling effort more evenly between the compiler and the processor by introducing independently scheduled VLIW instructions. The Aggressively Packed VLIW (APVLIW) processor is aimed specifically at independently scheduling Very Long Instruction Word (VLIW) instructions using dependency information. The APVLIW processor independently schedules each instruction within long instructions using functional unit and dynamic scheduler pairs. Every dynamic scheduler dynamically checks for data dependencies and resource collisions while scheduling each instruction. This scheduling is especially effective in applications containing loops. We simulate the architecture and show that the APVLIW processor performs significantly better than the VLIW processor for a wide range of cache sizes and across various numerical benchmark applications.

Peak Power Minimization for Clustered VLIW Architectures (분산된 VLIW 구조에서의 최대 전력 최소화 방법)

  • 서재원;김태환;정기석
    • Journal of KIISE:Computer Systems and Theory / v.30 no.5_6 / pp.258-264 / 2003
  • VLIW architecture has emerged as one of the most effective architectures for multimedia applications. In multimedia applications, there is ample potential for parallelizing the execution of multiple operations, because such applications typically involve data-intensive processing with limited data and/or control dependencies. As the degree of instruction-level parallelism increases, non-clustered VLIW architectures scale poorly because of the tremendous register port pressure. Therefore, a clustered VLIW architecture is preferred over a non-clustered one when a higher degree of parallelism is possible, as in multimedia processing. However, having multiple clusters in an architecture implies a large amount of hardware, and power consumption therefore becomes a crucial issue. In this paper, we propose an algorithm that minimizes the peak power consumption while incurring little or no delay penalty. The effectiveness of our algorithm has been verified by various sets of experiments, and up to a 30.7% reduction in peak power consumption is observed compared with results optimized to minimize resources only.

A Parallelising Algorithm for Matrix Arithmetics of Digital Signal Processings on VLIW Simulator (VLIW 시뮬레이터 상에서의 디지털 신호처리 행렬 연산에 대한 병렬화 알고리즘)

  • Song, Jin-Hee;Jun, Moon-Seog
    • The Transactions of the Korea Information Processing Society / v.5 no.8 / pp.1985-1996 / 1998
  • This paper presents a parallelising algorithm for partitioning and mapping matrix/vector multiplication onto a linear processor array and a VLIW simulator. First, we discuss methods for mapping an input matrix or vector onto processor arrays of arbitrary size. Then, we show how to partition a large computational problem to fit the size of the processor array. We execute the algorithm on the VLIW simulator and show its effectiveness. The results show that we achieved better parallelising performance on our VLIW simulator design than on the linear processor array.
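
The abstract does not give the partitioning scheme itself, so the following is only a generic sketch of the underlying idea: a matrix/vector product that is too large for the array is cut into row blocks no larger than the array, and each block is processed in one pass. The function name and the row-block strategy are assumptions, not the paper's algorithm.

```python
def partitioned_matvec(matrix, vector, array_size):
    """Compute matrix * vector in row blocks of at most `array_size` rows.

    Each block models one pass over a linear array in which processing
    element i accumulates one element of the output vector.
    """
    result = []
    for start in range(0, len(matrix), array_size):
        block = matrix[start:start + array_size]
        # One pass: every processing element handles one row of the block.
        result.extend(sum(a * x for a, x in zip(row, vector)) for row in block)
    return result

# A 5x2 problem mapped onto a 2-element array takes three passes.
A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
x = [1, 1]
y = partitioned_matvec(A, x, array_size=2)
print(y)  # [3, 7, 11, 15, 19]
```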

Performance Analysis of Caching Instructions on SVLIW Processor and VLIW Processor (SVLIW 프로세서와 VLIW 프로세서의 명령어 캐싱에 따른 성능 분석)

  • Ji, Sung-Hyun;Park, No-Kwang;Kim, Suk-Il
    • Journal of IKEEE / v.1 no.1 s.1 / pp.101-110 / 1997
  • SVLIW processor architectures can resolve resource collisions and data dependencies between instructions while scheduling VLIW instructions at run time. As a result, long NOP word instructions can be removed from the object code produced for the processor. Thus, cache misses occur less often on the SVLIW processor than on a VLIW processor with the same cache size. Less frequent cache misses on the SVLIW processor lead to less frequent memory accesses, and thus the total number of execution cycles needed to complete an application is smaller than on the VLIW processor. This feature eventually compensates for the effect of the SVLIW processor's instruction pipeline having more stages than that of the VLIW processor. In this paper, we formulate and compare execution cycle models of the two architectures. Simulation results show that, as the memory access cycles incurred on a cache miss grow longer, the total execution cycles of the SVLIW processor become shorter than those of the VLIW processor.
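
The code-size effect described in the abstract can be illustrated with a toy model in which a program is a list of fixed-width words and a word made up entirely of NOPs carries no work. The representation and the helper name are hypothetical; the sketch only shows why dropping such words shrinks the object code that has to pass through the instruction cache.

```python
NOP = "nop"

def strip_nop_words(program):
    """Drop words consisting solely of NOPs; partially filled words are kept."""
    return [word for word in program if any(op != NOP for op in word)]

# Toy two-slot schedule in which a static VLIW compiler inserted
# all-NOP words to wait out latencies.
program = [
    ("add", "mul"),
    (NOP, NOP),
    ("ld", NOP),
    (NOP, NOP),
]
compact = strip_nop_words(program)
print(len(program), "->", len(compact))  # 4 -> 2
```

In this model the timing that the removed words used to encode must be recovered by the hardware at run time, which is the role the abstract assigns to the SVLIW processor.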

Implementation of a Compiler for VLIW Architecture (VLIW 구조를 위한 컴파일러의 구현)

  • Choe, Seong-Uk;Kim, Gyeong-Hun;Park, Myeong-Sun
    • Journal of KIISE:Computing Practices and Letters / v.5 no.1 / pp.109-121 / 1999
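  • Processors based on VLIW (Very Long Instruction Word) technology have recently been expected to outperform any other type of processor. The compiler performs a global analysis for instruction-level parallelism [...], and many compilation techniques for the VLIW architecture have been studied. To obtain more reliable results in research on compilation techniques, it is necessary to build a VLIW compiler and an experimental environment that serve as a base to which new techniques can be added. In this paper, we developed a parallelizing compiler for VLIW processors that includes existing parallelism-enhancing optimizations, such as software pipelining based on GURPR, and tested it in a simulator environment. The experimental results show that the execution time of some benchmarks can be reduced by up to 30%. This compiler system should be of great help in research on compilation techniques, for example in improving existing modules, and we expect that new research results and implementations can be added to this compiler environment in order to measure the resulting performance improvement.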

GCC based Compiler Construction for Compact DSP32

  • Cho, Myeong-Jin;Lee, Ho-Kyoon;Huong, Giang Nguyen Thi;Kim, Seon-Wook;Han, Young-Sun;Um, Jung-Young
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.43-45 / 2011
  • A Very Long Instruction Word (VLIW) processor executes multiple instructions in parallel. To achieve higher performance, that is, higher parallelism, a VLIW compiler groups as many instructions as possible into one word. In this paper, we show how to construct a GCC-based VLIW C compiler for CDSP32 (Compact Digital Signal Processor 32-bit), an embedded DSP processor that issues two instructions in one VLIW. We also evaluated the compiler on the EEMBC benchmark; the experimental results show that the total number of dynamic instructions was reduced by 18% on average compared with compilation without VLIW instruction scheduling.
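
Grouping instructions into two-issue words can be sketched with a simple in-order greedy packer. This is not the GCC scheduler or the CDSP32 back end: the `Op` type, the register names, and the conservative rule that any register dependence forces a new word are all assumptions made for illustration.

```python
from typing import NamedTuple

class Op(NamedTuple):
    name: str
    dest: str      # register written
    srcs: tuple    # registers read

def depends(later: Op, earlier: Op) -> bool:
    """True if `later` may not share a word with `earlier` (conservative)."""
    return (earlier.dest in later.srcs      # read after write
            or earlier.dest == later.dest   # write after write
            or later.dest in earlier.srcs)  # write after read

def pack(ops, width=2):
    """Greedy in-order packing: start a new word when full or on a dependence."""
    words, current = [], []
    for op in ops:
        if len(current) == width or any(depends(op, prev) for prev in current):
            words.append(current)
            current = []
        current.append(op)
    if current:
        words.append(current)
    return words

ops = [
    Op("add", "r1", ("r2", "r3")),
    Op("mul", "r4", ("r5", "r6")),
    Op("sub", "r7", ("r1", "r4")),
    Op("ld", "r8", ("r9",)),
]
words = pack(ops)
print([[op.name for op in word] for word in words])  # [['add', 'mul'], ['sub', 'ld']]
```

A real scheduler would also reorder independent operations and respect functional-unit constraints; this sketch keeps program order so that correctness is easy to see.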