Search | Korea Science

Implementation of Optimizing Compiler for Bus-based VLIW Processors (버스기반의 VLIW형 프로세서를 위한 최적화 컴파일러 구현)

Hong, Seung-Pyo;Moon, Soo-Mook
- Journal of KIISE:Computer Systems and Theory
- /
- v.27 no.4
- /
- pp.401-407
- /
- 2000
Modern microprocessors exploit instruction-level parallel processing to increase the performance. Especially VLIW processors supported by the parallelizing compiler are used more and more in specific applications such as high-end DSP and graphic processing. Bus-based VLIW architecture was proposed for these specific applications and it was designed to reduce the overhead of forwarding unit and the instruction width. In this paper, a optimizing scheduling compiler developed for the proposed bus-based VLIW processor is introduced. First, the method to model interconnections between buses and resource usage patterns is described. Then, on the basis of the modeling, machine-dependent optimization techniques such as bus-to-register promotion, copy coalescing and operand substitution were implemented. Optimization techniques for general-purpose VLIW microprocessors such as selective scheduling and enhanced pipelining scheduling(EPS) were also implemented. The experiment result shows about 20% performance gain for multimedia application benchmarks.
PDF

Static forwardin: an approach to reduce data hazards in VLIW processor (정적 포워딩에 의한 VLIW 프로세서의 데이터 hazard 처리)

박형준;김이섭
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.35C no.2
- /
- pp.1-9
- /
- 1998
To achieve high performance in VLIW processors, they must exploit the parallelism on application programs. Data dependency makes it difficult to find the instruction-level parallelism. Among the three kinds of data dependency, true dependency causes RAW(Read After Wirte) hazards that occur most frequently in VILW processors. Forwarding is a widely used technique to reduce the performance degradation caused by RAW hazards. However, forwarding requires too much area of the chip when it is applied to VLIW processors. In this paper, static forwarding is proposed to reduce the hardware cost of forwarding circuits. It needs an extended compiler to detect RAW hazards and control the proposed forwarding scheme via instruction. And it uses the modified register file to shrink the area of forwarding path. VLIW Processor Model is also designed to verify static forwarding. This paper describes the operation of static forwarding and the comparison with the conventional forwarding.
PDF

Performance Improvement of SVLIW Architectures by Removing LNOPs from An Object Code (목적 코드에서 LNOP 코드가 제거됨에 따른 SVLIW 구조의 성능 향상)

Jeong, Bo-Yun;Jeon, Joong-Nam;Kim, Suk-Il
- The Transactions of the Korea Information Processing Society
- /
- v.4 no.9
- /
- pp.2269-2279
- /
- 1997
SVLIW (Superscalar VLIW) processor, a family of VLIW processors schedules very long instruction words at runtime. If a very long instruction word that is to be issued occurs data dependence relations and/or resource conflicts with those words that were under execution, a long NOP word is issued instead of the word until all the data dependence relations and/or resource conflicts have been resolved. Thus, LNOPs can be removed in object codes for SVLIW processors. In this paper, we measure an improvement of the cache hit ratio caused by removing LNOPs in the object code. We also analyze an improvement of the processor performance due to higher cache hit ratio of the processor. Benchmark tests promise that the performance of SVLIW processors is improved more than 5% compared with that of traditional VLIW processors.
PDF

Optimal Scheduling of SAD Algorithm on VLIW-Based High Performance DSP (VLIW 기반 고성능 DSP에서의 SAD 알고리즘 최적화 스케줄링)

Yu, Hui-Jae;Jung, Sou-Hwan;Chung, Sun-Tae
- The Journal of the Korea Contents Association
- /
- v.7 no.12
- /
- pp.262-272
- /
- 2007
SAD (Sum of Absolute Difference) algorithm is the most frequently executing routine in motion estimation, which is the most demanding process in motion picture encoding. To enhance the performance of motion picture encoding on a VLIW processor, an optimal implementation of SAD algorithm on VLIW processor should be accomplished. In this paper, we propose an implementation of optimal scheduling of SAD algorithm with conditional branch on a VLIW-based high performance DSP. We first transform the nested loop with conditional branch of SAD algorithm into a single loop with conditional branch which has a large enough loop body to utilize fully the ILP capability of VLIW DSP and has a conditional branch to make the escape from loop to be achieved as soon as possible. And then we apply a modulo scheduling technique to the transformed single loop. We test the proposed implementation on TMS320C6713, and analyze the code size and performance with respect to processing time. Through experiments, it is shown that the SAD implementation proposed in this paper has small code size appropriate for embedded applications, and the H.263 encoder with the proposed SAD implementation performs better than other H.263 encoder with other SAD implementations.
https://doi.org/10.5392/JKCA.2007.7.12.262 인용 PDF

A Study on software performance acceleration for improving real time constraint of a VLIW type Drone FCC (VLIW (Very Long Instruction Word) 형식 드론 FCC(Flight Control Computer)의 실시간성 개선을 위한 소프트웨어 성능 가속화 연구)

Cho, Doo-San
- Journal of the Korean Society of Industry Convergence
- /
- v.20 no.1
- /
- pp.1-7
- /
- 2017
Most conventional processors execute program instructions in a sequential manner. On the other hand, VLIW processor can execute multiple instructions at the same time. It exploits instruction level parallelism to improve system performance. To that end, program code should be rearranged to VLIW instruction format by a compiler. The compiler determine an optimal execution order of instructions of a program code. This instruction ordering is also called instruction scheduling. The scheduling is an algorithm that decides the execution order for instruction codes in loop parts of a program so that the instruction level parallelism can be maximized. In this research, we apply an existing scheduling algorithm to a VLIW FCC and describe analysis results to further improve its performance. And, we present a solution to solve some limitation of the existing scheduling technique. By using our solution, FCC's performance can be improved upto 32% compared to the existing scheduling only setting.
https://doi.org/10.21289/KSIC.2017.20.1.001 인용 PDF KSCI

An Optimizing Compiler for VLIW Microcontrollers (VLIW형 마이크로컨트롤러를 위한 최적화 컴파일러의 구현)

홍승표;문수묵
- Proceedings of the Korean Information Science Society Conference
- /
- 1998.10a
- /
- pp.759-761
- /
- 1998
90년대 중반 이후 고성능의 프로세서들은 성능 향상을 위해 명령어 수준의 병렬성을 이용하고 있다. 특히 실행화일의 호환성을 고려할 필요가 없는 마이크로컨트롤에서는 같은 하드웨어로 더 많은 함수유닛을 가질 수 있는 VLIW 구조가 널리 사용된다. 이러한 VLIW형의 마이크로컨트롤러에서는 병렬성을 추출하는 역할이 전적으로 소프트웨어에 있으므로 컴파일어가 성능향상에 매우 큰 영향을 미치게 된다. 본 논문에서는 마이크로컨트롤러의 구조와 그룹짓기 조건을 분석하고 선택 스케쥴링과 소프트웨어 파이프라이닝을 이용한 VLIW형 마이크로컨트롤러용 최적화 컴파일러를 구현하고 그 성능을 측정한다.
PDF

VLIW architecture for compensating simple bypassing paths (간단한 바이패싱 회로를 보상하는 VLIW 구조)

김석주
- Proceedings of the Korea Multimedia Society Conference
- /
- 2002.05c
- /
- pp.27-32
- /
- 2002
본 논문에서는 NOP 이 차지하는 슬롯에 의미 있는 명령어를 중복 할당하여 자료의존 관계를 해소하고 프로그램 실행 사이클을 단축시키는 명령어 중복 스케줄링 기법을 적용할 수 있는 VLIW 구조인 TiPs(Tiny Processors) 구조를 제안하였으며 TiPs는 회로의 복잡도를 증가시키지 않으면서 실행시간을 단축시켜 가상의 바이패싱 회로를 추가한 효과를 얻을 수 있다. 실험 결과 TiPs에서 명령어 중복 스케줄링 기법을 적용할 경우 8% ～ 25%의 성능 향상 효과가 있음을 알 수 있었다.
PDF

SIMD Extended VLIW ASIP architecture (SIMD 명령어가 추가된 VLIW ASIP 프로세서)

Yang, Seungjun;Park, Sanghyun;Heo, Ingoo;Lee, Jongwon;Kim, Yongjoo;Paek, Yunheung
- Proceedings of the Korea Information Processing Society Conference
- /
- 2010.11a
- /
- pp.1589-1590
- /
- 2010
VLIW 아키텍처는 동시에 여러 개의 명령어를 수행하면서도 상대적으로 크기가 작으며 적은 전력을 소모한다는 장점 때문에 임베디드 어플리케이션을 처리하기 위해 많이 쓰이고 있다. 본 논문에서는 SIMD 명령어를 추가한 VLIW 아키텍처를 설계함으로써 동영상 처리와 같은 미디어 어플리케이션을 효과적으로 처리할 수 있도록 하였다.
https://doi.org/10.3745/PKIPS.y2010m11a.1589 인용 PDF

An Effective Parallel and Pipelined Algorithm with Minimum Delayed Time in VLIW System (VLIW 시스템에서의 최소 시간 지연을 갖는 효율적인 병렬 파이프라인 알고리즘)

Seo, Jang-Won;Song, Jin-Hui;Ryu, Cheon-Yeol;Jeon, Mun-Seok
- The Transactions of the Korea Information Processing Society
- /
- v.2 no.4
- /
- pp.466-476
- /
- 1995
This pater describes pipelining algorithm issues for a VLIW(Very Long Instruction Word) System and the effective pipelined processing method by occurrence in pipelined management of processor minimized to timing delay. The proposed algorithm is executed in pipeline and parallel processings, and by combining basic operations variable instruction set can be desinged for various applications. In this paper, we prove and analyze the efficiency of the proposed pipeline algorithm and compare with other processor pipeline algorithm in terms of time minimizing.
PDF

Performance Improvement of a VLIW ARchitecture without Pipeline-Stall during Instruction Cache Miss (명령어 캐시미스중에서도 파이프라인의 고착을 피할 수 있는 VLIW 구조의 성능향상)

Ji, Seung-Hyeon;Park, No-Gwang;Kim, Seok-Il
- Journal of KIISE:Computer Systems and Theory
- /
- v.26 no.3
- /
- pp.301-312
- /
- 1999
본 논문에서는 명령어 수준의 병렬성을 다루는 세 가지 프로세서 모델을 정의하고 각 모델별로 명령어 파이프라인을 운용하는 방법에 다른 실행사이클의 변화를 연구하였다. 본 논문에서 고려한 세가지 모델은1) 긴 명령어 인출시 캐시미스가 발생하면 명령어 파이프라인이 정지되는 전통적인 VLIW 구조, 2) 전통적인 VLIW 구조와 같이 긴 명령어 인출시 캐시미스가 발생하면 명령어 파이프라인이 정지되나 실시간에 긴 명령어를 실행 유니트로 스케줄링할 수있으므로 목적 코드에서 LNOP를 제거할 수 있는 구조 및 3)2)의 구조에서 긴 명령어를 인출하는 과정에서 캐시미스가 발생하더라도 LNOP을 분석 유니트로 제공하여 명령어 파이프라인을 계속 진행시키는 구조의 세 가지이다. 연구결과, 세 번째 구조에서 발생되는 LNOP 의 수는 첫 번째 구조와 두 번째 구조에 비하여 적어서 동일한 응용 프로그램을 처리하는데 필요한 실행사이클의 수가 가장 짧았다. 여러 가지 벤치 마크들에 대한 모의 실험에서도 세 번째 구조가 다른 구조의 프로세서에 비하여 실행사이클의 수가 가장 짧음을 확인할 수 있었다.

Search Result 54, Processing Time 0.024 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)