Search | Korea Science

Compiler Processor Trade-offs for Dynamic Scheduling of VLIW Instructions (VLIW명령어의 동적 스케줄링을 위한 컴파일러와 프로세서간 상호보완)

Sunghyun Jee
- Journal of KIISE:Computer Systems and Theory
- /
- v.31 no.5_6
- /
- pp.279-287
- /
- 2004
This paper describes a processor architecture, named Dynamically Instruction Scheduled VLIW (DISVLIW). The DISVLIW Processor architecture is designed for dynamic scheduling VLIW instructions using dependency information. The DISVLIW instruction format is augmented to allow dependency bit vectors to be placed in the same VLIW word. The DISVLIW processor dynamically schedules each instruction in long instructions using functional unit and dynamic scheduler pairs. Features such as explicit parallelism, balanced scheduling effort, and dynamic scheduling of VLIW instructions can be used to provide a sound frustructure for supercomputing. We simulate the DISVLIW processor architecture and show that the DISVLIW processor performs significantly better than the VLIW processor for a wide range of cache sites and across numerical benchmark applications.
PDF KSCI

Soft Error Detection for VLIW Architectures with a Variable Length Execution Set (Variable Length Execution Set을 지원하는 VLIW 아키텍처를 위한 소프트 에러 검출 기법)

Lee, Jongwon;Cho, Doosan;Paek, Yunheung
- KIPS Transactions on Computer and Communication Systems
- /
- v.2 no.3
- /
- pp.111-116
- /
- 2013
With technology scaling, soft error rate has greatly increased in embedded systems. Due to high performance and low power consumption, VLIW (Very Long Instruction Word) architectures have been widely used in embedded systems and thus many researches have been studied to improve the reliability of a system by duplicating instructions in VLIW architectures. However, existing studies have ignored the feature, called VLES (Variable Length Execution Set), which is adopted in most modern VLIW architectures to reduce code size. In this paper, we propose how to support instruction duplication in VLIW architecture with VLES. Our experimental results demonstrate that a VLIW architecture with VLES shows 64% code size decrement on average at the cost of about 4% additional cell area as compared to the case of a VLIW architecture without VLES when instruction duplication is applied to both architectures. Also, it is shown that the case with VLES does not cause extra execution time compared to the case without VLES.
https://doi.org/10.3745/KTCCS.2013.2.3.111 인용 PDF KSCI

Peak Power Minimization for Clustered VLIW Architectures (분산된 VLIW 구조에서의 최대 전력 최소화 방법)

서재원;김태환;정기석
- Journal of KIISE:Computer Systems and Theory
- /
- v.30 no.5_6
- /
- pp.258-264
- /
- 2003
VLIW architecture has emerged as one of the most effective architectures in dealing with multimedia applications. In multimedia applications, there is ample potential for parallelizing the execution of multiple operations because such applications typically have data intensive processing which often has limited data and/or control dependencies. As the degree of instruction-level parallelism increases, non-clustered VLIW architectures scale poorly because of the tremendous register port pressure. Therefore, clustered VLIW architecture is definitely preferred over non-clustered VLIW architecture when a higher degree of parallelizing is possible as in the case of multimedia processing However, having multiple clusters in an architecture implies that the amount of hardware is quite large, and therefore, power consumption becomes a very crucial issue. In this paper, we propose an algorithm to minimize the peak power consumption without incurring little or no delay penalty. The effectiveness of our algorithm has been verified by various sets of experiments, and up to 30.7% reduction in the peak power consumption is observed compared with the results that is optimized to minimize resources only.
PDF KSCI

Performance Improvement Through Aggressive Instruction Packing (적극적인 명령어 압축을 통한 성능향상)

Ji, Seung-Hyeon;Kim, Seok-Il
- The KIPS Transactions:PartA
- /
- v.9A no.2
- /
- pp.231-240
- /
- 2002
This paper proposes balancing scheduling effort more evenly between the compiler and the processor, by introducing independently scheduled VLIW instructions. Aggressively Packed VLIW (APVLIW) processor is aimed specifically at independent scheduling Very Long Instruction Word(VLIW) instructions with dependency information. The APVLIW processor independently schedules earth instruction within long instructions using functional unit and dynamic scheduler pairs. Every dynamic scheduler dynamically checks far data dependencies and resource collisions while scheduling each instruction. This scheduling is especially effective in applications containing loops. We simulate the architecture and show that the APVLIW processor performs significantly better than the VLIW processor for a wide range of cache sizes and across various numerical benchmark applications.
https://doi.org/10.3745/KIPSTA.2002.9A.2.231 인용 PDF KSCI

VLIW architecture for compensating simple bypassing paths (간단한 바이패싱 회로를 보상하는 VLIW 구조)

김석주
- Proceedings of the Korea Multimedia Society Conference
- /
- 2002.05c
- /
- pp.27-32
- /
- 2002
본 논문에서는 NOP 이 차지하는 슬롯에 의미 있는 명령어를 중복 할당하여 자료의존 관계를 해소하고 프로그램 실행 사이클을 단축시키는 명령어 중복 스케줄링 기법을 적용할 수 있는 VLIW 구조인 TiPs(Tiny Processors) 구조를 제안하였으며 TiPs는 회로의 복잡도를 증가시키지 않으면서 실행시간을 단축시켜 가상의 바이패싱 회로를 추가한 효과를 얻을 수 있다. 실험 결과 TiPs에서 명령어 중복 스케줄링 기법을 적용할 경우 8% ～ 25%의 성능 향상 효과가 있음을 알 수 있었다.
PDF

Implementation of Optimizing Compiler for Bus-based VLIW Processors (버스기반의 VLIW형 프로세서를 위한 최적화 컴파일러 구현)

Hong, Seung-Pyo;Moon, Soo-Mook
- Journal of KIISE:Computer Systems and Theory
- /
- v.27 no.4
- /
- pp.401-407
- /
- 2000
Modern microprocessors exploit instruction-level parallel processing to increase the performance. Especially VLIW processors supported by the parallelizing compiler are used more and more in specific applications such as high-end DSP and graphic processing. Bus-based VLIW architecture was proposed for these specific applications and it was designed to reduce the overhead of forwarding unit and the instruction width. In this paper, a optimizing scheduling compiler developed for the proposed bus-based VLIW processor is introduced. First, the method to model interconnections between buses and resource usage patterns is described. Then, on the basis of the modeling, machine-dependent optimization techniques such as bus-to-register promotion, copy coalescing and operand substitution were implemented. Optimization techniques for general-purpose VLIW microprocessors such as selective scheduling and enhanced pipelining scheduling(EPS) were also implemented. The experiment result shows about 20% performance gain for multimedia application benchmarks.
PDF

Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio

Ituero, Pablo;Lopez-Vallejo, Marisa
- ETRI Journal
- /
- v.30 no.1
- /
- pp.113-128
- /
- 2008
Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It displays the good performance of application specific designs along with the versatility of processors, which makes it compliant with leading edge standards. The machine deals with multi-operand instructions in an innovative way, the fetching and assertion of data is serialized and the addressing is automatized and transparent for the programmer. The performance-area trade-off of the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation.
PDF

SIMD Extended VLIW ASIP architecture (SIMD 명령어가 추가된 VLIW ASIP 프로세서)

Yang, Seungjun;Park, Sanghyun;Heo, Ingoo;Lee, Jongwon;Kim, Yongjoo;Paek, Yunheung
- Proceedings of the Korea Information Processing Society Conference
- /
- 2010.11a
- /
- pp.1589-1590
- /
- 2010
VLIW 아키텍처는 동시에 여러 개의 명령어를 수행하면서도 상대적으로 크기가 작으며 적은 전력을 소모한다는 장점 때문에 임베디드 어플리케이션을 처리하기 위해 많이 쓰이고 있다. 본 논문에서는 SIMD 명령어를 추가한 VLIW 아키텍처를 설계함으로써 동영상 처리와 같은 미디어 어플리케이션을 효과적으로 처리할 수 있도록 하였다.
https://doi.org/10.3745/PKIPS.y2010m11a.1589 인용 PDF

A Study on Processor Monitoring for Integration Test of Flight Control Computer equipped with A Modern Processor (최신 프로세서 탑재 비행제어 컴퓨터의 통합시험을 위한 프로세서 모니터링 연구)

Lee, Cheol;Kim, Jae-Cheol;Cho, In-Jae
- Journal of Institute of Control, Robotics and Systems
- /
- v.14 no.10
- /
- pp.1081-1087
- /
- 2008
This paper describes limitations and solutions of the existing processor-monitoring concept for a military supersonics aircraft Flight Control Computer (FLCC) equipped with modern architecture processor to perform the system integration test. Safecritical FLCC integration test, which requires automatic test for thousands of test cases and real-time input/output test condition generation, depends on the processor-monitoring device called Processor Interface (PI). The PI, which relies upon on the FLCC processor's external address and data-bus data, has some limitations due to multi-fetching capability of the modern sophisticated military processors, like C6000's VLIW (Very-Long Instruction Word) architecture and PowerPC's Superscalar architecture. Several techniques for limitations were developed and proper monitoring approach was presented for modem processor-adopted FLCC system integration test.
https://doi.org/10.5302/J.ICROS.2008.14.10.1081 인용 PDF KSCI

Performance Improvement of a VLIW ARchitecture without Pipeline-Stall during Instruction Cache Miss (명령어 캐시미스중에서도 파이프라인의 고착을 피할 수 있는 VLIW 구조의 성능향상)

Ji, Seung-Hyeon;Park, No-Gwang;Kim, Seok-Il
- Journal of KIISE:Computer Systems and Theory
- /
- v.26 no.3
- /
- pp.301-312
- /
- 1999
본 논문에서는 명령어 수준의 병렬성을 다루는 세 가지 프로세서 모델을 정의하고 각 모델별로 명령어 파이프라인을 운용하는 방법에 다른 실행사이클의 변화를 연구하였다. 본 논문에서 고려한 세가지 모델은1) 긴 명령어 인출시 캐시미스가 발생하면 명령어 파이프라인이 정지되는 전통적인 VLIW 구조, 2) 전통적인 VLIW 구조와 같이 긴 명령어 인출시 캐시미스가 발생하면 명령어 파이프라인이 정지되나 실시간에 긴 명령어를 실행 유니트로 스케줄링할 수있으므로 목적 코드에서 LNOP를 제거할 수 있는 구조 및 3)2)의 구조에서 긴 명령어를 인출하는 과정에서 캐시미스가 발생하더라도 LNOP을 분석 유니트로 제공하여 명령어 파이프라인을 계속 진행시키는 구조의 세 가지이다. 연구결과, 세 번째 구조에서 발생되는 LNOP 의 수는 첫 번째 구조와 두 번째 구조에 비하여 적어서 동일한 응용 프로그램을 처리하는데 필요한 실행사이클의 수가 가장 짧았다. 여러 가지 벤치 마크들에 대한 모의 실험에서도 세 번째 구조가 다른 구조의 프로세서에 비하여 실행사이클의 수가 가장 짧음을 확인할 수 있었다.

Search Result 19, Processing Time 0.024 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)