Search | Korea Science

Performance Improvement of a VLIW Architecture without Pipeline-Stall during Instruction Cache Miss (명령어 캐시미스중에서도 파이프라인의 고착을 피할 수 있는 VLIW 구조의 성능향상)

Ji, Seung-Hyeon;Park, No-Gwang;Kim, Seok-Il
- Journal of KIISE:Computer Systems and Theory / v.26 no.3 / pp.301-312 / 1999
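In this paper, we define three processor models that exploit instruction-level parallelism and study how the number of execution cycles changes with the way each model operates its instruction pipeline. The three models considered are: 1) a traditional VLIW architecture, in which the instruction pipeline stalls when a cache miss occurs while fetching a long instruction; 2) an architecture in which the pipeline stalls on a cache miss during long-instruction fetch, as in the traditional VLIW architecture, but in which LNOPs can be removed from the object code because long instructions can be scheduled onto the execution units at run time; and 3) an architecture based on 2) in which, even when a cache miss occurs while fetching a long instruction, an LNOP is supplied to the analysis unit so that the instruction pipeline keeps advancing. The results show that the third architecture generates fewer LNOPs than the first and second architectures and therefore needs the fewest execution cycles to process the same application program. Simulations on a range of benchmarks also confirmed that the third architecture requires the fewest execution cycles of the processor architectures compared.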

Performance Improvement of SVLIW Architectures by Removing LNOPs from An Object Code (목적 코드에서 LNOP 코드가 제거됨에 따른 SVLIW 구조의 성능 향상)

Jeong, Bo-Yun;Jeon, Joong-Nam;Kim, Suk-Il
- The Transactions of the Korea Information Processing Society / v.4 no.9 / pp.2269-2279 / 1997
The SVLIW (Superscalar VLIW) processor, a member of the VLIW processor family, schedules very long instruction words at run time. If a very long instruction word that is about to be issued has data dependences on, or resource conflicts with, words already in execution, a long NOP word (LNOP) is issued in its place until all of those dependences and conflicts have been resolved. LNOPs can therefore be removed from the object code of SVLIW processors. In this paper, we measure the improvement in cache hit ratio that results from removing LNOPs from the object code, and we analyze the resulting improvement in processor performance. Benchmark tests indicate that SVLIW processors perform more than 5% better than traditional VLIW processors.
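The code-size effect described in this abstract can be illustrated with a small sketch. It is not the authors' tool: the three-slot word format, the "nop" mnemonic, the helper names, and the two-words-per-block cache geometry are all assumptions chosen for illustration. It only shows that dropping all-NOP long words shrinks the object code and hence the number of cache blocks it occupies.

```python
from typing import List, Tuple

NOP = "nop"  # assumed mnemonic for an empty operation slot

LongWord = Tuple[str, ...]


def is_lnop(word: LongWord) -> bool:
    """A long word counts as an LNOP when every slot holds a NOP."""
    return all(op == NOP for op in word)


def strip_lnops(code: List[LongWord]) -> List[LongWord]:
    """Return the object code with all LNOP words removed."""
    return [word for word in code if not is_lnop(word)]


def blocks_needed(num_words: int, words_per_block: int) -> int:
    """Number of cache blocks needed to hold num_words long words."""
    return -(-num_words // words_per_block)  # ceiling division


# Hypothetical VLIW object code: the compiler padded it with LNOPs.
vliw_code = [
    ("load", "add", "nop"),
    ("nop", "nop", "nop"),
    ("nop", "nop", "nop"),
    ("mul", "nop", "store"),
    ("nop", "nop", "nop"),
    ("sub", "nop", "nop"),
]
svliw_code = strip_lnops(vliw_code)

print(len(vliw_code), len(svliw_code))           # 6 3
print(blocks_needed(len(vliw_code), 2),
      blocks_needed(len(svliw_code), 2))         # 3 2
```

In this toy case the same program occupies two cache blocks instead of three, which is the kind of footprint reduction the abstract links to a higher cache hit ratio.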

An Analytical Performance Model for Superscalar Processors (가변적 하드웨어 구성에 대한 수퍼스칼라 프로세서의 성능 예측 모델)

이종복
- Proceedings of the Korean Information Science Society Conference / 1999.10c / pp.24-26 / 1999
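This paper presents a performance prediction model for superscalar processors that, for a given window, is expressed in terms of the fetch rate and the number of functional units, the basic elements of the processor's hardware configuration. The benchmark program running on the superscalar processor is characterized by the probability that each possible number of instructions executes per cycle and by the branch prediction accuracy. Once these parameters have been obtained from an initial experiment, the performance of superscalar processors with various numbers of functional units and fetch rates can be obtained simply from the proposed model. Comparing the performance measured by trace-driven simulation with the performance predicted by the proposed model gave an average error of 3.8%.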

Design and Performance Evaluation of Expansion Buffer Cache (확장 버퍼 캐쉬의 설계 및 성능 평가)

Hong Won-Kee
- The KIPS Transactions:PartA / v.11A no.7 s.91 / pp.489-498 / 2004
The VLIW processor is considered well suited to embedded systems because its simple hardware structure provides high performance with low power consumption. Unfortunately, VLIW processors often suffer from high memory access latency caused by the variable length of I-packets, which consist of independent instructions to be issued in parallel. Because I-packet length varies, some I-packets, called straddle I-packets, are placed across two cache blocks, so two cache accesses are required to fetch them. In this paper, an expansion buffer cache is proposed to improve both the instruction fetch bandwidth and the power consumption of the I-cache at moderate hardware cost. Alongside the main cache, the expansion buffer cache has a small expansion buffer that holds a fraction of a straddle packet, reducing the additional cache accesses caused by straddle I-packets. With this large reduction in cache accesses, the expansion buffer cache achieves a 5~9% improvement over conventional I-caches in the Delay·Power·Area metric.
https://doi.org/10.3745/KIPSTA.2004.11A.7.489
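The straddle problem named in this abstract can be made concrete with a short sketch. It assumes, purely for illustration, that I-packets are laid out back to back in memory, that lengths and the block size are counted in instructions, and that no packet is longer than one block. The function name and the sample lengths are hypothetical, and the expansion buffer itself is not modeled. The sketch only counts how many packets cross a block boundary and how many block accesses a plain I-cache would need.

```python
from typing import Dict, List


def count_fetch_accesses(packet_lengths: List[int], block_size: int) -> Dict[str, int]:
    """Count cache-block accesses when each packet is fetched in turn."""
    offset = 0      # start position of the current packet, in instructions
    straddles = 0   # packets that span two cache blocks
    accesses = 0    # total block accesses without any expansion buffer
    for length in packet_lengths:
        if not 0 < length <= block_size:
            raise ValueError("packet length must be in 1..block_size")
        first_block = offset // block_size
        last_block = (offset + length - 1) // block_size
        touched = last_block - first_block + 1
        accesses += touched
        if touched > 1:
            straddles += 1
        offset += length
    return {
        "packets": len(packet_lengths),
        "straddles": straddles,
        "accesses": accesses,
    }


# Six hypothetical I-packets stored in 8-instruction cache blocks.
stats = count_fetch_accesses([3, 2, 4, 1, 3, 3], block_size=8)
print(stats)  # {'packets': 6, 'straddles': 1, 'accesses': 7}
```

Here the third packet occupies instructions 5 to 8 and so needs two block accesses. Those extra accesses are what the abstract says the expansion buffer is meant to reduce.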

Design and Evaluation of Cache Structure for Semi-packed Instruction (부분 압축 명령어를 위한 캐쉬 구조의 설계 및 평가)

Hong, Won-Gi;Lee, Seung-Yeop;Kim, Sin-Deok
- Journal of KIISE:Computer Systems and Theory / v.28 no.5 / pp.245-258 / 2001
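In VLIW, all parallelization of program code is performed by the compiler alone. The operations to be executed in parallel must therefore be indicated explicitly, and two instruction encodings have been used for this purpose: unpacked encoding and packed encoding. Each encoding requires a different cache structure for storing and retrieving instructions: an unpacked cache for unpacked encoding and a packed cache for packed encoding. However, these suffer, respectively, from reduced memory utilization caused by null operations and from increased instruction fetch overhead caused by the decompression step. This paper proposes a split cache structure that uses semi-packed encoding, which keeps the instruction length partially fixed, to raise memory utilization while reducing instruction fetch overhead. The chip area required to implement each cache structure was calculated, confirming that the split cache is a relatively cost-effective structure. Memory utilization measured by simulation showed that, even when the increase in hardware cost is taken into account, the split cache achieves up to about three times the memory utilization of the unpacked cache. Performance measurements of VLIW systems using each cache structure as the first-level cache showed that the system using TCSC (the block-concentrated split cache) was the best in terms of cost-performance.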
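The memory-utilization gap between the two baseline encodings can be sketched as follows. It assumes that unpacked encoding stores one slot per functional unit in every long instruction, padding with NOPs, while packed encoding stores only the useful operations and ignores any per-operation marker bits. The function name, the four-wide issue width, and the sample bundles are hypothetical, and the paper's semi-packed encoding is not modeled because the abstract does not specify it.

```python
from typing import Dict, List


def encoding_footprint(bundles: List[List[str]], issue_width: int) -> Dict[str, float]:
    """Compare slot counts of unpacked and packed encodings of the same code."""
    for bundle in bundles:
        if len(bundle) > issue_width:
            raise ValueError("bundle holds more operations than the issue width")
    useful = sum(len(bundle) for bundle in bundles)   # slots a packed encoding stores
    unpacked = len(bundles) * issue_width             # slots an unpacked encoding stores
    return {
        "unpacked_slots": unpacked,
        "packed_slots": useful,
        "nop_slots": unpacked - useful,
        "unpacked_utilization": useful / unpacked,
    }


# Four hypothetical long instructions on an assumed 4-wide machine.
bundles = [["ld", "add"], ["mul"], ["st", "sub", "br"], ["add"]]
footprint = encoding_footprint(bundles, issue_width=4)
print(footprint)
# {'unpacked_slots': 16, 'packed_slots': 7, 'nop_slots': 9, 'unpacked_utilization': 0.4375}
```

In this toy case more than half of the unpacked storage holds NOPs, which is the utilization loss the abstract attributes to unpacked encoding.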

The Instruction Flash memory system with the high performance dual buffer system (명령어 플래시 메모리를 위한 고성능 이중 버퍼 시스템 설계)

Jung, Bo-Sung;Lee, Jung-Hoon
- Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.1-8 / 2011
NAND flash memory has been widely studied as a hard disk replacement because of its low power consumption, low price, and large storage capacity. NAND flash memory typically uses a general cache-style buffer system to improve overall system performance, but such designs have tended to focus on data. Our research therefore designs a high-performance instruction NAND flash memory structure using a buffer system. The proposed buffer system consists of two parts: a fully associative temporal buffer for branch instructions and a fully associative spatial buffer for spatial locality. The spatial buffer, with a large fetching size, is effective for sequential instructions, and the temporal buffer, with a small fetching size, is effective for branch instructions. According to the simulation results, the proposed system reduces average miss ratios by around 77% and achieves an average memory access time similar to that of 2-way, victim, and fully associative buffers of two or four times the size.
https://doi.org/10.9708/jksci.2011.16.2.001

Radiation-Induced Soft Error Detection Method for High Speed SRAM Instruction Cache (고속 정적 RAM 명령어 캐시를 위한 방사선 소프트오류 검출 기법)

Kwon, Soon-Gyu;Choi, Hyun-Suk;Park, Jong-Kang;Kim, Jong-Tae
- The Journal of Korean Institute of Communications and Information Sciences / v.35 no.6B / pp.948-953 / 2010
In this paper, we propose a multi-bit soft error detection method for the instruction cache of a superscalar CPU architecture. The method is applied to the high-speed static RAM used for the instruction cache. Using 1D parity and interleaving, it has lower memory overhead and detects more multi-bit errors than other methods. It only detects the occurrence of soft errors in the static RAM; error correction is handled like a cache miss. When a soft error occurs, it is detected by the 1D parity, and the instruction cache simply fetches the words again from lower-level memory to correct the error. The method can detect multi-bit errors within a window of at most 4×4.
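A one-row sketch shows why combining 1D parity with interleaving catches multi-bit upsets that plain parity misses. It assumes 4-way bit interleaving, one even-parity bit per word, and parity bits stored where the upset does not reach. The helper names and data are illustrative and are not taken from the paper, whose 4×4 window also spans multiple rows.

```python
from typing import List


def parity(bits: List[int]) -> int:
    """Even-parity bit: 1 when the word holds an odd number of ones."""
    return sum(bits) % 2


def interleave(words: List[List[int]]) -> List[int]:
    """Bit i of word w is placed at physical position i * len(words) + w."""
    return [word[i] for i in range(len(words[0])) for word in words]


def deinterleave(row: List[int], degree: int) -> List[List[int]]:
    """Recover the logical words from a physically interleaved row."""
    return [row[w::degree] for w in range(degree)]


def flip(bits: List[int], start: int, count: int) -> List[int]:
    """Model an upset that flips `count` physically adjacent bits."""
    out = list(bits)
    for pos in range(start, start + count):
        out[pos] ^= 1
    return out


def detect(row: List[int], stored: List[int]) -> List[bool]:
    """Per-word flag: True where recomputed parity disagrees with stored parity."""
    words = deinterleave(row, len(stored))
    return [parity(w) != p for w, p in zip(words, stored)]


words = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1],
]
stored = [parity(w) for w in words]
row = interleave(words)

hit = flip(row, start=5, count=4)      # 4 adjacent physical bits are upset
print(detect(row, stored))             # [False, False, False, False]
print(detect(hit, stored))             # [True, True, True, True]

# Without interleaving, the same 4-bit burst lands inside one word and
# leaves its parity unchanged, so the error goes unnoticed.
plain_hit = flip(words[0], start=2, count=4)
print(parity(plain_hit) == stored[0])  # True
```

Because adjacent physical bits belong to different words, a burst no wider than the interleaving degree flips at most one bit per word, and a single-bit flip is exactly what one parity bit can detect.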

VHDL Design for Out-of-Order Superscalar Processor of A Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 수퍼스칼라 프로세서의 VHDL 설계)

Lee, Jongbok
- The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.1 / pp.99-105 / 2021
Today, a superscalar processor is the basic unit or an essential component of multi-core processors, SoCs, and GPUs. Hence, a high-performance out-of-order superscalar processor must be adopted for these systems to maximize their performance. The superscalar processor fetches, issues, executes, and writes back multiple instructions per cycle, using reorder buffers and reservation stations to schedule instructions dynamically in a pipelined scheme. In this paper, a fully pipelined out-of-order superscalar processor with speculative execution is designed in VHDL and verified with GHDL. In simulation, a program composed of ARM instructions executes successfully.
https://doi.org/10.7236/JIIBC.2021.21.1.99

Power Performance of Instruction Pre-Fetch Unit (명령어 선 인출기의 전력 성능)

송영규;오형철
- Proceedings of the IEEK Conference / 1999.06a / pp.365-368 / 1999
In this paper, we investigate the effect of adopting branch-penalty compensation schemes on the power performance of TLBs (Translation Look-aside Buffers) and instruction caches. We found that the double-buffer branch-penalty compensation scheme can reduce the power consumption of the TLBs and instruction caches considered by up to 14-21.3%. Power consumption is estimated through architectural-level simulation using the Kamble/Ghose method.

Early Start Branch Prediction to Resolve Prediction Delay (분기 명령어의 조기 예측을 통한 예측지연시간 문제 해결)

Kwak, Jong-Wook;Kim, Ju-Hwan
- The KIPS Transactions:PartA / v.16A no.5 / pp.347-356 / 2009
Precise branch prediction is a critical factor in improving the IPC of modern microprocessor architectures. Besides prediction accuracy, branch prediction delay also has a profound impact on overall system performance, yet it tends to be overlooked when architects design the branch predictor. To tolerate branch prediction delay, this paper proposes the Early Start Prediction (ESP) technique. The proposed solution dynamically identifies the start instruction of a basic block, whose address is called the Basic Block Start Address (BB_SA), and uses BB_SA instead of the branch instruction address itself when predicting the branch direction. The performance of the scheme can be further improved by combining it with a technique that hides the short interval between BB_SA and the branch instruction. The simulation results show that the proposed solution hides prediction latency while providing the same level of prediction accuracy as conventional predictors. Furthermore, combined with the short-interval hiding technique, it provides a substantial IPC improvement of up to 10.1%, and the resulting IPC is practically the same as that of an ideal branch predictor regardless of branch predictor configuration, such as clock frequency, delay model, and PHT size.
https://doi.org/10.3745/KIPSTA.2009.16A.5.347
