• Title/Summary/Keyword: 명령어 인출 예측

Search Result 8, Processing Time 0.017 seconds

An Analytical Performance Model for Supercalar Processors (가변적 하드웨어 구성에 대한 수퍼스칼라 프로세서의 성능 예측 모델)

  • 이종복
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10c
    • /
    • pp.24-26
    • /
    • 1999
  • 본 논문에서는 주어진 윈도우에 대하여 수퍼스칼라 프로세서의 하드웨어를 구성하는 기본 요소인 인출율과 연산 유닛의 개수로 표현되는 성능 예측 모델을 제시하였다. 이때, 수퍼스칼라 프로세서에서 실행되는 벤치마크 프로그램은 매 싸이클당 각 명령어 개수가 시행되는 확률과 분기 예측 정확도에 의하여 특성화된다. 초기의 실험으로 각종 파라미터를 획득한 후에는 다양한 연산유닛과 인출율을 갖는 수퍼스칼라 프로세서의 성능을 본 논문에서 제안하는 모델에 의하여 간단하게 구할 수 있다. 명령어 자취 모의실험(trace-driven simulation)으로 측정한 성능과 본 논문에서 제안하는 성능 예측 모델에 의한 성능을 비교한 결과, 3.8%의 평균오차를 기록하였다.

  • PDF

Early Start Branch Prediction to Resolve Prediction Delay (분기 명령어의 조기 예측을 통한 예측지연시간 문제 해결)

  • Kwak, Jong-Wook;Kim, Ju-Hwan
    • The KIPS Transactions:PartA
    • /
    • v.16A no.5
    • /
    • pp.347-356
    • /
    • 2009
  • Precise branch prediction is a critical factor in the IPC Improvement of modern microprocessor architectures. In addition to the branch prediction accuracy, branch prediction delay have a profound impact on overall system performance as well. However, it tends to be overlooked when the architects design the branch predictor. To tolerate branch prediction delay, this paper proposes Early Start Prediction (ESP) technique. The proposed solution dynamically identifies the start instruction of basic block, called as Basic Block Start Address (BB_SA), and the solution uses BB_SA when predicting the branch direction, instead of branch instruction address itself. The performance of the proposed scheme can be further improved by combining short interval hiding technique between BB_SA and branch instruction. The simulation result shows that the proposed solution hides prediction latency, with providing same level of prediction accuracy compared to the conventional predictors. Furthermore, the combination with short interval hiding technique provides a substantial IPC improvement of up to 10.1%, and the IPC is actually same with ideal branch predictor, regardless of branch predictor configurations, such as clock frequency, delay model, and PHT size.

A Power-aware Branch Predictor for Embedded Processors (내장형 프로세서를 위한 저전력 분기 예측기 설계 기법)

  • Kim, Cheol-Hong;Song, Sung-Gun
    • The KIPS Transactions:PartA
    • /
    • v.14A no.6
    • /
    • pp.347-356
    • /
    • 2007
  • In designing a branch predictor, in addition to accuracy, microarchitects should consider power consumption, especially for embedded processors. This paper proposes a power-aware branch predictor, which is based on the gshare predictor, by accessing the BTB (Branch Target Buffer) only when the prediction from the PHT (Pattern History Table) is taken. To enable the selective access to the BTB, the PHT in the proposed branch predictor is accessed one cycle earlier than the traditional PHT to prevent the additional delay. As a side effect, two predictions from the PHT are obtained through one access to the PHT, which leads to more power savings. The proposed branch predictor reduces the power consumption, not requiring any additional storage arrays, not incurring additional delay (except just one MUX delay) and never harming accuracy. Simulation results show that the proposed predictor reduces the power consumption by $35{\sim}48%$ compared to the traditional predictor.

Branch Prediction Latency Hiding Scheme using Branch Pre-Prediction and Modified BTB (분기 선예측과 개선된 BTB 구조를 사용한 분기 예측 지연시간 은폐 기법)

  • Kim, Ju-Hwan;Kwak, Jong-Wook;Jhon, Chu-Shik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.10
    • /
    • pp.1-10
    • /
    • 2009
  • Precise branch predictor has a profound impact on system performance in modern processor architectures. Recent works show that prediction latency as well as prediction accuracy has a critical impact on overall system performance as well. However, prediction latency tends to be overlooked. In this paper, we propose Branch Pre-Prediction policy to tolerate branch prediction latency. The proposed solution allows that branch predictor can proceed its prediction without any information from the fetch engine, separating the prediction engine from fetch stage. In addition, we propose newly modified BTE structure to support our solution. The simulation result shows that proposed solution can hide most prediction latency with still providing the same level of prediction accuracy. Furthermore, the proposed solution shows even better performance than the ideal case, that is the predictor which always takes a single cycle prediction latency. In our experiments, IPC improvement is up to 11.92% and 5.15% in average, compared to conventional predictor system.

VHDL Design for Out-of-Order Superscalar Processor of A Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 수퍼스칼라 프로세서의 VHDL 설계)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.1
    • /
    • pp.99-105
    • /
    • 2021
  • Today, a superscalar processor is the basic unit or an essential component of a multi-core processor, SoCs, and GPUs. Hence, a high-performance out-of-order superscalar processor must be adopted for these systems to maximize its performance. The superscalar processor fetches, issues, executes, and writes back multiple instructions per cycle by utilizing reorder buffers and reservation stations to dynamically schedule instructions in a pipelined scheme. In this paper, a fully pipelined out-of-order superscalar processor with speculative execution is designed with VHDL and verified with GHDL. As a result of the simulation, the program composed of ARM instructions is successfully performed.

Filter Cache Predictor Using Mode Selection Bit (모드 선택 비트를 사용한 필터 캐시 예측기)

  • Kwak, Jong-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.5
    • /
    • pp.1-13
    • /
    • 2009
  • Filter cache has been introduced as one solution of reducing cache power consumption. More than 50% of the power reduction results from the filter cache, whereas more than 20% of the performance is compromised. To minimize the performance degradation of the filter cache, the predictive filter cache has been proposed. In this paper, we review the previous filter cache predictors and analyze the problems of the solutions. As a result, we found main problems that cause prediction misses in previous filter cache schemes and, to resolve the problems, this paper proposes a new prediction policy. In our scheme, some reference bit entries, called MSBs, are inserted into filter cache and BTB, to adaptively control the filter cache access. In simulation parts, we use a modified SimpleScalar simulator with MiBench benchmark programs to verify the proposed filter cache. The simulation result shows in average 5% performance improvement, compared to previous ones.

Prediction Accuracy Enhancement of Function Return Address via RAS Pollution Prevention (RAS 오염 방지를 통한 함수 복귀 예측 정확도 향상)

  • Kim, Ju-Hwan;Kwak, Jong-Wook;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.3
    • /
    • pp.54-68
    • /
    • 2011
  • As the prediction accuracy of conditional branch instruction is increased highly, the importance of prediction accuracy for unconditional branch instruction is also increased accordingly. Except the case of RAS(Return Address Stack) overflow, the prediction accuracy of function return address should be 100% theoretically. However, there exist some possibilities of miss-predictions for RAS return addresses, when miss-speculative execution paths are invalidated, in case of modern speculative microprocessor environments. In this paper, we propose the RAS rename technique to prevent RAS pollution, results in the reduction of RAS miss-prediction. We divide a RAS stack into a soft-stack and a hard-stack and we handle the instructions for speculative execution in the soft-stack. When some overwrites happen in the soft-stack, we move the soft-stack data into the hard-stack. In addition, we propose an enhanced version of RAS rename scheme. In simulation results, our solution provide 1/90 reduction of miss-prediction of function return address, results in up to 6.85% IPC improvement, compared to normal RAS method. Furthermore, it reduce miss-prediction ratio as 1/9, compared to previous technique.

A Hardware Cache Prefetching Scheme for Multimedia Data with Intermittently Irregular Strides (단속적(斷續的) 불규칙 주소간격을 갖는 멀티미디어 데이타를 위한 하드웨어 캐시 선인출 방법)

  • Chon Young-Suk;Moon Hyun-Ju;Jeon Joongnam;Kim Sukil
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.11
    • /
    • pp.658-672
    • /
    • 2004
  • Multimedia applications are required to process the huge amount of data at high speed in real time. The memory reference instructions such as loads and stores are the main factor which limits the high speed execution of processor. To enhance the memory reference speed, cache prefetch schemes are used so as to reduce the cache miss ratio and the total execution time by previously fetching data into cache that is expected to be referenced in the future. In this study, we present an advanced data cache prefetching scheme that improves the conventional RPT (reference prediction table) based scheme. We considers the cache line size in calculation of the address stride referenced by the same instruction, and enhances the prefetching algorithm so that the effect of prefetching could be maintained even if an irregular address stride is inserted into the series of uniform strides. According to experiment results on multimedia benchmark programs, the cache miss ratio has been improved 29% in average compared to the conventional RPT scheme while the bus usage has increased relatively small amount (0.03%).