• Title/Summary/Keyword: 슈퍼스칼라 프로세서

Search Result 44, Processing Time 0.027 seconds

Hybrid Value Predictor using Dynamic Classification (동적 분류를 이용한 하이브리드 결과 값 예측기)

  • Sin, Yeong-Ho;Yun, Seong-Ryong;Jo, Yeong-Il
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.11
    • /
    • pp.899-907
    • /
    • 2000
  • 슈퍼스칼라 프로세서의 성능을 향상시키기 위해서는 데이터 종속성에 의한 장애를 제거해야 한다. 최근 여러 논문들은 이러한 데이터 종속성을 제거하기 위해서 명령어의 결과 값을 예상하는 메커니즘을 제안하였다. 이러한 예상 메커니즘 중 여러 예측기를 혼합해서 사용하는 하이브리드 방법은 각 하나의 예측기만을 사용하는 방법보다 더 좋은 성능을 얻을 수 있다. 그러나 그러한 하이브리드 예측기는 명령어를 중복해서 저장하여 많은 하드웨으 크기를 요구한다. 본 논문에서는 여러 예측기의 장점을 이용하여 높은 성능을 얻을 수 있는 새로운 하이브리드 예측 메커니즘을 제안한다. 또한 예상이 자주 틀리는 명령어를 동적으로 찾아내어 예상하지 않음으로서 잘못 예상시 발생하는 misprediction 페널티를 낮추고 예상 정확도를 높인다. 시뮬레이션 결과 SPECint95 벤치마크프로그램에 대해 제안한 하이브리드 예측기에서 예측율은 평균 79%에서 90%로 향상하였고, misprediction rate는 평균 12%에서 2%로 낮추었다.

  • PDF

Attributed AND-OR Graph for Synthesis of Superscalar Processor Simulator (슈퍼스칼라 프로세서 시뮬레이터의 생성을 위한 Attributed AND-OR 그래프)

  • Jun Kyoung Kim;Tag Gon Kim
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 2003.06a
    • /
    • pp.73-78
    • /
    • 2003
  • This paper proposes the simulator synthesis scheme which is based on the exploration of the total design space in attributed AND-OR graph. Attributed AND-OR graph is a systematic design space representation formalism which enables to represent all the design space by decomposition rule and specialization rule. In addition, attributes attached to the design entity provides flexible modeling. Based on this design space representation scheme, a pruning algorithm which can transform the total design space into sub-design space that satisfies the user requirements is given. We have shown the effectiveness of our framework by (ⅰ) constructing the design space of superscalar processor in attributed AND-OR graph (ⅱ) pruning it to obtain the ARM9 processor architecture. (ⅲ) modeling the components of the architecture and (ⅳ) simulating the ARM9 model.

  • PDF

A Hybrid Value Predictor Using Static and Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 간 예측기)

  • 김주익;박홍준;고광현;조영일
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10c
    • /
    • pp.682-684
    • /
    • 2002
  • 최근 여러 논문에서 실 데이터 종속을 제거하기 위하여 결과 값 예상 기법을 제안하였다. 결과 값 예상 기법 중 혼합형 결과 값 예측기는 다양한 패턴을 갖는 명령어를 모두 예측함으로써 높은 예상 정확도를 얻을 수 있지만 하나의 명령어가 여러 개의 예측기 테이블에 중복 저장되어 높은 하드웨어 비용을 요구한다는 단점이 있다. 본 논문에서는 이러한 단점을 극복하기 위하여 프로파일링으로 얻어진 정적 분류 정보를 사용하여, 명령어률 예상 정확도가 높은 예측기에만 할당하여 예상 테이블 크기를 감소 시켰다. 또한 동적으로 적절한 예측기를 선택하도록 함으로써 예상 정확도를 더욱 향상 시켰다. 본 논문에서는 SPECint95 벤치마크 프로그램에 대해 SimpleScalar/PISA 3.0 툴셋을 사용하여 실험하였다. 정적-동적 분류 정보를 모두 사용하였을 경우 87.9%, VHT 크기를 4K로 축소한 경우 87.5%로 비슷한 예상정확도를 얻으면서 예상 테이블의 크기는 50%로 감소하였다. 또한 실행 패턴의 유형 비율에 따라 각 예측기의 VHT를 구성한 경우 예상 테이블 크기를 25%로 줄일 수 있었다.

  • PDF

Hybrid Dynamic Branch Prediction to Reduce Destructive Aliasing (슈퍼스칼라 프로세서를 위한 고성능 하이브리드 동적 분기 예측)

  • Park, Jongsu
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1734-1737
    • /
    • 2019
  • This paper presents a prediction structure with a Hybrid Dynamic Branch Prediction (HDBP) scheme which decreases the number of stalls. In the application, a branch history register is dynamically adjusted to produce more unique index values of pattern history table (PHT). The number of stalls is also reduced by using the modified gshare predictor with a long history register folding scheme. The aliasing rate decreased to 44.1% and the miss prediction rate decreased to 19.06% on average compared with the gshare branch predictor, one of the most popular two-level branch predictors. Moreover, Compared with the gshare, an average improvement of 1.28% instructions per cycle (IPC) was achieved. Thus, with regard to the accuracy of branch prediction, the HDBP is remarkably useful in boosting the overall performance of the superscalar processor.

A Branch Misprediction Recovery Mechanism by Control Independence (제어 독립성과 분기예측 실패 복구 메커니즘)

  • Ko, Kwang-Hyun;Cho, Young-Il
    • Journal of Practical Agriculture & Fisheries Research
    • /
    • v.14 no.1
    • /
    • pp.3-22
    • /
    • 2012
  • Control independence has been put forward as a significant new source of instruction-level parallelism for superscalar processors. In branch prediction mechanisms, all instructions after a mispredicted branch have to be squashed and then instructions of a correct path have to be re-fetched and re-executed. This paper presents a new branch misprediction recovery mechanism to reduce the number of instructions squashed on a misprediction. Detection of control independent instructions is accomplished with the help of the static method using a profiling and the dynamic method using a control flow of program sequences. We show that the suggested branch misprediction recovery mechanism improves the performance by 2~7% on a 4-issue processor, by 4~15% on an 8-issue processor and by 8~28% on a 16-issue processor.

A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 값 예측기)

  • 김주익;박홍준;조영일
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.10
    • /
    • pp.569-578
    • /
    • 2003
  • Data dependencies are one of major hurdles to limit ILP(Instruction Level Parallelism), so several related works have suggested that the limit imposed by data dependencies can be overcome to some extent with use of the data value prediction. Hybrid value predictor can obtain the high prediction accuracy using advantages of various predictors, but it has a defect that same instruction has overlapping entries in all predictor. In this paper, we propose a new hybrid value predictor which achieves high performance by using the information of static and dynamic classification. The proposed predictor can enhance the prediction accuracy and efficiently decrease the prediction table size of predictor, because it allocates each instruction into single best-suited predictor during the fetch stage by using the information of static classification. Also, it can enhance the prediction accuracy because it selects a best- suited prediction method for the “Unknown”pattern instructions by using the dynamic classification mechanism. Simulation results based on the SimpleScalar/PISA tool set and the SPECint95 benchmarks show the average correct prediction rate of 85.1% by using the static classification mechanism. Also, we achieve the average correction prediction rate of 87.6% by using static and dynamic classification mechanism.

A Predicate-Sensitive Scheduling Algorithm in Instruction-Level Parallelism Processors (ILP 프로세서를 위한 조건실행 지원 스케쥴링 알고리즘)

  • Yoo, Byung-Kang;Lee, Sang-Jeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.1
    • /
    • pp.202-214
    • /
    • 1998
  • Exploitation of instruction-level parallelism(ILP) is an effective mechanism for improving the performance of modern super-scalar and VLIW processors. Various software techniques can be applied to increase ILP. Among these techniques, predicated execution is the one that increases the degree of ILP by allowing instructions from different basic blocks to be converted to a single basic block by removing branch instructions. In this paper, a global predicate-sensitive scheduling algorithm is proposed to improve the performance for ILP processors that support predicated execution. In order to examine the performance of proposed algorithm, a C compiler and a simulator are developed. By simulating various benchmark programs with the compiler and the simulator, the performance results of this algorithm are measured and the effectiveness of the algorithm is verified. As a result of measure performance with I, 2, 4 issue execution, this study was confirmed average performance by 20% or more.

  • PDF

Sepculative Updates of a Stride Value Predictor in Wide-Issue Processors (와이드 이슈 프로세서를 위한 스트라이드 값 예측기의 모험적 갱신)

  • Jeon, Byeong-Chan;Lee, Sang-Jeong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.11
    • /
    • pp.601-612
    • /
    • 2001
  • In superscalar processors, value prediction is a technique that breaks true data dependences by predicting the outcome of an instruction in order to exploit instruction level parallelism(ILP). A value predictor looks up the prediction table for the prediction value of an instruction in the instruction fetch stage, and updates with the prediction result and the resolved value after the execution of the instruction for the next prediction. However, as the instruction fetch and issue rates are increased, the same instruction is likely to fetch again before is has been updated in the predictor. Hence, the predictor looks up the stale value in the table and this mostly will cause incorrect value predictions. In this paper, a stride value predictor with the capability of speculative updates, which can update the prediction table speculatively without waiting until the instruction has been completed, is proposed. Also, the performance of the scheme is examined using Simplescalar simulator for SPECint95 benchmarks in which our value predictor is added.

  • PDF

Efficient CHAM-Like Structures on General-Purpose Processors with Changing Order of Operations (연산 순서 변경에 따른 범용 프로세서에서 효율적인 CHAM-like 구조)

  • Myoungsu Shin;Seonkyu Kim;Hanbeom Shin;Insung Kim;Sunyeop Kim;Donggeun Kwon;Deukjo Hong;Jaechul Sung;Seokhie Hong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.4
    • /
    • pp.629-639
    • /
    • 2024
  • CHAM is designed with an emphasis on encryption speed, considering that in the ISO/IEC standard block cipher operation mode, encryption functions are used more often than decryption functions. In the superscalar architecture of modern general-purpose processors, different ordering of operations can lead to different processing speeds, even if the computation configuration is the same. In this paper, we analyze the implementation efficiency and security of CHAM-like structures, which rearrange the order of operations in the ARX-based block cipher CHAM, for single-block and parallel implementations in a general-purpose processor environment. The proposed structures are at least 9.3% and at most 56.4%efficient in terms of encryption speed. The security analysis evaluates the resistance of the CHAM-like structures to differential and linear attacks. In terms of security margin, the difference is 3.4% for differential attacks and 6.8%for linear attacks, indicating that the security strength is similar compared to the efficiency difference. These results can be utilized in the design of ARX-based block ciphers.

Performance Improvement of Single Chip Multiprocessor using Concurrent Branch Execution (분기 동시 수행을 이용한 단일 칩 멀티프로세서의 성능 개선)

  • Lee, Seung-Ryul;Kim, Jun-Shik;Choi, Jae-Hyeok;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.2
    • /
    • pp.61-71
    • /
    • 2007
  • The instruction level parallelism, which has been used to improve the performance of processors, expose its limit. The change of a control flow by a branch miss prediction is one of the obstacles that restrict the instruction level parallelism. The single chip multiprocessors have been developed to utilize the thread level parallelism. However, we could not use the maximum performance of the single chip multiprocessor in case of executing the coded programs without considering the multi-thread. In order to overcome the two performance degradation factors, in this paper, we suggest the concurrent branch execution method that applies to the multi-path execution method at a single chip multiprocessor. We executes all two flows of the conditional branch using the idle core processor. Through this, we can improve the processor's efficiency with blocking the control flow termination by the branch instruction and reducing the idle time. We analyze the effects of concurrent branch execution proposed in this paper through the simulation. As a result of that, concurrent branch execution reduces about 20% of idle time and improves the maximum 10% of the branch prediction accuracy. We show that our scheme improves the overall performance of maximum 39% compared to the normal single chip multiprocessor and maximum 27% compared to the superscalar processor.