• Title/Summary/Keyword: 명령어 수준의 병렬성

Search Result 25, Processing Time 0.023 seconds

A Branch Misprediction Recovery Mechanism by Control Independence (제어 독립성과 분기예측 실패 복구 메커니즘)

  • Ko, Kwang-Hyun;Cho, Young-Il
    • Journal of Practical Agriculture & Fisheries Research
    • /
    • v.14 no.1
    • /
    • pp.3-22
    • /
    • 2012
  • Control independence has been put forward as a significant new source of instruction-level parallelism for superscalar processors. In branch prediction mechanisms, all instructions after a mispredicted branch have to be squashed and then instructions of a correct path have to be re-fetched and re-executed. This paper presents a new branch misprediction recovery mechanism to reduce the number of instructions squashed on a misprediction. Detection of control independent instructions is accomplished with the help of the static method using a profiling and the dynamic method using a control flow of program sequences. We show that the suggested branch misprediction recovery mechanism improves the performance by 2~7% on a 4-issue processor, by 4~15% on an 8-issue processor and by 8~28% on a 16-issue processor.

Exploiting Parallelism in the Block Encryption Algorithms RC6 and Rijndael (블록 암호화 알고리즘 RC6 및 Rijndael에서의 병렬성 활용)

  • 정용화;정교일;손승원
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.2
    • /
    • pp.3-12
    • /
    • 2001
  • Currently, the superscalar architecture dominates todays microprocessor marketplase. As, more transistors are integrated onto larger die, however, an on-chip multiprocessor is regarded as a promising alternative to the superscalar microprocessor. This paper examines the behavior of the next generation block encryption algorithms RC6 and Rijndael on the on-chip multiprocessing microprocessor. Based on the simulation results by using a program-driven simulator, the on-chip multiprocessor can exploit thread level parallelism effectively and overcome the limitation of instruction level parallelism in the next generation block encryption algorithms.

Performance Improvement of Single Chip Multiprocessor using Concurrent Branch Execution (분기 동시 수행을 이용한 단일 칩 멀티프로세서의 성능 개선)

  • Lee, Seung-Ryul;Kim, Jun-Shik;Choi, Jae-Hyeok;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.2
    • /
    • pp.61-71
    • /
    • 2007
  • The instruction level parallelism, which has been used to improve the performance of processors, expose its limit. The change of a control flow by a branch miss prediction is one of the obstacles that restrict the instruction level parallelism. The single chip multiprocessors have been developed to utilize the thread level parallelism. However, we could not use the maximum performance of the single chip multiprocessor in case of executing the coded programs without considering the multi-thread. In order to overcome the two performance degradation factors, in this paper, we suggest the concurrent branch execution method that applies to the multi-path execution method at a single chip multiprocessor. We executes all two flows of the conditional branch using the idle core processor. Through this, we can improve the processor's efficiency with blocking the control flow termination by the branch instruction and reducing the idle time. We analyze the effects of concurrent branch execution proposed in this paper through the simulation. As a result of that, concurrent branch execution reduces about 20% of idle time and improves the maximum 10% of the branch prediction accuracy. We show that our scheme improves the overall performance of maximum 39% compared to the normal single chip multiprocessor and maximum 27% compared to the superscalar processor.

A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 값 예측기)

  • 김주익;박홍준;조영일
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.10
    • /
    • pp.569-578
    • /
    • 2003
  • Data dependencies are one of major hurdles to limit ILP(Instruction Level Parallelism), so several related works have suggested that the limit imposed by data dependencies can be overcome to some extent with use of the data value prediction. Hybrid value predictor can obtain the high prediction accuracy using advantages of various predictors, but it has a defect that same instruction has overlapping entries in all predictor. In this paper, we propose a new hybrid value predictor which achieves high performance by using the information of static and dynamic classification. The proposed predictor can enhance the prediction accuracy and efficiently decrease the prediction table size of predictor, because it allocates each instruction into single best-suited predictor during the fetch stage by using the information of static classification. Also, it can enhance the prediction accuracy because it selects a best- suited prediction method for the “Unknown”pattern instructions by using the dynamic classification mechanism. Simulation results based on the SimpleScalar/PISA tool set and the SPECint95 benchmarks show the average correct prediction rate of 85.1% by using the static classification mechanism. Also, we achieve the average correction prediction rate of 87.6% by using static and dynamic classification mechanism.

Speculative Update of a Stride Value Predictor in Superscalar Processors (슈퍼스칼라 프로세서에서 스트라이드 값 예측기의 모험적 갱신)

  • 전병찬;박희룡;이상정
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04a
    • /
    • pp.13-15
    • /
    • 2001
  • 슈퍼스칼라 프로세서에서 값 예측기는 한 명령어의 결과를 미리 예측하여 명령들 간의 데이터 종속관계를 극복하고 실행함으로써 명령어 수준 병렬성 (Instruction Level Parallesim, ILP)을 향상시키는 기법이다. 최근의 값 예측기는 프로세서의 명령 이슈율이 커짐에 따라 예측 테이블의 갱신이 테이블의 참조 속도를 따라가지 못하여 예측기의 성능이 저하되는 경향이 있다. 본 논문에서는 이러한 성능저하를 줄이기 위해 명령의 결과가 나올 때까지 기다리지 않고 테이블 값을 모험적으로 갱신(speculative update)하는 스트라이드 값 예측기를 제안한다. 제안된 방식의 타당성을 검증하기 위해 SimpleScalar 시뮬레이터 상에 제안된 예측기를 구현하여 SPECint95 벤치마트를 시뮬레이션하고 제안된 스트라이드 모험적 갱신(stride speculative update)이 기존의 스트라이드 예측기 보다 성능이 향상됨을 보인다.

Implementing Swing Modulo Scheduler for VLIW Processor (VLIW 프로세서를 위한 Swing Modulo Scheduler 구현)

  • Shin, Jangseop;Han, Sangjun;Jung, Hyungyun;Ahn, Minwook;Youn, Jonghee M.;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.04a
    • /
    • pp.12-14
    • /
    • 2014
  • 하드웨어가 해저드(hazard) 검출을 지원하지 않는 멀티이슈 VLIW 프로세서의 성능을 높이기 위해서는 컴파일러가 명령어 의존성과 하드웨어 자원의 제약을 지키는 범위 안에서 최대한 명령어수준 병렬성(ILP)을 활용하는 것이 중요하다. 기본 블록(basic block) 스케쥴링은 Branch 등 제어 흐름(control flow)의 경계를 넘어선 스케쥴링을 행하지 않아 그 효과가 제한적이다. 소프트웨어 파이프라이닝(software pipelining)은 루프(loop)의 경계를 허물어 여러반본(iteration)의 명령어가 동시에 수행되도록 하는 것으로 모듈로 스케쥴링(modulo scheduling)은 그 중에 한 범주의 스케쥴링 기법들을 일컫는다. 본 연구에서는 그 중 한가지인 스윙 모듈로 스케쥴러(swing modulo scheduler)[1]를 구현하여 그 효과를 알아보고자 한다.

An implementation of a unified ALU in multi-core GPGPU based on SIMT architecture (SIMT 구조 기반 멀티코어 GPGPU의 통합 ALU 설계)

  • Kyung, Gyu-taek;Kwak, Jae-Chang;Lee, Kwang-yeob
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.10a
    • /
    • pp.540-543
    • /
    • 2013
  • This paper describes an implementation of a unified ALU on multi-core GPGPU based on SIMT architecture. Our unified ALU can operate conditional branch instructions, data movement instructions, integer arithmetic instructions and floating-point arithmetic instructions. Since multi-core GPGPU contains a lot of ALU for parallel processing of various types, the main point of this paper is to design the minimum size ALU by unifying similar processing of each operations on circit level. All instrunctions were tested by making a test program. And we compare this results with results of CPU operations to verify our ALU. Our unified ALU's gate size is approximately 20,000 and the maximum operation frequency is 430MHz.

  • PDF

Optimistic Colescing Technique for Copy Elimination in ILP Instruction Scheduling (ILP 명령 스케쥴링에서의 복사 제거를 위한 낙관적 융합 기법)

  • Park, Jin-Pyo;Mun, Su-Muk
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.5
    • /
    • pp.692-701
    • /
    • 1999
  • 수퍼스칼라(superscalar)나 VLIW 와 같은 명령어 수준 병렬화(ILP) 프로세서의 성능을 극대화하는 과감한 명령어 스케쥴링은 소프트웨어 파이프라이닝과같은 스케쥴링 과정을 거치면서 일반적인 복사 명령어 제거 기법으로 없앨 수 없는 서로 간섭하는 복사 명령을 많이 만들어내는데 루프 내부에 생성된 이러한 복사명령은 적절한 루프 펼침을 수행하여 간섭관계를 없앰으로서 제거할 수 있다. 본 논문에서는 이와 같이 루프 펼침이 수행된 루프 내부의 복사명령을 제거하는 기법으로 그래프 컬러링 상에 구현한 낙관적 융합기법을 제안한다. 그래프 컬러링에서의 융합기법은 간선의 개수가 많은 노드를 만들어 낼수 있으므로 채색성에 부정적인 영향을 주는 것으로 알려져 왔으나 본 기법에서는 융합되는 노드에 동시에 간섭하는 노드의 간선의 수가 줄어드는 긍정적인 영향을 최대한 이용하여 채색성을 높이고 융합된 노드에 대한 실제 버림(spill)이 일어나는 경우 유효 범위 분절(live range splitting)을 통하여 버림의 부담을 최대한 줄이도록 하였으며 이를 VLIW 스케쥴링 된 SPEC 정수벤치마크 루프내부의 복사 명령 제거에 적용한 결과 제거 가능한 복사 명령의 99%를 제거하면서도 버림명령은 다른 융합 기법과 비교하여 가장 적게 발생하는 우수한 결과를 얻을수 있었다.

Measurement and Analysis of Power Dissipation of Value Speculation in Superscalar Processors (슈퍼스칼라 프로세서에서 값 예측을 이용한 모험적 실행의 전력소모 측정 및 분석)

  • 이상정;이명근;신화정
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.12
    • /
    • pp.724-735
    • /
    • 2003
  • In recent high-performance superscalar processors, the result value of an instruction is predicted to improve instruction-level parallelism by breaking data dependencies. Using those predicted values, instructions are speculatively executed and substantial performance can be gained. It, however, requires additional power consumption due to the frequent access and update of the value prediction table. In this paper, first, the trade-off between the performance improvement and the increased power consumption for value prediction is measured and analyzed. And, in order to reduce additional power consumption without performance loss, the technique of controlling speculative execution with confidence counter and predicting useful instructions is developed. Also, in order to prove the validity, a tool is developed that can simulate processor behavior at cycle-level and measure total energy consumption and power consumption per cycle.

Accurate Prediction of Polymorphic Indirect Branch Target (간접 분기의 타형태 타겟 주소의 정확한 예측)

  • 백경호;김은성
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.6
    • /
    • pp.1-11
    • /
    • 2004
  • Modern processors achieve high performance exploiting avaliable Instruction Level Parallelism(ILP) by using speculative technique such as branch prediction. Traditionally, branch direction can be predicted at very high accuracy by 2-level predictor, and branch target address is predicted by Branch Target Buffer(BTB). Except for indirect branch, each of the branch has the unique target, so its prediction is very accurate via BTB. But because indirect branch has dynamically polymorphic target, indirect branch target prediction is very difficult. In general, the technique of branch direction prediction is applied to indirect branch target prediction, and much better accuracy than traditional BTB is obtained for indirect branch. We present a new indirect branch target prediction scheme which combines a indirect branch instruction with its data dependent register of the instruction executed earlier than the branch. The result of SPEC benchmark simulation which are obtained on SimpleScalar simulator shows that the proposed predictor obtains the most perfect prediction accuracy than any other existing scheme.