• Title/Summary/Keyword: 분기 명령어

Search Result 70, Processing Time 0.02 seconds

Performance Improvement of Single Chip Multiprocessor using Concurrent Branch Execution (분기 동시 수행을 이용한 단일 칩 멀티프로세서의 성능 개선)

  • Lee, Seung-Ryul;Kim, Jun-Shik;Choi, Jae-Hyeok;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.2
    • /
    • pp.61-71
    • /
    • 2007
  • The instruction level parallelism, which has been used to improve the performance of processors, expose its limit. The change of a control flow by a branch miss prediction is one of the obstacles that restrict the instruction level parallelism. The single chip multiprocessors have been developed to utilize the thread level parallelism. However, we could not use the maximum performance of the single chip multiprocessor in case of executing the coded programs without considering the multi-thread. In order to overcome the two performance degradation factors, in this paper, we suggest the concurrent branch execution method that applies to the multi-path execution method at a single chip multiprocessor. We executes all two flows of the conditional branch using the idle core processor. Through this, we can improve the processor's efficiency with blocking the control flow termination by the branch instruction and reducing the idle time. We analyze the effects of concurrent branch execution proposed in this paper through the simulation. As a result of that, concurrent branch execution reduces about 20% of idle time and improves the maximum 10% of the branch prediction accuracy. We show that our scheme improves the overall performance of maximum 39% compared to the normal single chip multiprocessor and maximum 27% compared to the superscalar processor.

A New Asynchronous Pipeline Architecture for CISC type Embedded Micro-Controller, A8051 (CISC 임베디드 컨트롤러를 위한 새로운 비동기 파이프라인 아키텍쳐, A8051)

  • 이제훈;조경록
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.4
    • /
    • pp.85-94
    • /
    • 2003
  • The asynchronous design methods proved to have the higher performance in power consumption and execution speed than synchronous ones because it just needs to activate the required module without feeding clock in the system. Despite the advantage of CISC machine providing the variable addressing modes and instructions, its execution scheme is hardly suited for a synchronous Pipeline architecture and incurs a lot of overhead. This paper proposes a novel asynchronous pipeline architecture, A80sl, whose instruction set is fully compatible with that of Intel 80C51, an embedded micro controller. We classify the instructions into the group keeping the same execution scheme for the asynchronous pipeline and optimize it eliminating the bubble stage that comes from the overhead of the multi-cycle execution. The new methodologies for branch and various instruction lengths are suggested to minimize the number of states required for instructions execution and to increase its parallelism. The proposed A80C51 architecture is synthesized with 0.35${\mu}{\textrm}{m}$ CMOS standard cell library. The simulation results show higher speed than that of Intel 80C51 with 36 MHz and other asynchronous counterparts by 24 times.

HARP를 위한 최적화 Compiler의 Code 재정렬방법에 관한 연구

  • Kim, In-Hong;Gang, Ik-Tae
    • ETRI Journal
    • /
    • v.10 no.3
    • /
    • pp.73-82
    • /
    • 1988
  • 파이프라인을 사용하는 프로세서에서는 인스트럭션들의 중첩 수행으로 인한 자원 사용에서의 충돌이 일어나거나 분기 명령어로 인해 파이프라인의 효율이 감소하는 특성을 가지고 있다. 전자통신연구소에서 개발하고 있는 프로세서에서는 이를 컴파일러에 의해 해결하며 본 논문은 이에 관하여 기술한다.

  • PDF

Branch Prediction Latency Hiding Scheme using Branch Pre-Prediction and Modified BTB (분기 선예측과 개선된 BTB 구조를 사용한 분기 예측 지연시간 은폐 기법)

  • Kim, Ju-Hwan;Kwak, Jong-Wook;Jhon, Chu-Shik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.10
    • /
    • pp.1-10
    • /
    • 2009
  • Precise branch predictor has a profound impact on system performance in modern processor architectures. Recent works show that prediction latency as well as prediction accuracy has a critical impact on overall system performance as well. However, prediction latency tends to be overlooked. In this paper, we propose Branch Pre-Prediction policy to tolerate branch prediction latency. The proposed solution allows that branch predictor can proceed its prediction without any information from the fetch engine, separating the prediction engine from fetch stage. In addition, we propose newly modified BTE structure to support our solution. The simulation result shows that proposed solution can hide most prediction latency with still providing the same level of prediction accuracy. Furthermore, the proposed solution shows even better performance than the ideal case, that is the predictor which always takes a single cycle prediction latency. In our experiments, IPC improvement is up to 11.92% and 5.15% in average, compared to conventional predictor system.

A Study on Pipelined Architecture with Branch Prediction and Two Paths Strategy (분기 예측과 이중 경로 전략을 결합한 파이프라인 구조에 관한 연구)

  • Ju, Yeong-Sang;Jo, Gyeong-San
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.1
    • /
    • pp.181-190
    • /
    • 1996
  • Pipelined architecture improves processor performance by overlapping the execution of several different instructions. The effect of control hazard stalls the pipeline and reduces processor performance. In order to reduce the effect of control hazard caused by branch, we proposes a new approach combining both branch prediction and two paths strategy. In addition, we verify the performance improvement in a proposed approach by utilizing system performance metric CPI rather than BEP.

  • PDF

Efficient Indirect Branch Predictor Based on Data Dependence (효율적인 데이터 종속 기반의 간접 분기 예측기)

  • Paik Kyoung-Ho;Kim Eun-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.43 no.4 s.310
    • /
    • pp.1-14
    • /
    • 2006
  • The indirect branch instruction is a most substantial obstacle in utilizing ILP of modem high performance processors. The target address of an indirect branch has the polymorphic characteristic varied dynamically, so it is very difficult to predict the accurate target address. Therefore the performance of a processor with speculative methodology is reduced significantly due to the many execution cycle delays in occurring the misprediction. We proposed the very accurate and novel indirect branch prediction scheme so called data-dependence based prediction. The predictor results in the prediction accuracy of 98.92% using 1K entries, and. 99.95% using 8K But, all of the proposed indirect predictor including our predictor has a large hardware overhead for restoring expected target addresses as well as tags for alleviating an aliasing. Hence, we propose the scheme minimizing the hardware overhead without sacrificing the prediction accuracy. Our experiment results show that the hardware is reduced about 60% without the performance loss, and about 80% sacrificing only the performance loss of 0.1% in aspect of the tag overhead. Also, in aspect of the overhead of storing target addresses, it can save the hardware about 35% without the performance loss, and about 45% sacrificing only the performance loss of 1.11%.

Prediction Accuracy Enhancement of Function Return Address via RAS Pollution Prevention (RAS 오염 방지를 통한 함수 복귀 예측 정확도 향상)

  • Kim, Ju-Hwan;Kwak, Jong-Wook;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.3
    • /
    • pp.54-68
    • /
    • 2011
  • As the prediction accuracy of conditional branch instruction is increased highly, the importance of prediction accuracy for unconditional branch instruction is also increased accordingly. Except the case of RAS(Return Address Stack) overflow, the prediction accuracy of function return address should be 100% theoretically. However, there exist some possibilities of miss-predictions for RAS return addresses, when miss-speculative execution paths are invalidated, in case of modern speculative microprocessor environments. In this paper, we propose the RAS rename technique to prevent RAS pollution, results in the reduction of RAS miss-prediction. We divide a RAS stack into a soft-stack and a hard-stack and we handle the instructions for speculative execution in the soft-stack. When some overwrites happen in the soft-stack, we move the soft-stack data into the hard-stack. In addition, we propose an enhanced version of RAS rename scheme. In simulation results, our solution provide 1/90 reduction of miss-prediction of function return address, results in up to 6.85% IPC improvement, compared to normal RAS method. Furthermore, it reduce miss-prediction ratio as 1/9, compared to previous technique.

Branch Prediction in Multiprogramming Environment (멀티프로그래밍 환경에서의 분기 예측)

  • Lee, Mun-Sang;Gang, Yeong-Jae;Maeng, Seung-Ryeol
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.9
    • /
    • pp.1158-1165
    • /
    • 1999
  • 조건부 분기 명령어(conditional branch instruction)의 잘못된 분기 예측(branch misprediction)은 프로세서의 성능 향상에 심각한 장애 요인이 되고 있다. 특히 시분할(time-sharing) 시스템과 같이 문맥 교환(context switch)이 발생하는 멀티프로그래밍 환경(multiprogramming environment)에서는 더욱 낮은 분기 예측 정확성(branch prediction accuracy)을 보인다. 본 논문에서는 문맥 교환이 발생하는 멀티프로그래밍 환경에서 높은 분기 예측 정확성을 보이는 중첩 분기 예측표 교환(Overlapped Predictor Table Switch, OPTS) 기법을 소개한다. 분기 예측표(predictor table)를 분할하여 각각의 프로세스(process)에 할당하는 OPTS 기법은 문맥 교환의 영향을 최소화함으로써 높은 분기 예측 정확성을 유지하는 분기 예측 방법이다.Abstract There is wide agreement that one of the most important impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction stream. Accurate branch prediction is required to overcome this performance limitation. Many branch predictors have been proposed to help to alleviate this problem, including the two-level adaptive branch predictor, and more recently, hybrid branch predictor. In a less idealized environment, such as a time-sharing system, code of interest involves context switches. Context switches, even at fairly large intervals, can seriously degrade the performance of many of the most accurate branch prediction schemes. In this study, we measure the effect of context switch on the branch prediction accuracy in various situation and show the feasibility of our new mechanism, OPTS(Overlapped Predictor Table Switch), which save and restore branch history table at every context switch.

Enhancing Instruction Queue Efficiency with Return Address Stack in Shallow-Pipelined EISC Architecture (복귀주소 스택을 활용한 얕은 파이프라인 EISC 아키텍처의 명령어 큐 효율성 향상연구)

  • Kim, Han-Yee;Lee, SeungEun;Kim, Kwan-Young;Suh, Taeweon
    • The Journal of Korean Association of Computer Education
    • /
    • v.18 no.2
    • /
    • pp.71-81
    • /
    • 2015
  • In the EISC processor, the Instruction Queue (IQ) supporting LERI folding and loop buffering occupies roughly 20% of real estate, and its efficient utilization is a key for performance. This paper presents an architectural enhancement for the IQ utilization with return address stack (RAS) in the EISC processor. The proposed architecture eliminates the RAS corruption from the wrong-path, taking advantage of shallow pipeline. In experiments, a 4-entry RAS reduces the number of IQ flushes by up to 58.90% over baseline, and an 8-entry RAS by up to 61.28%. The experiments show up to 3.47% performance improvement with 8-entry RAS and up to 3.15% performance improvement with 4-entry RAS.

Improving Hit Ratio and Hybrid Branch Prediction Performance with Victim BTB (Victim BTB를 활용한 히트율 개선과 효율적인 통합 분기 예측)

  • Joo, Young-Sang;Cho, Kyung-San
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.10
    • /
    • pp.2676-2685
    • /
    • 1998
  • In order to improve the branch prediction accuracy and to reduce the BTB miss rate, this paper proposes a two-level BTB structure that adds small-sized victim BTB to the convetional BTB. With small cost, two-level BTB can reduce the BTB miss rate as well as improve the prediction accuracy of the hybrid branch prediction strategy which combines dynamic prediction and static prediction. Through the trace-driven simulation of four bechmark programs, the performance improvement by the proposed two-level BTB structure is analysed and validated. Our proposed BTB structure can improve the BTB miss rate by 26.5% and the misprediction rate by 26.75%

  • PDF