• Title/Summary/Keyword: misprediction

Design of an ALU for SMT Microprocessors (SMT 마이크로프로세서에 적합한 ALU의 설계)

  • 김상철;홍인표;이용석
    • Proceedings of the IEEK Conference / 2003.07d / pp.1383-1386 / 2003
  • In this paper, an ALU for Simultaneous Multi-Threading (SMT) microprocessors is designed. The SMT architecture notably improves performance and resource utilization compared with conventional superscalar architectures by executing instructions from multiple threads at the same time. The ALU adopts a data bypassing method to process multiple threads, and it can flush the instructions of a thread that raises an exception such as a branch misprediction or an interrupt. With data bypassing and exception handling, the performance of SMT microprocessors can be improved.

Sequential Value Misprediction Recovery Mechanism in High Performance Microprocessors (고성능 마이크로프로세서에서 순차적 값 예측 실패 복구 방식)

  • 전병찬;박희룡;이상정
    • Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.685-687 / 2002
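  • We propose a value misprediction recovery mechanism for high-performance superscalar processors. On a value misprediction, it sequentially squashes and recovers only the instructions that were speculatively executed with the mispredicted value, and then reissues them. Because only the dependent instructions affected by the wrong value prediction are selectively reissued, unnecessary reissues are avoided and the penalty of a value misprediction is reduced. In addition, unlike existing schemes, which search for all instructions dependent on the mispredicted instruction at once and in parallel, the proposed scheme searches sequentially along the instructions' dependence chain, which reduces hardware implementation complexity without affecting the processor's clock cycle.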
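
The selective reissue idea in this abstract can be sketched as a single in-order walk over the instruction window. This is only an illustration under stated assumptions, not the authors' hardware design: the `Instr` record, the register names, and the function `select_for_reissue` are hypothetical, and register renaming is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    dest: str
    srcs: tuple


def select_for_reissue(window, mispredicted_index):
    """Walk the window in program order after the mispredicted instruction and
    return the indices of instructions that consumed a tainted value."""
    tainted = {window[mispredicted_index].dest}
    reissue = []
    for i in range(mispredicted_index + 1, len(window)):
        instr = window[i]
        if any(src in tainted for src in instr.srcs):
            # Consumed a wrong value: must be reissued, and its result is wrong too.
            reissue.append(i)
            tainted.add(instr.dest)
        else:
            # An independent write replaces the register with a good value.
            tainted.discard(instr.dest)
    return reissue


window = [
    Instr("r1", ("r9",)),       # 0: instruction whose value was mispredicted
    Instr("r2", ("r1",)),       # 1: depends on r1
    Instr("r3", ("r8",)),       # 2: independent
    Instr("r4", ("r2", "r3")),  # 3: depends on r2
    Instr("r1", ("r7",)),       # 4: independent, redefines r1
    Instr("r5", ("r1",)),       # 5: reads the redefined r1, so it is unaffected
]
reissue = select_for_reissue(window, 0)
```

Only instructions 1 and 3 are selected here; the rest keep their results, which is the saving the abstract attributes to selective reissue.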

A Branch Misprediction Recovery Mechanism using Control Independence (제어 독립성을 이용한 분기 예상 실패 복구 메커니즘)

  • 윤성룡;신영호;박홍준;조영일
    • Proceedings of the Korean Information Science Society Conference / 2000.10c / pp.636-638 / 2000
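  • Control independence is an important factor in improving instruction-level parallelism (ILP) in superscalar processors. When a branch prediction mechanism mispredicts, the instructions on the predicted path must be squashed, and the instructions on the correct path must be fetched again and executed. This paper proposes a mechanism that improves processor performance by effectively reducing the number of instructions squashed on a branch misprediction. It detects control-independent instructions both statically, through profiling at compile time, and dynamically, through the program's control flow. An analysis of instructions executed per cycle on the SPECint95 benchmark programs shows that, compared with the existing method, the proposed method improves performance by 4%~6% on a 4-width processor, 11%~18% on an 8-width processor, and 15%~17% on a 16-width processor.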

Efficient Algorithm for Query Processing of Aggregate functions in ROLAP Environment (ROLAP 환경에서 집단함수 질의처리를 위한 효율적인 알고리즘)

  • 김인식;김종겸;정순기
    • Journal of the Korea Society of Computer and Information / v.8 no.3 / pp.40-46 / 2003
  • High-performance processors have recently employed sophisticated techniques to overlap and simultaneously execute multiple computation and memory operations. For query processing in database management systems, these hardware characteristics are an important research issue. Recent work shows that the cache miss penalty between main memory and the CPU has become a new bottleneck and that branch misprediction causes serious waste of resources. This dissertation proposes an efficient algorithm for query processing of aggregate functions that takes these hardware characteristics into account.

Exploring Branch Target Buffer Architecture on Intel Processors with Performance Monitor Counter (Performance Monitor Counter를 이용한 Intel Processor의 Branch Target Buffer 구조 탐구)

  • Jeong, Juhye;Kim, Han-Yee;Suh, Taeweon
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.24-27 / 2019
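  • As side-channel attacks that exploit hardware vulnerabilities, such as Meltdown and Spectre, have drawn attention, the need for a thorough understanding of key microarchitectural structures has grown. Despite the importance of branch prediction in modern microprocessors, few of its details are publicly known, and preparing for potential attacks requires attempts to explore details beyond the information disclosed so far. In this study, we used the Performance Monitor Counter to count the misprediction events raised by the Branch Prediction Unit while programs containing branch instructions were running, in experiments designed to determine the structure of the branch target buffer (BTB) used in Intel Haswell and Skylake. Through this study, we were able to estimate the size and the number of ways of the BTB in these processors.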

Improving Hit Ratio and Hybrid Branch Prediction Performance with Victim BTB (Victim BTB를 활용한 히트율 개선과 효율적인 통합 분기 예측)

  • Joo, Young-Sang;Cho, Kyung-San
    • The Transactions of the Korea Information Processing Society / v.5 no.10 / pp.2676-2685 / 1998
  • To improve branch prediction accuracy and reduce the BTB miss rate, this paper proposes a two-level BTB structure that adds a small victim BTB to the conventional BTB. At small cost, the two-level BTB reduces the BTB miss rate and improves the prediction accuracy of a hybrid branch prediction strategy that combines dynamic and static prediction. Through trace-driven simulation of four benchmark programs, the performance improvement of the proposed two-level BTB structure is analyzed and validated. The proposed BTB structure improves the BTB miss rate by 26.5% and the misprediction rate by 26.75%.
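
A minimal sketch of the two-level idea follows, assuming a direct-mapped main BTB and a small fully associative victim BTB managed in the style of a victim cache. The abstract does not give the organization, replacement policy, or sizes, so the class `TwoLevelBTB`, its parameters, and the swap-on-victim-hit behavior are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoLevelBTB:
    """Direct-mapped main BTB backed by a small fully associative victim BTB."""

    def __init__(self, main_entries=4, victim_entries=2):
        self.main_entries = main_entries
        self.victim_entries = victim_entries
        self.main = {}               # set index -> (branch_pc, target)
        self.victim = OrderedDict()  # branch_pc -> target, oldest first

    def _evict_to_victim(self, entry):
        pc, target = entry
        self.victim[pc] = target
        self.victim.move_to_end(pc)
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)  # drop the oldest victim

    def lookup(self, pc):
        """Return the predicted target, or None on a miss in both levels."""
        index = pc % self.main_entries
        entry = self.main.get(index)
        if entry is not None and entry[0] == pc:
            return entry[1]
        if pc in self.victim:
            # Victim hit: promote this entry and demote the conflicting one.
            target = self.victim.pop(pc)
            if entry is not None:
                self._evict_to_victim(entry)
            self.main[index] = (pc, target)
            return target
        return None

    def insert(self, pc, target):
        """Install a branch after a miss, saving any displaced entry."""
        index = pc % self.main_entries
        entry = self.main.get(index)
        if entry is not None and entry[0] != pc:
            self._evict_to_victim(entry)
        self.main[index] = (pc, target)


def count_misses(btb, trace):
    misses = 0
    for pc, target in trace:
        if btb.lookup(pc) is None:
            misses += 1
            btb.insert(pc, target)
    return misses


# Two branches that map to the same set (0x10 % 4 == 0x20 % 4 == 0).
trace = [(0x10, 0x100), (0x20, 0x200)] * 4
misses_with_victim = count_misses(TwoLevelBTB(4, 2), trace)
misses_without_victim = count_misses(TwoLevelBTB(4, 0), trace)
```

In this toy trace the two conflicting branches keep evicting each other without the victim BTB, while with it only the two compulsory misses remain. The trace is constructed to show the effect and says nothing about the 26.5% figure reported in the abstract.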

Performance Improvement of Operand Fetching with the Operand Reference Prediction Cache(ORPC) (오퍼랜드 참조 예측 캐쉬(ORPC)를 활용한 오퍼랜드 페치의 성능 개선)

  • Kim, Heung-Jun;Cho, Kyung-San
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1652-1659 / 1998
  • To provide performance gains by reducing operand reference latency and data cache bandwidth requirements, we present an operand reference prediction cache (ORPC), which predicts operand values and address translations during the instruction fetch stage. The prediction is verified at an early stage, which minimizes the performance penalty caused by a misprediction. Through trace-driven simulation of six benchmark programs, the performance improvement of the three proposed ORPC structures (ORPC1, ORPC2, ORPC3) is analyzed and validated.

An Implementation of Efficient Quicksort Utilizing SIMD-Based VBP Technique (SIMD 기반의 VBP 기법을 적용한 효율적인 퀵정렬의 구현)

  • Hong, Gilseok;Kim, Hongyeon;Kang, Seonghyeon;Min, Jun-Ki
    • KIISE Transactions on Computing Practices / v.23 no.8 / pp.498-503 / 2017
  • SIMD (Single Instruction Multiple Data) is a representative parallel architecture that processes multiple data items loaded in a SIMD register with a single instruction. Quicksort is a sorting algorithm that picks an element of the array as a pivot and reorders the array so that all elements less than the pivot are placed to its left and all elements greater than the pivot are placed to its right; it then performs the same task recursively on both sublists. In this paper, we propose an efficient Quicksort algorithm that applies SIMD instructions and invokes as few conditional branches as possible, avoiding the performance degradation caused by branch misprediction in a pipelined architecture. In addition, we improve the performance of the Quicksort algorithm by fetching data into a SIMD register in byte units in order to apply VBP (Vertical Bit Parallel) and an early pruning technique.
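
To illustrate what "minimally invoking conditional branches" means for partitioning, here is a scalar sketch of a Lomuto-style partition whose loop body contains no data-dependent `if`. It is not the SIMD or VBP method of the paper, and the function name `branch_free_partition` is hypothetical. The Python interpreter still branches internally; the point is only the data flow, which in compiled code would map to unconditional moves plus an arithmetic increment.

```python
def branch_free_partition(values, pivot):
    """Partition `values` in place without a data-dependent `if`.

    Returns the number of elements smaller than `pivot`; those elements end up
    in values[:count] and the rest in values[count:].
    """
    store = 0
    for i in range(len(values)):
        current = values[i]
        # Always swap; the store slot advances only when the comparison is true.
        values[i] = values[store]
        values[store] = current
        store += int(current < pivot)
    return store


data = [5, 1, 8, 3, 9, 2]
count = branch_free_partition(data, 5)
```

The comparison result is used as a number rather than as a branch condition, so the sequence of operations is the same for every element regardless of how it compares with the pivot.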

An Dynamic Branch Prediction Scheme to Reduce Negative Interferences for ILP Processors (ILP 프로세서를 위한 부정적 간섭을 감소시키는 동적 분기예상 기법)

  • 박홍준;조영일
    • Journal of Internet Computing and Services / v.2 no.1 / pp.23-30 / 2001
  • ILP processors require an accurate branch prediction scheme to achieve higher performance. The two-level branch predictor is known to achieve high prediction accuracy. However, when a branch accesses a PHT entry that was previously updated by another branch, the two-level predictor may suffer interference. Negative interference degrades performance because it can cause branch mispredictions. The agree predictor achieves high prediction accuracy by adding bias bits to the BTB, which converts negative interference into positive interference, but negative interference may still occur when a bias bit is set incorrectly. This paper presents a new dynamic branch predictor that reduces negative interference. In the proposed predictor, hit bits are attached to BTB entries so that the bias bit can be changed dynamically at execution time. As a result, the proposed scheme improves prediction accuracy by effectively reducing negative interference. To illustrate its effect, we evaluate the scheme using the SPECint92 benchmarks. The results show that the proposed scheme can outperform traditional branch predictors.
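
The interference effect and the agree predictor's remedy, as described in this abstract, can be sketched with a small simulation. The paper's own hit-bit scheme is not reproduced here because the abstract does not specify it. The class `PatternHistoryPredictor`, its parameters, the "predict taken on first encounter" fallback, and the rule that the bias bit is the branch's first outcome are assumptions for illustration; `history_bits` is set to 0 in the example only to force two branches onto the same PHT entry deterministically.

```python
class PatternHistoryPredictor:
    """PHT of 2-bit counters, optionally interpreted as agree/disagree bits."""

    def __init__(self, pht_entries, history_bits, use_bias_bits):
        self.pht = [2] * pht_entries  # 2-bit counters, start weakly high
        self.history = 0
        self.history_mask = (1 << history_bits) - 1
        self.use_bias_bits = use_bias_bits
        self.bias = {}  # branch_pc -> bias bit (stands in for the BTB field)

    def predict_and_update(self, pc, taken):
        """Return True if the prediction was correct, then train the predictor."""
        index = (pc ^ self.history) % len(self.pht)
        counter_high = self.pht[index] >= 2
        if self.use_bias_bits:
            if pc in self.bias:
                bias = self.bias[pc]
                # High counter means "agree with the bias bit".
                prediction = bias if counter_high else not bias
            else:
                # First encounter: no bias bit yet, fall back to "taken".
                bias = self.bias[pc] = taken
                prediction = True
            reinforce = taken == bias
        else:
            # Conventional use: high counter means "taken".
            prediction = counter_high
            reinforce = taken
        if reinforce:
            self.pht[index] = min(3, self.pht[index] + 1)
        else:
            self.pht[index] = max(0, self.pht[index] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
        return prediction == taken


def count_mispredictions(predictor, trace):
    return sum(not predictor.predict_and_update(pc, taken) for pc, taken in trace)


# Branch 0x40 is always taken, branch 0x80 never; both index PHT entry 0.
trace = [(0x40, True), (0x80, False)] * 20
conventional = count_mispredictions(PatternHistoryPredictor(16, 0, False), trace)
agree = count_mispredictions(PatternHistoryPredictor(16, 0, True), trace)
```

With conventional counters the two aliased branches pull the shared counter in opposite directions, so the not-taken branch is mispredicted every time. With bias bits both branches train the counter toward "agree", and only the first encounter of the not-taken branch is mispredicted.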

Branch Prediction in Multiprogramming Environment (멀티프로그래밍 환경에서의 분기 예측)

  • Lee, Mun-Sang;Gang, Yeong-Jae;Maeng, Seung-Ryeol
    • Journal of KIISE:Computer Systems and Theory / v.26 no.9 / pp.1158-1165 / 1999
  • Branch misprediction of conditional branch instructions is a serious impediment to improving processor performance. Branch prediction accuracy is even lower in multiprogramming environments where context switches occur, such as time-sharing systems. This paper introduces the Overlapped Predictor Table Switch (OPTS) technique, which achieves high branch prediction accuracy in multiprogramming environments with context switches. OPTS partitions the predictor table and allocates a portion to each process, maintaining high branch prediction accuracy by minimizing the effect of context switches. Abstract: There is wide agreement that one of the most important impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction stream. Accurate branch prediction is required to overcome this performance limitation. Many branch predictors have been proposed to alleviate this problem, including the two-level adaptive branch predictor and, more recently, the hybrid branch predictor. In a less idealized environment, such as a time-sharing system, the code of interest is subject to context switches. Context switches, even at fairly large intervals, can seriously degrade the performance of many of the most accurate branch prediction schemes. In this study, we measure the effect of context switches on branch prediction accuracy in various situations and show the feasibility of our new mechanism, OPTS (Overlapped Predictor Table Switch), which saves and restores the branch history table at every context switch.