Search | Korea Science

An Improved Dynamic Branch Predictor by Selective Access of a Specific Element in 4-Way Cache (4-Way 캐쉬의 선택된 Element를 이용한 향상된 동적 분기 예측기 구현)

Hwang, In-Sung;Hwang, Sun-Young
- The Journal of Korean Institute of Communications and Information Sciences / v.38A no.12 / pp.1094-1101 / 2013
This paper proposes an improved branch predictor that reduces the number of execution cycles of applications by selectively accessing a specific element in a 4-way associative cache. When a branch instruction is fetched, the proposed branch predictor obtains the branch target address from the selected element in the cache by referring to an MRU buffer. By increasing the number of BTAC entries under a restricted power budget, the proposed predictor considerably improves branch prediction rate and application execution speed compared with a previous branch predictor that accesses all elements. The effectiveness of the proposed dynamic branch predictor is verified by executing benchmark applications on the core simulator. Experimental results show that the number of execution cycles decreases by an average of 10.1%, while power consumption increases by an average of 7.4%, compared with a core without a dynamic branch predictor. Execution cycles are reduced by 4.1% compared with a core that employs the previous dynamic branch predictor.
https://doi.org/10.7840/kics.2013.38A.12.1094
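
The selective-access idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' design: the class name `MruBtac`, the set/tag split, and the round-robin replacement are all assumptions, since the abstract only states that the lookup reads one element chosen through an MRU buffer.

```python
class MruBtac:
    """4-way set-associative BTAC whose lookup probes only the MRU way.

    Hypothetical sketch: indexing and replacement are assumed, not taken
    from the paper.
    """
    WAYS = 4

    def __init__(self, num_sets=64):
        self.num_sets = num_sets
        self.tags = [[None] * self.WAYS for _ in range(num_sets)]
        self.targets = [[0] * self.WAYS for _ in range(num_sets)]
        self.mru = [0] * num_sets          # MRU buffer: one way index per set
        self.next_victim = [0] * num_sets  # assumed round-robin replacement

    def _split(self, pc):
        word = pc >> 2                     # assume 4-byte aligned instructions
        return word % self.num_sets, word // self.num_sets

    def lookup(self, pc):
        """Read a single way (the MRU one); return the target or None."""
        s, tag = self._split(pc)
        way = self.mru[s]
        if self.tags[s][way] == tag:
            return self.targets[s][way]
        return None

    def update(self, pc, target):
        """After a branch resolves, record its target and mark its way MRU."""
        s, tag = self._split(pc)
        for way in range(self.WAYS):
            if self.tags[s][way] == tag:
                break
        else:
            way = self.next_victim[s]
            self.next_victim[s] = (way + 1) % self.WAYS
            self.tags[s][way] = tag
        self.targets[s][way] = target
        self.mru[s] = way
```

In this sketch a branch stored in a non-MRU way misses on lookup even though its entry is present, which is the accuracy cost traded for reading one way instead of four.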

A Design and Implementation of Branch Predictor for High Performance Superscalar Processors (고성능 슈퍼스칼라 프로세서를 위한 분기예측기의 설계 및 구현)

서정민;김귀우;이상정
- Proceedings of the Korean Information Science Society Conference / 2001.04a / pp.22-24 / 2001
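Superscalar processors predict branch outcomes in advance and fetch instructions accordingly, so that the instruction supply does not stall while a branch result is pending and the pipeline keeps running continuously. In this paper, we use the SimpleScalar tool set to build a simulation environment for representative dynamic branch prediction methods used in superscalar processors. As dynamic branch prediction methods, we implement a BTB scheme based on each branch instruction's own history in the Branch Target Buffer (BTB), and a Gshare branch predictor that takes into account the correlation with the history of preceding branch instructions. We run the SPEC95 benchmark programs on the SimpleScalar simulator and measure the prediction accuracy of the branch predictors as design parameters vary. We also implement the BTB and Gshare branch predictors in VHDL and, through simulation and synthesis with Synopsys tools, measure gate size and power consumption.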
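
As a rough illustration of the Gshare scheme named in this abstract (not the authors' VHDL or SimpleScalar configuration), the sketch below XORs the branch address with a global history register to index a table of 2-bit saturating counters. The class name `GsharePredictor`, the table size, and the initial counter value are assumptions.

```python
class GsharePredictor:
    """Gshare: index = (PC bits) XOR (global branch history)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global history register
        self.counters = [1] * (1 << index_bits)   # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        # Drop the two low PC bits (assumes 4-byte aligned instructions).
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        """Return True when the branch is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Because the history is shared by all branches, the outcome of one branch changes the table entry consulted by the next, which is how the predictor captures correlation between branches.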

2-Level Adaptive Branch Prediction Based on Set-Associative Cache (세트 연관 캐쉬를 사용한 2단계 적응적 분기 예측)

Shim, Won
- The KIPS Transactions:PartA / v.9A no.4 / pp.497-502 / 2002
Conditional branches can severely limit the performance of instruction-level parallelism by causing branch penalties. 2-level adaptive branch predictors were developed to achieve accurate branch prediction in high-performance superscalar processors. Although 2-level adaptive branch predictors achieve very high prediction accuracy, they tend to be very costly. In this paper, set-associative cached correlated 2-level branch predictors are proposed to overcome the cost problem of conventional 2-level adaptive branch predictors. According to simulation results, cached correlated predictors deliver higher prediction accuracy than conventional predictors at a significantly lower cost. The best misprediction rates of global and local cached correlated predictors using set-associative caches are 5.99% and 6.28%, respectively. These represent 54% and 17% improvements over the conventional 2-level adaptive branch predictors.
https://doi.org/10.3745/KIPSTA.2002.9A.4.497

Design and Implementation of an Automatic Embedded Core Generation System Using Advanced Dynamic Branch Prediction (동적 분기 예측을 지원하는 임베디드 코어 자동 생성 시스템의 설계와 구현)

Lee, Hyun-Cheol;Hwang, Sun-Young
- The Journal of Korean Institute of Communications and Information Sciences / v.38B no.1 / pp.10-17 / 2013
This paper proposes an automatic embedded core generator system that supports branch prediction. The proposed system includes a dynamic branch prediction module that enhances the execution speed of target applications by inserting history/direction flags into the BTAC (Branch Target Address Cache). The numbers of BHT (Branch History Table) and BTAC entries are determined based on branch information extracted by simulation. To verify the effectiveness of the proposed branch prediction module, an ARM9TDMI core including a dynamic branch predictor was described in SMDL and generated. Experimental results show that as the number of entries rises, area increases by up to 60%, while application execution cycles and BTAC miss rate drop by an average of 1.7% and 9.6%, respectively.
https://doi.org/10.7840/kics.2013.38B.1.10

Dynamic Per-Branch History Length Fitting for High-Performance Processor (고성능 프로세서를 위한 분기 명령어의 동적 History 길이 조절 기법)

Kwak, Jong-Wook;Jhang, Seong-Tae;Jhon, Chu-Shik
- Journal of the Institute of Electronics Engineers of Korea CI / v.44 no.2 s.314 / pp.1-10 / 2007
Branch prediction accuracy is critical for overall system performance. The branch misprediction penalty is one of the significant limiters of processor performance, as pipelines deepen and the number of instructions issued per cycle increases. In this paper, we propose a "Dynamic Per-Branch History Length Fitting Method" that tracks the data dependencies among register-writing instructions. The proposed solution first identifies the key branches and then selectively uses the histories of those key branches. To support this mechanism, we provide a history length adjustment algorithm and the required hardware module. In simulation, the proposed mechanism outperforms the previous fixed static method by up to 5.96% in prediction accuracy. Furthermore, our method improves performance even compared with the profiled results, which are generally considered optimal.

Analysis on the Thermal Efficiency of Branch Prediction Techniques in 3D Multicore Processors (3차원 구조 멀티코어 프로세서의 분기 예측 기법에 관한 온도 효율성 분석)

Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
- The KIPS Transactions:PartA / v.19A no.2 / pp.77-84 / 2012
Speculative execution for improving instruction-level parallelism is widely used in high-performance processors. In speculative execution, the most important factor is the accuracy of the branch predictor. Unfortunately, complex branch predictors designed to improve accuracy can cause serious thermal problems in 3D multicore processors, and thermal problems have a negative impact on processor performance. This paper analyzes two methods for solving the thermal problems of the branch predictor in 3D multicore processors. The first method is dynamic thermal management, which turns off the branch predictor when its temperature exceeds a threshold. The second method is a thermal-aware branch predictor placement policy that considers the temperature of each layer in 3D multicore processors. According to our evaluation, with the branch predictor placement policy the average temperature is $87.69^{\circ}C$ and the average maximum temperature gradient is $11.17^{\circ}C$. With dynamic thermal management, the average temperature is $89.64^{\circ}C$ and the average maximum temperature gradient is $17.62^{\circ}C$. The proposed branch predictor placement policy therefore has better thermal efficiency than dynamic thermal management. In terms of performance, the proposed branch predictor placement policy degrades performance by 3.61%, while dynamic thermal management degrades performance by 27.66%.
https://doi.org/10.3745/KIPSTA.2012.19A.2.077

Peridynamic models for dynamic fracture in brittle materials (취성 재료의 동적 파괴 해석을 위한 Peridynamics 모델)

Ha, Youn-Doh
- Proceedings of the Computational Structural Engineering Institute Conference / 2011.04a / pp.561-564 / 2011
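Dynamic brittle fracture is very important from many engineering and industrial perspectives. Brittle cracks propagate much faster and over a wider range than other kinds of cracks, and therefore cause large-scale failure. Although much research has been carried out over a long period to model the behavior of dynamically propagating brittle cracks, many aspects remain unexplained. In particular, the need to introduce artificial conditions for crack initiation and propagation is a common problem of existing methodologies. This study applies peridynamics to the analysis of dynamic crack branching problems. Peridynamics is a numerical modeling technique based on traditional continuum theory that is well suited to modeling problems with discontinuities such as cracks, and it can analyze fracture phenomena in a very simple way without artificial conditions. We show that the peridynamic model predicts experimentally observed branching crack shapes and crack propagation speeds very well. It can also capture the cascading branching that appears when high stresses develop around the crack tip. Through this study, we found that stress waves change the crack propagation speed and also affect the propagation direction. The numerical results were also confirmed to agree well with experimental results.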

Accurate Prediction of Polymorphic Indirect Branch Target (간접 분기의 타형태 타겟 주소의 정확한 예측)

백경호;김은성
- Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.6 / pp.1-11 / 2004
Modern processors achieve high performance by exploiting available Instruction Level Parallelism (ILP) through speculative techniques such as branch prediction. Traditionally, branch direction can be predicted with very high accuracy by a 2-level predictor, and the branch target address is predicted by a Branch Target Buffer (BTB). Except for indirect branches, each branch has a unique target, so its prediction via the BTB is very accurate. But because an indirect branch has dynamically polymorphic targets, indirect branch target prediction is very difficult. In general, branch direction prediction techniques are applied to indirect branch target prediction, and they obtain much better accuracy than a traditional BTB for indirect branches. We present a new indirect branch target prediction scheme that combines an indirect branch instruction with the data-dependent register of an instruction executed earlier than the branch. SPEC benchmark simulation results obtained on the SimpleScalar simulator show that the proposed predictor achieves higher prediction accuracy than any other existing scheme.

Cache and Pipeline Architecture Improvement and Low Power Design of Embedded Processor (임베디드 프로세서의 캐시와 파이프라인 구조개선 및 저전력 설계)

Jung, Hong-Kyun;Ryoo, Kwang-Ki
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.289-292 / 2008
This paper presents a branch prediction algorithm and a 4-way set-associative cache that improve the performance of the OpenRISC processor, and a clock gating algorithm using ODC (Observability Don't Care) operation for a low-power processor. The branch prediction algorithm uses a BTB (Branch Target Buffer), and the 4-way set-associative cache has a lower miss rate than a direct-mapped cache. The clock gating algorithm reduces dynamic power consumption. Performance and dynamic power estimates show that the performance of the OpenRISC processor using the proposed algorithm improves by about 8.9%, and the dynamic power of the processor using the Samsung $0.18{\mu}m$ technology library is reduced by 13.9%.

Performance and Power Consumption Improvement of Embedded RISC Core (임베디드 RISC 코어의 성능 및 전력 개선)

Jung, Hong-Kyun;Ryoo, Kwang-Ki
- Journal of the Korea Institute of Information and Communication Engineering / v.14 no.2 / pp.453-461 / 2010
This paper presents a branch prediction algorithm and a 4-way set-associative cache that improve the performance of an embedded RISC core, and a clock gating algorithm using ODC (Observability Don't Care) operation that reduces the power consumption of the core. The branch prediction algorithm uses a BTB (Branch Target Buffer), and the 4-way set-associative cache has a lower miss rate than a direct-mapped cache. A pseudo-LRU policy, one of the line replacement policies, is used to decrease the number of bits that store the LRU value. The clock gating algorithm reduces dynamic power consumption. Performance and dynamic power estimates show that the performance of the OpenRISC core with the proposed architecture improves by about 29%, and the dynamic power of the core using the Chartered $0.18{\mu}m$ technology library is reduced by 16%.
https://doi.org/10.6109/jkiice.2010.14.2.453
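
The abstract does not say which pseudo-LRU variant the core uses. As a minimal sketch, assuming the common tree-based scheme, one 4-way set needs only 3 state bits, whereas exact LRU must distinguish 4! = 24 orderings and so needs at least 5 bits. The class name `TreePlru4` and its methods are hypothetical.

```python
class TreePlru4:
    """Tree-based pseudo-LRU state for one 4-way cache set (3 bits).

    bits[0] picks the half holding the next victim (0 = ways 0-1, 1 = ways 2-3);
    bits[1] picks the victim within ways 0-1; bits[2] within ways 2-3.
    Assumed variant: the paper's exact policy is not given in the abstract.
    """

    def __init__(self):
        self.bits = [0, 0, 0]

    def touch(self, way):
        """On an access, point every bit on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1            # next victim comes from ways 2-3
            self.bits[1] = 1 - way      # and, within ways 0-1, the other way
        else:
            self.bits[0] = 0            # next victim comes from ways 0-1
            self.bits[2] = 1 - (way - 2)

    def victim(self):
        """Follow the bits to the way that should be replaced next."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

After accesses to ways 0, 1, 2, 3 in that order, the sketch picks way 0, matching exact LRU. A further access to way 0 makes it pick way 2 where exact LRU would pick way 1, which shows the approximation that the saved bits cost.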
