통합 검색 | Korea Science

Instruction Flow based Early Way Determination Technique for Low-power L1 Instruction Cache

Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
- 한국컴퓨터정보학회논문지
- /
- 제21권9호
- /
- pp.1-9
- /
- 2016
Recent embedded processors employ set-associative L1 instruction cache to improve the performance. The energy consumption in the set-associative L1 instruction cache accounts for considerable portion in the embedded processor. When an instruction is required from the processor, all ways in the set-associative instruction cache are accessed in parallel. In this paper, we propose the technique to reduce the energy consumption in the set-associative L1 instruction cache effectively by accessing only one way. Gshare branch predictor is employed to predict the instruction flow and determine the way to fetch the instruction. When the branch prediction is untaken, next instruction in a sequential order can be fetched from the instruction cache by accessing only one way. According to our simulations with SPEC2006 benchmarks, the proposed technique requires negligible hardware overhead and shows 20% energy reduction on average in 4-way L1 instruction cache.
https://doi.org/10.9708/jksci.2016.21.9.001 인용 PDF KSCI

캐쉬 미스와 분기예측 실패를 고려한 명령어 페치 모델의 성능분석 (Performance Analyses of Instruction Fetch Models Considering Cache Miss and Branch Misprediction)

김선모;정진하;최상방
- 한국정보과학회논문지:시스템및이론
- /
- 제28권12호
- /
- pp.685-697
- /
- 2001
캐쉬 메모리는 명령어와 데이터의 참조시간을 줄이기 위하여 프로세서에 의해 참조되어질 가능성이 높은 주 메모리의 내용을 일시적으로 저장하는 용량이 작고 빠른 메모리이다. 본 논문에서는 슈퍼스칼라 프로세서에 적용될 수 있는 네 가지 명령어 캐쉬 구조에 대하여 캐쉬 미스와 분기예측 실패를 고려한 해석적 모델을 제안하고 성능을 분석하였다. 슈퍼스칼라 구조의 다양한 파라미터들을 정의하여 명령어 페치를 모델링하였으며, 해석적 모델의 타당성을 검증하기 위하여 시뮬레이션을 수행하여 얻은 결과와 비교하였다. 명령어 페치율에 있어서는 분기예측 실패로 인한 영향보다는 캐쉬 미스로 인한 성능저하가 더욱 큰 것으로 나타났다. 본 연구를 통하여 얻은 해석적 모델을 사용하면 시뮬레이션에서는 드러나지 않는 성능제약의 원인에 대한 명확한 규명이 가능하며, 캐쉬 성능에 있어서 캐쉬 미스와 분기예측 실패간의 관계에 대한 정확한 분석이 가능하다.
PDF

명령어 선 인출기의 전력 성능 (Power Performance of Instruction Pre-Fetch Unit)

송영규;오형철
- 대한전자공학회:학술대회논문집
- /
- 대한전자공학회 1999년도 하계종합학술대회 논문집
- /
- pp.365-368
- /
- 1999
In this paper, we investigate the effect of adopting branch-penalty compensation schemes on the power performance of TLBs(Translation Look-aside Buffers) and instruction caches. We found that the double-buffer branch-penalty compensation scheme can reduce the power consumption of the TLBs and the instruction caches considered by up to 14-21.3%. The power consumption is estimated through simulation at the architectural level, using the Kamble/Ghose method
PDF

농업정보기술을 위한 ILP 프로세서에서 새로운 복구 메커니즘 적용 분기예측기 (A Branch Predictor with New Recovery Mechanism in ILP Processors for Agriculture Information Technology)

고광현;조영일
- Agribusiness and Information Management
- /
- 제1권2호
- /
- pp.43-60
- /
- 2009
To improve the performance of wide-issue superscalar processors, it is essential to increase the width of instruction fetch and the issue rate. Removal of control hazard has been put forward as a significant new source of instruction-level parallelism for superscalar processors and the conditional branch prediction is an important technique for improving processor performance. Branch mispredictions, however, waste a large number of cycles, inhibit out-of-order execution, and waste electric power on mis-speculated instructions. Hence, the branch predictor with higher accuracy is necessary for good processor performance. In global-history-based predictors like gshare and GAg, many mispredictions come from commit update of the branch history. Some works on this subject have discussed the need for speculative update of the history and recovery mechanisms for branch mispredictions. In this paper, we present a new mechanism for recovering the branch history after a misprediction. The proposed mechanism adds an age_counter to the original predictor and doubles the size of the branch history register. The age_counter counts the number of outstanding branches and uses it to recover the branch history register. Simulation results on the SimpleScalar 3.0/PISA tool set and the SPECINT95 benchmarks show that gshare and GAg with the proposed recovery mechanism improved the average prediction accuracy by 2.14% and 9.21%, respectively and the average IPC by 8.75% and 18.08%, respectively over the original predictor.
PDF

임베디드 시스템에서 후방 분기 명령어 정보를 이용한 저전력 명령어 캐쉬 설계 기법 (Energy-aware Instruction Cache Design using Backward Branch Information for Embedded Processors)

양나라;김종면;김철홍
- 한국컴퓨터정보학회논문지
- /
- 제13권6호
- /
- pp.33-39
- /
- 2008
반도체 기술의 급속한 발달과 함께 임베디드 프로세서의 성능이 점차 강력해지면서 몇 가지 문제점이 발생하게 되었다. 그 중에서도 프로세서 내에서 소비되는 에너지의 급격한 증가는 심각한 문제이다. 이러한 이유로 인해 최신의 임베디드 프로세서를 설계할 때에는 성능과 함께 에너지 효율성이 반드시 고려되어야 한다. 본 논문에서는 프로세서에서 소비되는 에너지의 상당 부분을 차지하고 있는 명령어 캐쉬의 에너지 효율성을 향상시키기 위해 후방 분기 명령어 정보를 이용하는 기법을 제안하고자 한다. 큰 크기의 주 명령어 캐쉬와 작은 크기의 순환문 캐쉬로 구성되는 제안된 기법을 통해 프로세서의 요청이 올 때 주 명령어 캐쉬와 순환문 캐쉬 중에서 하나의 캐쉬만이 선택적으로 접근되도록 하여 주 명령어 캐쉬의 접근 횟수를 크게 감소시킴으로써 우수한 에너지 효율성을 얻을 수 있다. 실험 결과, 제안하는 저전력 명령어 캐쉬는 기존의 명령어 캐쉬와 비교하여 평균 20%의 에너지 소비를 감소시킨다는 사실을 확인하였다.
PDF

멀티프로그래밍 환경에서의 분기 예측 (Branch Prediction in Multiprogramming Environment)

이문상;강영재;맹승렬
- 한국정보과학회논문지:시스템및이론
- /
- 제26권9호
- /
- pp.1158-1165
- /
- 1999
조건부 분기 명령어(conditional branch instruction)의 잘못된 분기 예측(branch misprediction)은 프로세서의 성능 향상에 심각한 장애 요인이 되고 있다. 특히 시분할(time-sharing) 시스템과 같이 문맥 교환(context switch)이 발생하는 멀티프로그래밍 환경(multiprogramming environment)에서는 더욱 낮은 분기 예측 정확성(branch prediction accuracy)을 보인다. 본 논문에서는 문맥 교환이 발생하는 멀티프로그래밍 환경에서 높은 분기 예측 정확성을 보이는 중첩 분기 예측표 교환(Overlapped Predictor Table Switch, OPTS) 기법을 소개한다. 분기 예측표(predictor table)를 분할하여 각각의 프로세스(process)에 할당하는 OPTS 기법은 문맥 교환의 영향을 최소화함으로써 높은 분기 예측 정확성을 유지하는 분기 예측 방법이다.Abstract There is wide agreement that one of the most important impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction stream. Accurate branch prediction is required to overcome this performance limitation. Many branch predictors have been proposed to help to alleviate this problem, including the two-level adaptive branch predictor, and more recently, hybrid branch predictor. In a less idealized environment, such as a time-sharing system, code of interest involves context switches. Context switches, even at fairly large intervals, can seriously degrade the performance of many of the most accurate branch prediction schemes. In this study, we measure the effect of context switch on the branch prediction accuracy in various situation and show the feasibility of our new mechanism, OPTS(Overlapped Predictor Table Switch), which save and restore branch history table at every context switch.

분기 동시 수행을 이용한 단일 칩 멀티프로세서의 성능 향상 기법 (Performance improvement of single chip multiprocessor using concurrent branch execution)

이승렬;정진하;최재혁;최상방
- 대한전자공학회:학술대회논문집
- /
- 대한전자공학회 2006년도 하계종합학술대회
- /
- pp.723-724
- /
- 2006
Exploiting the instruction level parallelism encountered with the limit. Single chip multiprocessor was introduced to overcome the limit of traditional processor using the instruction level parallelism. Also, a branch miss prediction is one of the causes that reduce the processor performance. In order to overcome the problems, in this paper, we make single chip multiprocessor having the idle core execute the two control flow of conditional branch. This scheme is a kind of multi-path execution technique based on single chip multiprocessor architecture.
PDF

패턴/패스 통합 분기 예측 전략의 성능 분석 (Performance Analysis of Pattern/Path Hybrid Branch Prediction Strategy)

조경산
- 한국시뮬레이션학회논문지
- /
- 제8권3호
- /
- pp.17-28
- /
- 1999
Recently studies have shown that conditional branches can be accurately predicted by recording the path leading up to the branch. But path predictors are more complex and uncompatible with existing pattern branch predictors. In order to solve these problems, we propose a simple path branch predictor(SPBP) that hashes together two most recent branch instruction addresses. In addition, we propose a pattern/path hybrid branch predictor composed of the SPBP and existing pattern branch predictors. Through the trace-driven simulation of six benchmark programs, the performance improvement by the proposed pattern/path hybrid branch prediction is analysed and validated. The proposed predictor can improve the prediction accuracy from 94.21% to 95.03%.
PDF

프로그램 상의 제어 독립성을 이용한 분기 예상 실패 복구 메커니즘 (Branch Misprediction Recovery Mechanism That Exploits Control Independence on Program)

윤성룡;이원모;조영일
- 한국정보과학회논문지:시스템및이론
- /
- 제29권7호
- /
- pp.401-410
- /
- 2002
제어 독립성은 슈퍼스칼라 프로세서에서 명령어 수준 병렬성을 향상시키기 위한 중요한 요소로 작용하고 있다. 분기 예측기에서 예상이 잘못된 경우에는 예상한 분기 방향의 명령어들을 무효화시키고 올바른 분기 방향의 명령어들을 다시 반입하여 수행해야 한다. 본 논문에서는 컴파일 시 프로파일링을 통한 정적인 방법과 프로그램상의 제어 흐름을 통해 동적으로 제어 독립적인 명령어를 탐지해서 분기 명령어의 잘못된 예상으로 인해 무효화되는 명령어를 효과적으로 감소시켜 프로세서의 성능을 향상시키는 메커니즘을 제안한다. SPECint95 벤치마크 프로그램에 대해 기존의 방법과 본 논문에서 제안한 방법 사이의 사이클 당 수행된 명령어 수를 분석한 결과, 4-이슈 프로세서에서 2%~7%, 8-이슈 프로세서에서 4%~15%, 16-이슈 프로세서에서 18%~28%의 성능 향상을 보이고 있다.
PDF KSCI

분기 동시 수행을 이용한 단일 칩 멀티프로세서의 성능 개선 (Performance Improvement of Single Chip Multiprocessor using Concurrent Branch Execution)

이승렬;김준식;최재혁;최상방
- 대한전자공학회논문지SD
- /
- 제44권2호
- /
- pp.61-71
- /
- 2007
프로세서 성능향상에 일반적으로 이용되어 오던 명령어 수준의 병렬성은 이제 그 한계를 드러내고 있다. 명령어 수준의 병렬성을 이용하는데 장애가 되는 요인 중에 하나는 분기문에 의한 제어 흐름의 변화이다. 단일 칩 멀티프로세서는 쓰레드 수준의 병렬성을 이용하는 프로세서이다. 그러나 다중 쓰레드를 고려하지 않고 작성된 프로그램을 수행하는 경우에는 단일 칩 멀티프로세서의 성능을 최대한 사용할 수 없는 단점이 있다. 이와 같은 두 가지 성능 저하 요인을 극복하기 위해 본 논문에서는 다중 경로 수행 기법을 단일 칩 멀티프로세서에 적용한 분기 동시 수행 기법을 제안한다. 제안된 방법에서는 유휴 중인 프로세서를 이용하여 조건 분기의 두 흐름을 모두 수행하게 한다. 이를 통하여 분기문에 의한 제어 흐름이 끊기는 것을 막고 유휴 시간을 줄여서 프로세서의 효율을 높일 수 있다. 시뮬레이션을 통하여 본 논문에서 제시한 분기 동시 수행의 효과를 분석한 결과 분기 동시 수행으로 약 20%의 유휴 시간이 감소하였고, 분기 예측 성공률은 최대 10% 향상 되었다. 전체적으로 일반적인 단일 칩 멀티프로세서에 비해 최대 39%의 성능 향상을 이루었고, 슈퍼스칼라 프로세서에 비해 최대 27%의 성능 향상을 이루었다.
PDF KSCI

검색결과 73건 처리시간 0.033초

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)