Branch Prediction Latency Hiding Scheme using Branch Pre-Prediction and Modified BTB

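  • Ju-Hwan Kim (School of Electronic and Information Engineering, Yeungnam University)
  • Jong Wook Kwak (School of Electronic and Information Engineering, Yeungnam University)
  • Chu Shik Jhon (School of Electrical Engineering and Computer Science, Seoul National University)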
  • Published : 2009.10.31

Abstract

Accurate branch prediction has a profound impact on system performance in modern processor architectures. Recent work shows that prediction latency, not only prediction accuracy, critically affects overall performance, yet prediction latency tends to be overlooked. In this paper, we propose a Branch Pre-Prediction policy to tolerate branch prediction latency. By decoupling the prediction engine from the fetch stage, the proposed scheme lets the branch predictor proceed with its predictions without any information from the fetch engine. We also propose a modified branch target buffer (BTB) structure to support the scheme. Simulation results show that the proposed scheme hides most of the prediction latency while maintaining the same level of prediction accuracy. Moreover, it outperforms even the ideal case, a predictor whose prediction latency is always a single cycle. In our experiments, IPC improves by up to 11.92%, and by 5.15% on average, compared with a conventional predictor system.
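The abstract does not spell out the modified BTB layout or the timing of the scheme, so the following is only a toy sketch of the general idea, not the authors' design. It assumes a hypothetical `next_branch_pc` field in each BTB entry, which lets the predictor start on the next branch while the fetch engine is still working through the instructions before it. It also assumes a simple timing model in which an L-cycle predictor started h cycles early leaves max(0, L - 1 - h) fetch bubbles. The names `BTBEntry`, `fetch_bubbles`, `TRACE`, and `BTB` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target: int          # predicted target address of this branch
    next_branch_pc: int  # ASSUMED extra field: PC of the next branch on the predicted path


def fetch_bubbles(trace, btb, latency, pre_predict):
    """Count fetch bubbles in a toy timing model.

    trace is a list of (branch_pc, gap) pairs, where gap is the number of
    cycles fetch spends between the previous branch and this one.
    """
    bubbles = 0
    prev_pc = None
    for pc, gap in trace:
        entry = btb.get(prev_pc)
        # Pre-prediction can start early only if the previous branch's BTB
        # entry already names this branch (assumption of this sketch).
        early = pre_predict and entry is not None and entry.next_branch_pc == pc
        head_start = gap if early else 0
        # A single-cycle predictor causes no bubbles; each extra cycle of
        # latency that is not overlapped with fetch costs one bubble.
        bubbles += max(0, latency - 1 - head_start)
        prev_pc = pc
    return bubbles


# Hypothetical three-branch loop, with made-up addresses and gaps.
BTB = {
    0x100: BTBEntry(target=0x120, next_branch_pc=0x140),
    0x140: BTBEntry(target=0x160, next_branch_pc=0x180),
    0x180: BTBEntry(target=0x0F0, next_branch_pc=0x100),
}
TRACE = [(0x100, 4), (0x140, 5), (0x180, 1), (0x100, 6)]

if __name__ == "__main__":
    print("conventional:", fetch_bubbles(TRACE, BTB, latency=3, pre_predict=False))
    print("pre-prediction:", fetch_bubbles(TRACE, BTB, latency=3, pre_predict=True))
```

In this model the conventional predictor pays the full extra latency at every branch, while the pre-predicting one pays only where the gap between branches is shorter than the remaining latency, or where no earlier entry points at the branch. The sketch does not attempt to model the reported result that the scheme beats an ideal single-cycle predictor.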
