Design and Implementation of an Automatic Embedded Core Generation System Using Advanced Dynamic Branch Prediction

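  • 이현철 (CAD & ES Laboratory, Department of Electronic Engineering, Sogang University) ;
  • 황선영 (CAD & ES Laboratory, Department of Electronic Engineering, Sogang University)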
  • Received : 2012.11.26
  • Accepted : 2012.12.14
  • Published : 2013.01.31

Abstract

This paper proposes an automatic embedded core generation system that supports dynamic branch prediction. The generated branch prediction module speeds up target applications by adding history and direction flags to the entries of the BTAC (Branch Target Address Cache). The numbers of BHT (Branch History Table) and BTAC entries are determined from branch information extracted by simulating the target application. To verify the effectiveness of the proposed branch prediction module, an ARM9TDMI core including a dynamic branch predictor was described in SMDL and generated. Experimental results show that, as the number of entries increases, area grows by up to 60%, while the application execution cycle count and the BTAC miss rate drop by an average of 1.7% and 9.6%, respectively.
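To make the idea of a BTAC with history and direction flags concrete, the following is a minimal Python sketch, not the SMDL description or the hardware generated by the system. It assumes a direct-mapped table, a 2-bit saturating counter as the history flag, and the counter's upper bit as the direction flag. All names (`BtacEntry`, `Btac`, `miss_rate`) and the toy trace are hypothetical, and the BHT is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BtacEntry:
    tag: int          # address of the branch that owns this entry
    target: int       # cached branch target address
    history: int = 2  # assumed 2-bit saturating counter, range 0..3

    @property
    def direction(self) -> bool:
        # Assumed encoding: the counter's upper bit is the direction flag.
        return self.history >= 2


class Btac:
    """Hypothetical direct-mapped BTAC with per-entry history/direction."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.table: List[Optional[BtacEntry]] = [None] * entries
        self.lookups = 0
        self.misses = 0

    def _index(self, pc: int) -> int:
        # Word-aligned instructions: drop the two low address bits.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None for not-taken/unknown."""
        self.lookups += 1
        entry = self.table[self._index(pc)]
        if entry is None or entry.tag != pc:
            self.misses += 1
            return None
        return entry.target if entry.direction else None

    def update(self, pc: int, taken: bool, target: int) -> None:
        idx = self._index(pc)
        entry = self.table[idx]
        if entry is None or entry.tag != pc:
            if taken:  # allocate an entry only for taken branches
                self.table[idx] = BtacEntry(tag=pc, target=target)
            return
        if taken:
            entry.history = min(3, entry.history + 1)
            entry.target = target
        else:
            entry.history = max(0, entry.history - 1)


def miss_rate(trace: List[Tuple[int, bool, int]], entries: int) -> float:
    """Replay (pc, taken, target) records and return the BTAC miss rate."""
    btac = Btac(entries)
    for pc, taken, target in trace:
        btac.predict(pc)
        btac.update(pc, taken, target)
    return btac.misses / btac.lookups


# Toy trace: two always-taken branches that collide in a 2-entry table.
trace = [(0x100, True, 0x80), (0x108, True, 0x90)] * 10

if __name__ == "__main__":
    for n in (2, 4):
        print(f"{n} entries: miss rate = {miss_rate(trace, n):.2f}")
```

In this toy trace the two branches map to the same slot of a 2-entry table and keep evicting each other, while a 4-entry table holds both. This is the kind of size versus miss-rate trade-off the system explores when it chooses entry counts from simulated branch information.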
