http://dx.doi.org/10.6109/jicce.2012.10.1.078

Performance Improvement and Power Consumption Reduction of an Embedded RISC Core

Jung, Hong-Kyun (Department of Information and Communication Engineering, Hanbat National University)
Jin, Xianzhe (Department of Information and Communication Engineering, Hanbat National University)
Ryoo, Kwang-Ki (Department of Information and Communication Engineering, Hanbat National University)

Publication Information

Journal of Information and Communication Convergence Engineering / v.10, no.1, 2012, pp. 78-84

Abstract

This paper presents a branch prediction algorithm and a 4-way set-associative cache to improve the performance of an embedded RISC core, and a clock-gating algorithm with observability don't care (ODC) operation to reduce the core's power consumption. The branch prediction algorithm is built around a branch target buffer (BTB). The 4-way set-associative cache has a lower miss rate than a direct-mapped cache, and it uses a pseudo-least recently used (pseudo-LRU) replacement policy to reduce the number of LRU bits. The clock-gating algorithm reduces dynamic power consumption. In the performance and dynamic power estimates, the performance of an OpenRISC core with the proposed architecture improves by about 29%, and the dynamic power of the core with the Chartered 0.18 $\mu$m technology library is reduced by 16%.
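
To illustrate how a pseudo-LRU policy reduces the replacement state of a 4-way set, the following is a minimal sketch of the common tree-based pseudo-LRU scheme, which keeps 3 bits per set. The abstract does not say which pseudo-LRU variant the paper uses, so the tree scheme, the class name `TreePLRU4`, and the bit layout are all assumptions made for illustration, not the authors' design.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way cache set (hypothetical sketch)."""

    def __init__(self):
        # bits[0]: root bit, 0 -> next victim is in ways 0-1, 1 -> in ways 2-3
        # bits[1]: left pair, 0 -> victim is way 0, 1 -> victim is way 1
        # bits[2]: right pair, 0 -> victim is way 2, 1 -> victim is way 3
        self.bits = [0, 0, 0]

    def touch(self, way):
        """Record an access by pointing the bits on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1              # steer the next victim to the right pair
            self.bits[1] = 1 - way        # steer to the other way of the left pair
        else:
            self.bits[0] = 0              # steer the next victim to the left pair
            self.bits[2] = 1 - (way - 2)  # steer to the other way of the right pair

    def victim(self):
        """Follow the bits from the root to the way that should be replaced."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]


plru = TreePLRU4()
for way in (0, 2, 1, 3):
    plru.touch(way)
print(plru.victim())  # prints 0, the same choice true LRU would make here
```

The policy is only an approximation: after the access sequence 0, 1, 2, 3, 0 this sketch selects way 2 as the victim, whereas true LRU would select way 1. In exchange, each set stores 3 bits rather than the full recency ordering of its four ways.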

Keywords

Set-associative cache; Dynamic branch prediction; ODC; OpenRISC
