
Enhancing Instruction Queue Efficiency with Return Address Stack in Shallow-Pipelined EISC Architecture  

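Kim, Han-Yee (Department of Computer Science Education, Korea University)
Lee, SeungEun (Department of Electronic Engineering, Seoul National University of Science and Technology)
Kim, Kwan-Young (ADChips)
Suh, Taeweon (Department of Computer Science Education, Korea University)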
Publication Information
The Journal of Korean Association of Computer Education / v.18, no.2, 2015, pp. 71-81
Abstract
In the EISC processor, the Instruction Queue (IQ), which supports LERI folding and loop buffering, occupies roughly 20% of the processor's area, so using it efficiently is key to performance. This paper presents an architectural enhancement that improves IQ utilization by adding a return address stack (RAS) to the EISC processor. The proposed architecture takes advantage of the shallow pipeline to eliminate RAS corruption caused by wrong-path execution. In experiments, a 4-entry RAS reduces the number of IQ flushes by up to 58.90% over the baseline, and an 8-entry RAS by up to 61.28%. Performance improves by up to 3.15% with the 4-entry RAS and by up to 3.47% with the 8-entry RAS.
Keywords
EISC; Instruction Queue; Return Address Stack; Branch Predictor; Microarchitecture
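
To make the role of the RAS in the abstract concrete, the following is a minimal Python sketch of a generic fixed-depth return address stack. It is not the EISC design from the paper: the class `ReturnAddressStack`, the helper `count_mispredicted_returns`, the circular-overwrite policy, and the synthetic call/return trace are all assumptions made for illustration. It only shows why a deeper stack can predict more return targets, which is the kind of effect the 4-entry versus 8-entry comparison measures.

```python
class ReturnAddressStack:
    """Generic circular RAS (hypothetical model, not the paper's hardware)."""

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = [0] * depth
        self.top = 0    # index of the next free slot
        self.count = 0  # number of valid entries, capped at depth

    def push(self, return_addr):
        # On a call: record the address the matching return should go to.
        # When the stack is full, the oldest entry is silently overwritten.
        self.entries[self.top] = return_addr
        self.top = (self.top + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def pop(self):
        # On a return: predict the target, or None if nothing valid is left.
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.depth
        self.count -= 1
        return self.entries[self.top]


def count_mispredicted_returns(trace, depth):
    """Replay (kind, addr) events and count returns the RAS gets wrong."""
    ras = ReturnAddressStack(depth)
    misses = 0
    for kind, addr in trace:
        if kind == "call":
            ras.push(addr)
        elif kind == "ret" and ras.pop() != addr:
            misses += 1
    return misses


# Synthetic trace (assumed): six nested calls followed by six returns.
addrs = [0x100 + 4 * i for i in range(6)]
trace = [("call", a) for a in addrs] + [("ret", a) for a in reversed(addrs)]

print(count_mispredicted_returns(trace, depth=4))  # 2
print(count_mispredicted_returns(trace, depth=8))  # 0
```

In this toy trace the 4-entry stack loses the two outermost return addresses to overwriting, while the 8-entry stack holds all six. How a mispredicted return translates into an IQ flush, and how the shallow pipeline avoids wrong-path corruption, are specific to the paper's design and are not modeled here.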