Computing and Reducing Transient Error Propagation in Registers

  • Yan, Jun (Mathworks)
  • Zhang, Wei (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Received : 2011.01.25
  • Reviewed : 2011.03.04
  • Published : 2011.06.30

Abstract

Recent research indicates that transient errors will become an increasingly critical concern in microprocessor design. Because embedded processors are widely used in reliability-critical or noisy environments, cost-effective fault-tolerant techniques are needed to protect them against transient errors. The register file is one of the critical components that can significantly affect microprocessor system reliability: registers are typically accessed very frequently, and transient errors in registers can easily propagate to functional units or the memory system, leading to silent data corruption (SDC) or a system crash. This paper investigates the impact of register file soft errors on system reliability and develops cost-effective techniques to improve the register file's immunity to soft errors. We propose the register vulnerability factor (RVF) to characterize the probability that a register transient error escapes the register file and thus potentially affects system reliability, together with an approach to compute the RVF from register access patterns. We also propose two compiler-directed techniques and a hybrid approach that improve register file reliability cost-effectively by lowering the RVF. Our experiments indicate that, on average, RVF can be reduced to 9.1% and 9.5% by hyperblock-based instruction re-scheduling and reliability-oriented register assignment, respectively, which can potentially lower the reliability cost significantly without sacrificing register value integrity.
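
The abstract does not state how the RVF is defined, so the following is only a rough sketch of how a vulnerability measure could be derived from register access patterns. It assumes a simplified model that is not taken from the paper: an interval that ends in a read counts as vulnerable (a corrupted value would leave the register file), while an interval that ends in a write counts as safe (the error is overwritten). The function name `vulnerable_fraction` and the trace format are hypothetical.

```python
from typing import Dict, List, Tuple

# One trace event: (cycle, register name, "R" for read or "W" for write).
# This trace format is an assumption made for illustration.
Access = Tuple[int, str, str]


def vulnerable_fraction(trace: List[Access], total_cycles: int) -> Dict[str, float]:
    """Per register, the fraction of cycles in which a bit flip would later be read.

    Assumed model (not the paper's definition): the interval before a read is
    vulnerable, the interval before a write is safe. A read with no earlier
    access is treated as reading a value that was live since cycle 0.
    """
    last_access: Dict[str, int] = {}
    vulnerable: Dict[str, int] = {}
    for cycle, reg, kind in sorted(trace):
        start = last_access.get(reg, 0)
        if kind == "R":
            # The value held since `start` escapes the register file here.
            vulnerable[reg] = vulnerable.get(reg, 0) + (cycle - start)
        else:
            # A write overwrites any error accumulated since `start`.
            vulnerable.setdefault(reg, 0)
        last_access[reg] = cycle
    return {reg: cycles / total_cycles for reg, cycles in vulnerable.items()}


trace = [
    (2, "r1", "W"), (6, "r1", "R"), (8, "r1", "W"), (10, "r1", "R"),
    (3, "r2", "W"), (9, "r2", "W"),
]
print(vulnerable_fraction(trace, total_cycles=10))
```

Under this assumed model, moving the first read of `r1` from cycle 6 to cycle 3 shortens its write-to-read interval and lowers its fraction from 0.6 to 0.3, which gives an intuition for why instruction re-scheduling and register assignment can change such a measure.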
