http://dx.doi.org/10.3745/JIPS.2011.7.3.435

Probabilistic Soft Error Detection Based on Anomaly Speculation  

Yoo, Joon-Hyuk (College of Information and Communication Engineering, Daegu University)
Publication Information
Journal of Information Processing Systems / v.7, no.3, 2011, pp. 435-446
Abstract
Microprocessors are becoming increasingly vulnerable to soft errors as semiconductor technology continues to scale. Traditional redundant multithreading architectures provide complete fault tolerance by re-executing every computation. However, full re-execution significantly increases the verification workload on processor resources and causes severe performance degradation. This paper presents a proactive verification management approach that reduces the verification workload, improving performance with minimal effect on overall reliability. An anomaly-speculation-based filter checker is proposed to assign verification priority before re-execution starts. The technique exploits a value similarity property, defined as the frequent occurrence of partially identical values. Based on the biased distribution of the similarity distance measure, the paper further investigates how similar values can be exploited for soft error tolerance through anomaly speculation. Extensive measurements show that the majority of instructions produce values that differ from the previous result value in only a few bits. Experimental results show that the proposed scheme makes the processor 180% faster than a traditional fully fault-tolerant processor, with minimal impact on the overall soft error rate.
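The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the similarity distance measure, so the sketch assumes it is the Hamming distance between an instruction's new result and its previous result, tracked per program counter. The names `hamming_distance` and `select_for_verification`, the 64-bit width, and the threshold of 8 bits are all hypothetical choices made for illustration.

```python
def hamming_distance(a: int, b: int, width: int = 64) -> int:
    """Count the bit positions in which two width-bit values differ."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def select_for_verification(results, threshold: int = 8, width: int = 64):
    """Return the indices of results that look anomalous.

    results   -- iterable of (pc, value) pairs in execution order
    threshold -- assumed cutoff: more differing bits than this is an anomaly

    Assumption: an instruction with no recorded previous result is always
    flagged, since there is nothing to compare it against.
    """
    last_value = {}  # previous result value per static instruction (pc)
    flagged = []
    for index, (pc, value) in enumerate(results):
        previous = last_value.get(pc)
        if previous is None or hamming_distance(previous, value, width) > threshold:
            flagged.append(index)
        last_value[pc] = value
    return flagged


# Hypothetical trace: one instruction producing four successive results.
trace = [
    (0x400, 100),
    (0x400, 101),               # differs from 100 in 1 bit
    (0x400, 104),               # differs from 101 in 3 bits
    (0x400, 104 ^ 0xFFFF0000),  # differs from 104 in 16 bits
]
flagged = select_for_verification(trace)
```

In this sketch only the flagged results would be prioritized for re-execution. A corruption that changes fewer bits than the threshold would pass the filter unflagged, which is why detection of this kind is probabilistic and trades some coverage for a smaller verification workload.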
Keywords
Probabilistic Soft Error Detection; Reliability; Anomaly Speculation