Adapting to Intermittent Faults in Future Multicore Systems
- 1 September 2007
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 1089795X, p. 431
- https://doi.org/10.1109/pact.2007.4336259
Abstract
As technology continues to scale, future multicore processors become more susceptible to a variety of hardware failures. In particular, intermittent faults are expected to become especially problematic (S. Borkar et al., 2003), (C. Constantinescu, 2007). A circuit is susceptible to intermittent faults when manufacturing process variation or in-progress wear-out causes the parameters (e.g., resistance, threshold voltage) of devices within the circuit to vary beyond design expectations (C. Constantinescu, 2007). This susceptibility, combined with certain operating conditions such as thermal hot-spots and voltage fluctuations, can result in timing errors, even if the temperatures and voltages are well within the specified "acceptable" margins. Unlike transient faults, which disappear quickly, or permanent faults, which persist indefinitely, intermittent faults occur in bursts. Depending on the cause, these bursts of frequent faults can last from several cycles to several seconds or more, effectively rendering a core useless during this time.