Adapting to Intermittent Faults in Future Multicore Systems

Abstract
As technology continues to scale, future multicore processors will become more susceptible to a variety of hardware failures. In particular, intermittent faults are expected to become especially problematic (S. Borkar et al., 2003; C. Constantinescu, 2007). A circuit is susceptible to intermittent faults when manufacturing process variation or in-progress wear-out causes the parameters of devices within the circuit (e.g., resistance or threshold voltage) to vary beyond design expectations (C. Constantinescu, 2007). This susceptibility, combined with certain operating conditions such as thermal hot-spots and voltage fluctuations, can result in timing errors, even when those temperatures and voltages are well within the specified "acceptable" margins. Unlike transient faults, which disappear quickly, or permanent faults, which persist indefinitely, intermittent faults occur in bursts. Depending on the cause, these bursts of frequent faults can last from several cycles to several seconds or more, effectively rendering a core useless for their duration.
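The contrast between transient, intermittent, and permanent faults can be illustrated with a small simulation. The sketch below assumes a two-state Markov (Gilbert-Elliott style) model in which a core alternates between a normal state and a burst state. This model is an illustrative choice, not the fault model used in this work. The class `IntermittentFaultModel`, its parameters, and the helper `burst_lengths` are hypothetical names introduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class IntermittentFaultModel:
    """Hypothetical two-state (normal / burst) model of a core's per-cycle faults.

    Illustrative assumption only: the time spent in a burst is geometrically
    distributed with a mean of 1 / p_exit_burst cycles.
    """
    p_enter_burst: float           # per-cycle probability of entering a burst
    p_exit_burst: float            # per-cycle probability that a burst ends
    p_fault_in_burst: float = 0.9  # per-cycle fault probability during a burst
    p_fault_normal: float = 0.0    # per-cycle fault probability otherwise

    def simulate(self, cycles: int, seed: int = 0) -> list[bool]:
        """Return one boolean per cycle: True if that cycle experienced a fault."""
        rng = random.Random(seed)
        in_burst = False
        faults = []
        for _ in range(cycles):
            # First update the hidden state, then sample a fault for this cycle.
            if in_burst:
                if rng.random() < self.p_exit_burst:
                    in_burst = False
            elif rng.random() < self.p_enter_burst:
                in_burst = True
            p = self.p_fault_in_burst if in_burst else self.p_fault_normal
            faults.append(rng.random() < p)
        return faults


def burst_lengths(faults: list[bool]) -> list[int]:
    """Lengths of maximal runs of consecutive faulty cycles."""
    runs, current = [], 0
    for f in faults:
        if f:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    # Rare bursts that last about 100 cycles on average (assumed parameters).
    intermittent = IntermittentFaultModel(p_enter_burst=0.001, p_exit_burst=0.01)
    runs = burst_lengths(intermittent.simulate(100_000, seed=42))
    print(f"{len(runs)} fault runs, longest = {max(runs, default=0)} cycles")
```

The two extremes fall out of the same parameters. Setting `p_exit_burst=1.0` limits every burst to a single cycle, which approximates isolated transient faults. Setting `p_exit_burst=0.0` means a core never leaves the burst state once it enters, which approximates a permanent fault. Intermittent behavior lies between these two extremes.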
