Efficient robust parallel computations

Abstract
A parallel computing system becomes more prone to failure as its number of processing elements grows. In this paper, we describe a completely general strategy that takes an arbitrary step of an ideal CRCW PRAM and automatically translates it to run efficiently and robustly on a PRAM in which processors are prone to failure. The strategy relies on efficient robust algorithms for solving a core problem, the Certified Write-All Problem. This problem captures the essence of robustness because, as we show, its complexity is equal to that of any general strategy for realizing robustness in the model. We analyze the expected parallel time and work of various algorithms for solving this problem. Our results are a non-trivial generalization of Brent's