On the fault tolerance of some popular bounded-degree networks

Abstract
The authors analyze the fault-tolerance properties of several bounded-degree networks that are commonly used for parallel computation. Among other things, they show that an N-node butterfly containing N^(1-ε) worst-case faults (for any constant ε > 0) can emulate a fault-free butterfly of the same size with only constant slowdown. Similar results are proved for the shuffle-exchange graph. Hence, these networks become the first connected bounded-degree networks known to be able to sustain more than a constant number of worst-case faults without suffering more than a constant-factor slowdown in performance. They also show that an N-node butterfly whose nodes fail with some constant probability p can emulate a fault-free version of itself with a slowdown of 2^(O(log* N)), which is a very slowly increasing function of N. The proofs of these results combine the technique of redundant computation with new algorithms for routing packets around faults in hypercubic networks. Techniques for reconfiguring hypercubic networks around faults that do not rely on redundant computation are also presented. These techniques tolerate fewer faults but are more widely applicable since they can be used with other networks such as binary trees and meshes of trees.
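
To give a feel for how slowly the random-fault slowdown bound grows, the following minimal sketch computes the iterated logarithm log* N, taken here as the number of times the base-2 logarithm must be applied before the value drops to 1 or below. The function name log_star and the sample values of N are illustrative assumptions, not taken from the paper, and the constant hidden in the O(log* N) exponent is not stated in the abstract, so only log* N itself is shown.

```python
import math


def log_star(n: float) -> int:
    """Iterated base-2 logarithm: count how many times log2 must be
    applied to n before the result is at most 1.
    (Hypothetical helper name, for illustration only.)"""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


# Sample network sizes (illustrative only): even for very large N,
# log* N stays a single-digit number.
for n in (2, 16, 65536, 2 ** 64):
    print(f"N = {n}: log* N = {log_star(n)}")
```

Because log* N is at most 5 for every N up to 2^65536, a bound of the form 2^(O(log* N)) behaves almost like a constant at any practical network size, although it is not formally one.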