Fault-Tolerant VLSI Systolic Arrays and Two-Level Pipelining

Abstract
This paper addresses two important problems in systolic arrays: fault-tolerance and two-level pipelining. The fault-tolerant scheme we propose maintains the original data flow patterns by simply by-passing defective cells with a small number of registers. As a result, the desirable properties of systolic arrays, such as local and regular communication, massive parallelism and high data throughput, are all preserved. 'Iwo-level pipelining refers to the use of pipelined functional units in the implementation of systolic cells. This paper also addresses the problem of efficiently utilizing such units to increase overall system throughput. We show that both of these problems can be reduced to the same mathematical problem of incorporating extra delays on certain data paths in originally correct systolic designs. We introduce the mathematical notion of a cut which enables us to handle this problem systematically. The results obtained by applying the techniques described in this paper arc encouraging. When applied to systolic arrays without feedback cycles, the arrays can tolerate large numbers of faults with the addition of very little hardware, while maintaining the original throughput. Furthermore, all of the pipeline stages in the cells can be kept fully utilized through the addition of a small number of delay registers. However, adding delays to systolic arrays with cycles typically induces a significant decrease in throughput. In response to this, we have derived a new class of systolic algorithms in which the data cycle around a ring of processing cells The systolic ring architecture has the property of degrading gracefully as cells fail. It can be used in place of many systolic arrays with feedback cycles. Using our cut theory for arrays without feedback and the ring architecture approach for arrays with feedback, we have an effective fault-tolerant scheme for every systolic array that we have considered. Furthermore, as by-products of the ring architecture approach we have derived new systolic algorithms. These algorithms generally require only one-third to one-half of the number of cells used in previous designs to achieve the same throughput. Included in these new systolic algorithms are ones for LU-decomposition, QR-decomposition and the solution of triangular linear systems.