Abstract
A parallel algorithm for Gaussian elimination (GE) is described, which solves a linear system of size n using m ≤ n parallel processors and a shared random access memory. Converting the serial GE algorithm to parallel form involves scheduling its computation DAG (directed acyclic graph) on m processors. A lower bound for schedule length is established for dense GE DAG's and it is proved that the proposed algorithm produces schedules which achieve these bounds. Finally, both the construction and execution of the schedule are incorporated into a single concurrent program which is shown to run in optimal time.