Design and Evaluation of an HPVM-Based Windows NT Supercomputer

Abstract
We describe the design and evaluation of a 192-processor Windows NT cluster for high-performance computing based on the High Performance Virtual Machine (HPVM) communication suite. While other clusters have been described in the literature, building a 58 GFlop/s NT cluster to serve as a general-purpose production machine for NCSA required solving new problems. The HPVM software meets the challenges posed by the large number of processors, the peculiarities of the NT operating system, the need for a production-strength job submission facility, and the requirement for mainstream programming interfaces. First, HPVM provides users with a collection of standard APIs such as MPI, Shmem, and Global Arrays with supercomputer-class performance (13 μs minimum latency and 84 MB/s peak bandwidth for MPI), efficiently delivering Myrinet’s hardware performance to application programs. Second, HPVM provides cluster management and scheduling (through integration with Platform Computing’s LSF). Finally, HPVM addresses Windows NT’s remote access problem, providing convenient remote access and job control (through a graphical Java-applet front-end). Given the production nature of the cluster, the performance characterization is largely based on a sample of the NCSA scientific applications the machine will run. A side-by-side comparison with other present-generation NCSA supercomputers shows the cluster to be within a factor of 2 to 4 of the performance of the SGI Origin 2000 and Cray T3E at a fraction of the cost. The inherent scalability of the cluster design yields speedup comparable to or better than that of the Origin 2000, despite a limitation in the HPVM flow control mechanism.
