Improving Goodput by Coscheduling CPU and Network Capacity

1 August 1999

journal article
research article
Published by SAGE Publications in The International Journal of High Performance Computing Applications

Vol. 13 (3), 220-230
https://doi.org/10.1177/109434209901300305

Abstract

In a cluster computing environment, executable, checkpoint, and data files must be transferred between application submission and execution sites. As the memory footprint of cluster applications increases, saving and restoring the state of a computation in such an environment may require substantial network resources at both the start and the end of a CPU allocation. During the allocation, the application may also consume network bandwidth to periodically transfer a checkpoint back to the submission site or checkpoint server and to access remote data files. Under most circumstances, the application cannot use the allocated CPU while these transfers are in progress. Furthermore, if the application is unable to transfer a checkpoint or successfully migrate at preemption time, work already accomplished by the application is lost. The authors define goodput as the allocation time when a remotely executing application uses the CPU to make forward progress. Goodput can be significantly less than allocated throughput due to network activity. The authors are currently engaged in an effort to develop coscheduling techniques for CPU and network resources that will improve the goodput delivered by Condor pools. They report techniques that they have developed so far, how they were implemented in Condor, and their preliminary impact on the goodput of the authors’ production Condor pool.
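The goodput definition in the abstract can be illustrated with a small accounting sketch. This is not the authors' measurement code or any part of Condor: the `Allocation` record, its field names, and the simplifying assumption that the CPU is fully idle during transfers are all hypothetical, chosen only to make the definition concrete.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Allocation:
    """One CPU allocation, all times in seconds (hypothetical record layout)."""
    allocated: float        # wall-clock length of the CPU allocation
    transfer: float         # time spent moving executable, checkpoint, and data files
    lost_work: float = 0.0  # CPU time discarded because a checkpoint or migration failed


def goodput(a: Allocation) -> float:
    """Allocation time in which the application made forward progress that was kept.

    Assumes the application cannot use the CPU at all while transfers are in progress.
    """
    return max(0.0, a.allocated - a.transfer - a.lost_work)


def goodput_fraction(allocs: Iterable[Allocation]) -> float:
    """Share of total allocated time that counted as goodput (0.0 if nothing was allocated)."""
    allocs = list(allocs)
    total = sum(a.allocated for a in allocs)
    return sum(goodput(a) for a in allocs) / total if total else 0.0
```

Under these assumptions, a one-hour allocation that spends ten minutes on transfers yields 3000 seconds of goodput, while an allocation whose remaining work is lost at preemption contributes none, even though both count fully toward allocated throughput.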
