Improving Goodput by Coscheduling CPU and Network Capacity
- 1 August 1999
- journal article
- research article
- Published by SAGE Publications in The International Journal of High Performance Computing Applications
- Vol. 13 (3), 220-230
- https://doi.org/10.1177/109434209901300305
Abstract
In a cluster computing environment, executable, checkpoint, and data files must be transferred between application submission and execution sites. As the memory footprint of cluster applications increases, saving and restoring the state of a computation in such an environment may require substantial network resources at both the start and the end of a CPU allocation. During the allocation, the application may also consume network bandwidth to periodically transfer a checkpoint back to the submission site or checkpoint server and to access remote data files. Under most circumstances, the application cannot use the allocated CPU while these transfers are in progress. Furthermore, if the application is unable to transfer a checkpoint or successfully migrate at preemption time, work already accomplished by the application is lost. The authors define goodput as the allocation time when a remotely executing application uses the CPU to make forward progress. Goodput can be significantly less than allocated throughput due to network activity. The authors are currently engaged in an effort to develop coscheduling techniques for CPU and network resources that will improve the goodput delivered by Condor pools. They report techniques that they have developed so far, how they were implemented in Condor, and their preliminary impact on the goodput of the authors’ production Condor pool.
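The goodput definition in the abstract can be illustrated with a small accounting sketch. It is not taken from the article or from Condor: the `Allocation` class, its field names, and the simplifying assumption that the CPU sits fully idle during transfers are all hypothetical, chosen only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """One CPU allocation; all durations in seconds (hypothetical fields)."""
    allocated: float        # total time the CPU was allocated to the job
    transfer: float         # time spent moving executable, checkpoint, and data files
    lost_work: float = 0.0  # CPU time discarded, e.g. after a failed checkpoint at preemption


def goodput(a: Allocation) -> float:
    """Allocation time in which the job made forward progress that was kept.

    Assumes the CPU is unusable while transfers are in progress.
    """
    return max(0.0, a.allocated - a.transfer - a.lost_work)


def goodput_fraction(allocations) -> float:
    """Share of all allocated time that counted as goodput."""
    total = sum(a.allocated for a in allocations)
    if total == 0:
        return 0.0
    return sum(goodput(a) for a in allocations) / total
```

Under these assumptions, a one-hour allocation with five minutes of transfers yields 3300 seconds of goodput, while an identical allocation whose final checkpoint fails yields none, which is the gap between allocated throughput and goodput that the abstract describes.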