A comparison of receiver-initiated and sender-initiated adaptive load sharing (extended abstract)

1 January 1985

proceedings article
Published by Association for Computing Machinery (ACM)

Vol. 13 (2), 1-3
https://doi.org/10.1145/317795.317802

Abstract

One goal of locally distributed systems is to facilitate resource sharing. Most current locally distributed systems, however, share primarily data, data storage devices, and output devices; there is little sharing of computational resources. Load sharing is the process of sharing computational resources by transparently distributing the system workload. System performance can be improved by transferring work from nodes that are heavily loaded to nodes that are lightly loaded.Load sharing policies may be either static or adaptive. Static policies use only information about the average behavior of the system; transfer decisions are independent of the actual current system state. Static policies may be either deterministic (e.g., “transfer all compilations originating at node A to server B”) or probabilistic (e.g., “transfer half of the compilations originating at node A to server B, and process the other half locally”).Numerous static load sharing policies have been proposed. Early studies considered deterministic rules [Stone 1977, 1978; Bokhari 1979]. More recently, Tantawi and Towsley [1985] have developed a technique to find optimal probabilistic rules.The principal advantage of static policies is their simplicity: there is no need to maintain and process system state information. Adaptive policies , by contrast, are more complex, since they employ information on the current system state in making transfer decisions. This information makes possible significantly greater performance benefits than can be achieved under static policies. This potential was clearly indicated by Livny and Melman [1982], who showed that in a network of homogeneous, autonomous nodes there is a high probability that at least one node is idle while tasks are queued at some other node, over a wide range of network sizes and average node utilizations.In previous work [Eager, Lazowska & Zahorjan 1984] we considered the appropriate level of complexity for adaptive load sharing policies. (For example, how much system state information should be collected, and how should it be used in making transfer decisions?) Rather than advocating specific policies, we considered fairly abstract strategies exhibiting various levels of complexity. We demonstrated that the potential of adaptive load sharing can in fact be realized by quite simple strategies that the use only small amounts of system state information. This result is important because of a number of practical concerns regarding complex policies: the effect of the overhead required to administer a complex policy, the effect of the inevitable inaccuracies in detailed information about system state and workload characteristics, and the potential for instability. (We consciously use the phrase “load sharing” rather than the more common “load balancing” to highlight the fact that load balancing, with its implication of attempting to equalize queue lengths system-wide, is not an appropriate objective.)Adaptive load sharing policies can employ either centralized or distributed control. Distributed control strategies can be of two basic types (although intermediate strategies also are conceivable): sender-initiated (in which congested nodes search for lightly loaded nodes to which work may be transferred), and receiver-initiated (in which lightly loaded nodes search for congested nodes from which work may be transferred). Our earlier paper considered distributed, sender-initiated policies - a sufficiently rich class to allow us to answer the fundamental questions of policy complexity that we were addressing. In the course of understanding the reasons for the degradation of these policies at high system loads, we were led to consider receiver-initiated policies as a possible alternative. The comparison of receiver-initiated and sender-initiated adaptive load sharing is the purpose of the present paper.There have been several experimental studies, using prototypes and simulation models, of specific (typically fairly complex) adaptive load sharing policies [Bryant & Finkel 1981; Livny & Melman 1982; Kreuger & Finkel 1984; Barak & Shiloh 1984]. Both sender-initiated policies and receiver-initiated policies have been considered. However, there has not previously been a rigorous comparison of these two strategies. Such a comparison is made difficult by the problem of choosing appropriate representative policies of each type, and by the potentially quite different costs incurred in effecting transfers. (Receiver-initiated policies typically will require the transfer of executing tasks, which incurs substantial costs in most systems [Powell & Miller 1983]. Sender-initiated policies naturally avoid such costly transfers, since tasks can be transferred upon arrival, prior to beginning execution.)Our present paper is similar to our previous work in that our purpose, rather than to advocate specific policies, is to address a fundamental question concerning policies in general: How should system state information be collected and load sharing actions initiated - by potential receivers of work, or by potential senders of work? In studying this question we consider a set of abstract policies that represent only the essential aspects of receiver-initiated and sender-initiated load sharing strategies. These policies are investigated using simple analytic models. Our objective is not to determine the absolute performance of particular load sharing policies, but rather to gain intuition regarding the relative merits of the different approaches under consideration.We represent locally distributed systems as collections of identical nodes, each consisting of a single processor. The nodes are connected by a local area network (e.g., an Ethernet). All nodes are subjected to the same average arrival rate of tasks, which are of a single type.In contrast to most previous papers on load sharing, we represent the cost of task transfer as a processor...

Keywords

Cited by 20 articles