Decoupling synchronization and data transfer in message passing systems of parallel computers

Abstract
Synchronization is an important issue for the design of a scalable parallel computer, and some systems include special hardware support for control messages or barriers. The cost of synchronization has a high impact on the design of the message passing (communication) services. In this paper, we investigate three different communication libraries that are tailored toward the synchronization services available: (1) a version of generic send-receive message passing (PVM), which relies on traditional flow control and buffering to synchronize the data transfers; (2) message passing with pulling, i.e., a message is transferred only when the recipient is ready and requests it (as, e.g., used in NX for large messages); and (3) the decoupled direct deposit message passing that uses separate, global synchronization to ensure that nodes send messages only when the message data can be deposited directly into the final destination in the memory of the remote recipient. Measurements of these three styles on a Cray T3D demonstrate the benefits of the decoupled message passing with direct deposit. The performance advantage of this style is made possible by (1) preemptive synchronization to avoid unnecessary copies of the data, (2) high-speed barrier synchronization, and (3) improved congestion control in the network. The designers of the communication system of future parallel computers are therefore strongly encouraged to provide good synchronization facilities in addition to high-throughput data transfers to support high-performance message passing.