Fault-Tolerant Reference Counting for Garbage Collection in Distributed Systems

Abstract
The function of a garbage collector in a computer system is to reclaim storage that is no longer in use. Developing a garbage collector for a distributed system composed of autonomous computers (nodes) connected by a communication network poses a challenging problem: optimising performance whilst achieving fault-tolerance. This paper presents the design and implementation of a reference-counting garbage collection scheme that is both efficient and fault-tolerant. A distributed object-based system is considered in which operations on remote objects are invoked via remote procedure calls. The orphan treatment scheme associated with remote procedure calls has been enhanced to enable the collection of garbage arising from node crashes.