Monitoring distributed systems

1 March 1987

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems

Vol. 5 (2), 121-150
https://doi.org/10.1145/13677.22723

Abstract

The monitoring of distributed systems involves the collection, interpretation, and display of information concerning the interactions among concurrently executing processes. This information and its display can support the debugging, testing, performance evaluation, and dynamic documentation of distributed systems. General problems associated with monitoring are outlined in this paper, and the architecture of a general-purpose, extensible, distributed monitoring system is presented. Three approaches to the display of process interactions are described: textual traces, animated graphical traces, and a combination of aspects of the textual and graphical approaches. The roles that each of these approaches fulfills in monitoring and debugging distributed systems are identified and compared. Monitoring tools for collecting communication statistics, detecting deadlock, controlling the non-deterministic execution of distributed systems, and for using protocol specifications in monitoring are also described. Our discussion is based on experience in the development and use of a monitoring system within a distributed programming environment called Jade. Jade was developed within the Computer Science Department of the University of Calgary and is now being used to support teaching and research at a number of university and research organizations.

Cited by 154 articles