Minimizing monitoring costs: choosing between tracing and sampling

Abstract
A method is presented for reducing communication costs in parallel and distributed systems that use message passing to transmit monitoring information. As groundwork, a hierarchical model of monitoring is reviewed, and an existing distributed environment is briefly described as an example. A probabilistic model is presented for quantifying the cost of monitoring a set of conditions when data collection is done by sampling or by tracing. With the model, one can select an optimal set of conditions to trace in order to minimize the amount of intercommunication. Because finding an optimal set is difficult, a simple greedy algorithm that finds good solutions is presented, and an empirical analysis of its performance is given.
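The abstract does not give the cost model or the greedy algorithm in detail, so the following is only an illustrative sketch under an assumed toy model, not the paper's method. The assumptions are: each condition depends on a set of variables; tracing a variable costs its expected update rate in messages per unit time; a condition is traced only if all of its variables are traced; and any condition that is not traced is sampled at a fixed per-condition cost. All names (`total_cost`, `greedy_trace_set`, `update_rate`, `sample_cost`) are hypothetical.

```python
def total_cost(traced, conditions, update_rate, sample_cost):
    """Expected message cost under the assumed toy model.

    traced      -- set of condition names monitored by tracing
    conditions  -- dict: condition name -> set of variable names it reads
    update_rate -- dict: variable name -> expected trace messages per unit time
    sample_cost -- dict: condition name -> expected sampling messages per unit time
    """
    # A variable shared by several traced conditions is paid for only once.
    traced_vars = set()
    for c in traced:
        traced_vars |= conditions[c]
    trace_part = sum(update_rate[v] for v in traced_vars)
    # Every condition that is not traced is sampled instead.
    sample_part = sum(sample_cost[c] for c in conditions if c not in traced)
    return trace_part + sample_part


def greedy_trace_set(conditions, update_rate, sample_cost):
    """Greedily grow the traced set while the total cost keeps dropping."""
    traced = set()
    current = total_cost(traced, conditions, update_rate, sample_cost)
    while True:
        best_c, best_cost = None, current
        for c in sorted(conditions):  # sorted for deterministic tie-breaking
            if c in traced:
                continue
            cost = total_cost(traced | {c}, conditions, update_rate, sample_cost)
            if cost < best_cost:
                best_c, best_cost = c, cost
        if best_c is None:  # no single addition lowers the cost: stop
            return traced, current
        traced.add(best_c)
        current = best_cost


# Made-up numbers, for illustration only.
conditions = {"A": {"x", "y"}, "B": {"y"}, "C": {"z"}}
update_rate = {"x": 2.0, "y": 3.0, "z": 50.0}
sample_cost = {"A": 10.0, "B": 4.0, "C": 5.0}
traced, cost = greedy_trace_set(conditions, update_rate, sample_cost)
print(sorted(traced), cost)
```

In this toy instance, conditions A and B share variable `y`, so tracing both is cheaper than sampling either, while C reads a frequently updated variable and is left to sampling. Shared variables are what make the choice of trace set a combinatorial problem rather than a per-condition comparison, and a greedy pass like this one can stop at a local minimum rather than the true optimum.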
