Grid-level computing needs pervasive debugging

Abstract
Developing applications for parallel and distributed systems is hard because of their nondeterministic nature; developing debugging tools for such systems and applications is even harder. A number of distributed debugging tools and techniques exist; however, we believe that they lack the infrastructure to scale to large distributed systems with hundreds or thousands of nodes, such as grids. In this paper, we introduce PDB, our prototype debugger, which is based on a hierarchical, scalable architecture. We explain the design of PDB, highlight its functionality, and demonstrate its usability with two case studies. Before concluding, we discuss portability and extensibility issues for PDB and outline some solutions.
