Dynamic control of performance monitoring on large scale parallel systems
- 1 August 1993
- conference paper
- Published by Association for Computing Machinery (ACM)
- pp. 185-194
- https://doi.org/10.1145/165939.165969
Abstract
Performance monitoring of large-scale parallel computers creates a dilemma: we need to collect detailed information to find performance bottlenecks, yet collecting all this data can itself introduce serious data collection bottlenecks. At the same time, users are being inundated with volumes of complex graphs and tables that require a performance expert to interpret. We present a new approach, called the W3 Search Model, that addresses both of these problems by combining dynamic, on-the-fly selection of which performance data to collect with decision support that assists users in selecting and presenting performance data. We also present a case study describing how a prototype implementation of our technique was able to identify the bottlenecks in three real programs. In addition, we were able to reduce the amount of performance data collected by factors ranging from 13 to 700 compared with traditional sampling-based and trace-based instrumentation techniques.
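The abstract describes the approach only at a high level, so the following is a simplified, hypothetical sketch, not the paper's implementation. It shows why on-the-fly selection cuts data volume: one hypothesis (for example, "the program is synchronization bound") is tested against a fixed threshold on a hypothetical program → module → procedure hierarchy, and instrumentation is enabled for a resource only if its parent tested true, so whole subtrees are never measured. The names `Resource`, `refine`, `count`, the threshold, and all metric values are assumptions made for illustration; `measure` stands in for dynamically inserting instrumentation and sampling a metric. The sketch refines only along the "where" axis (which program resource) and ignores the "why" and "when" dimensions that the W3 name refers to.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    """Node in a hypothetical resource hierarchy (program -> module -> procedure)."""
    name: str
    children: list[Resource] = field(default_factory=list)


def refine(node: Resource, measure: Callable[[str], float], threshold: float,
           instrumented: list[str]) -> list[str]:
    """Test one hypothesis on `node` and descend only where it holds.

    `measure` is a stand-in (assumption) for enabling instrumentation on
    `node`, collecting data for a while, and returning a metric such as the
    fraction of time spent blocked on synchronization.
    """
    instrumented.append(node.name)  # the only place data is ever collected
    if measure(node.name) < threshold:
        return []  # hypothesis false here: prune the whole subtree
    found: list[str] = []
    for child in node.children:
        found += refine(child, measure, threshold, instrumented)
    # Report the most specific resources for which the hypothesis still holds.
    return found or [node.name]


def count(node: Resource) -> int:
    """Total number of resources in the hierarchy rooted at `node`."""
    return 1 + sum(count(c) for c in node.children)


if __name__ == "__main__":
    # Hypothetical program structure and metric values, for illustration only.
    program = Resource("program", [
        Resource("solver", [Resource("exchange"), Resource("compute")]),
        Resource("io", [Resource("read"), Resource("write")]),
    ])
    sync_fraction = {"program": 0.40, "solver": 0.35, "exchange": 0.30,
                     "compute": 0.05, "io": 0.05, "read": 0.02, "write": 0.03}
    probed: list[str] = []
    hits = refine(program, sync_fraction.__getitem__, 0.20, probed)
    print("bottlenecks:", hits)
    print(f"instrumented {len(probed)} of {count(program)} resources")
```

With these made-up values the search reports `exchange` as the bottleneck and instruments 5 of the 7 resources; `read` and `write` are never measured because `io` tested false. The 13x to 700x reductions cited in the abstract come from the authors' measurements of real programs, not from a toy example like this one.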