Effect of System Workload on Operating System Reliability: A Study on IBM 3081

Abstract
This paper presents an analysis of operating system failures on an IBM 3081 running VM/SP. We find three broad categories of software failures: error handling (ERH), program control or logic (CTL), and hardware related (HS); it is found that more than 25 percent of software failures occur in the hardware/software interface. Measurements show that results on software reliability cannot be considered representative unless the system workload is taken into account. For example, it is shown that the risk of a software failure increases in a nonlinear fashion with the amount of interactive processing, as measured by parameters such as the paging rate and the amount of overhead (operating system CPU time). The overall CPU execution rate, although measured to be close to 100 percent most of the time, is not found to correlate strongly with the occurrence of failures. The paper discusses possible reasons for the observed workload failure dependency based on detailed investigations of the failure data.

This publication has 12 references indexed in Scilit: