Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system
- 30 December 2002
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The authors present a measurement-based study of software failures and recovery in the Tandem GUARDIAN90 operating system using a collection of memory dump analyses of field software failures. They identify the effects of software faults on the processor state and trace the propagation of the effects to other areas of the system. They also evaluate the role of the defensive programming techniques and the software fault tolerance of the process pair mechanism implemented in the Tandem system. Results show that the Tandem system tolerates nearly 82% of reported field software faults, thus demonstrating the effectiveness of the system against software faults. Consistency checks made by the operating system detect 52% of software problems and prevent any error propagation in 31% of software problems. Results also show that 72% of reported field software failures are recurrences of known software faults and 70% of the recurrence groups have identical characteristics.
This publication has 14 references indexed in Scilit:
- Analysis of software halts in the Tandem GUARDIAN operating system. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Software defects and their impact on system availability - a study of field failures in operating systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Measurement-based evaluation of operating system fault tolerance. IEEE Transactions on Reliability, 1993
- Orthogonal defect classification - a concept for in-process measurements. IEEE Transactions on Software Engineering, 1992
- Effect of System Workload on Operating System Reliability: A Study on IBM 3081. IEEE Transactions on Software Engineering, 1985
- Dependability Evaluation of Software Systems in Operation. IEEE Transactions on Software Engineering, 1984
- A Study of Software Failures and Recovery in the MVS Operating System. IEEE Transactions on Computers, 1984
- Software errors and complexity: an empirical investigation. Communications of the ACM, 1984
- Evaluating software development by error analysis: The data from the Architecture Research Facility. Journal of Systems and Software, 1980
- System structure for software fault tolerance. IEEE Transactions on Software Engineering, 1975