Fault-tolerance: The survival attribute of digital systems
- 1 January 1978
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE
- Vol. 66 (10), 1109-1125
- https://doi.org/10.1109/proc.1978.11107
Abstract
Fault-tolerance is the architectural attribute of a digital system that keeps the logic machine doing its specified tasks when its host, the physical system, suffers various kinds of failures of its components. A more general concept of fault-tolerance also includes human mistakes committed during software and hardware implementation and during man/machine interaction among the causes of faults that are to be tolerated by the logic machine. This paper discusses the concept of fault-tolerance, the reasons for its inclusion in digital system architecture, and the methods of its implementation. A chronological view of the evolution of fault-tolerant systems and an outline of some goals for its further development conclude the presentation.