Checkpointing multicomputer applications
- 10 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The authors present a checkpointing scheme that is transparent, imposes overhead only during checkpoints, requires minimal message logging, and allows for quick resumption of execution from a checkpointed image. Since checkpointing multicomputer applications poses requirements different from those posed by checkpointing general distributed systems, existing distributed checkpointing schemes are inadequate for multicomputer checkpointing. The proposed checkpointing scheme makes use of special properties of multicomputer interconnection networks to satisfy this set of requirements. The proposed algorithm is efficient both when taking checkpoints and when recovering from checkpointed images.