Application-transparent process-level error recovery for multicomputers

[1989] Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume 1: Architecture Track Pub Date : 1989-01-03 DOI:10.1109/HICSS.1989.47170

Y. Tamir, T. Frazier

Cited by: 18

Abstract

An application-transparent, process-level, distributed error recovery scheme for multicomputers is proposed. Checkpointing is initiated by timers at intervals determined by the needs of the application. Checkpointing and recovery involve only as much of the system as is necessary: a set of interacting processes. Processes that are not part of the interacting set do not participate in checkpointing or recovery and continue to do useful work. Several checkpoint and/or recovery sessions may be active simultaneously. The scheme does not require significant overhead during normal operation, since it is not necessary to make message transmission atomic, acknowledge each message, or transmit checkbits with each packet. Variations of the technique using packet switching or virtual circuits are discussed, and the scheme is compared to previously published techniques.
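The abstract does not say how the "set of interacting processes" is determined. The sketch below is one simplified way to model the idea and is not the authors' protocol. It assumes a hypothetical log of (sender, receiver) message pairs recorded since the last checkpoint, and it treats every message as making both endpoints mutually dependent. Under that assumption, the interacting set for an initiating process is its connected component in the communication graph. Disjoint components never share a process, which is why several sessions can run at the same time without touching uninvolved processes. The names `interacting_set` and `checkpoint_sessions` are hypothetical.

```python
from collections import defaultdict, deque


def interacting_set(initiator, messages):
    """Return every process transitively linked to `initiator` by messages
    exchanged since the last checkpoint.

    Hypothetical model: each (sender, receiver) pair is treated as an
    undirected dependency, a simplification of real checkpoint protocols.
    """
    graph = defaultdict(set)
    for sender, receiver in messages:
        graph[sender].add(receiver)
        graph[receiver].add(sender)

    # Breadth-first search from the initiator over the communication graph.
    seen = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for q in graph[p]:
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen


def checkpoint_sessions(processes, messages):
    """Partition processes into disjoint interacting sets. Each set could run
    its own checkpoint or recovery session concurrently with the others."""
    remaining = set(processes)
    sessions = []
    for p in sorted(processes):
        if p in remaining:
            group = interacting_set(p, messages)
            sessions.append(group)
            remaining -= group
    return sessions


if __name__ == "__main__":
    msgs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
    print(interacting_set("P1", msgs))
    print(checkpoint_sessions(["P1", "P2", "P3", "P4", "P5", "P6"], msgs))
```

In this model, a timer firing at P1 would involve only P1, P2 and P3. P4 and P5 could checkpoint independently, and P6, which exchanged no messages, would keep working undisturbed.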
