Title: The PTC scheme for designing loosely coupled recoverable processes: issues in realizing bounded recovery time
Authors: K.H. Kim
Venue: Proceedings of the Third Workshop on Future Trends of Distributed Computing Systems
Publication date: 1992-04-14
DOI: 10.1109/FTDCS.1992.217482
URL: https://doi.org/10.1109/FTDCS.1992.217482
Citations: 0
Abstract
The technology for designing loosely coupled distributed computer systems (DCSs) that must tolerate propagated errors caused by software and/or hardware remains immature. This paper focuses on DCS applications in which a system is structured as a set of loosely coupled interacting processes distributed among multiple physical sites, and each process is designed in the 'partitioned design' mode, i.e., with knowledge of its own interface specification only, rather than with full knowledge of the interfaces between other processes (or sites). The thesis is that fault tolerance capabilities must be designed into loosely coupled processes without violating this design policy. The programmer-transparent coordination (PTC) scheme is one such approach and has been evolving since 1978. The basic PTC scheme, called the PTC/OR (PTC with obedient receiver) scheme, facilitates various forms of cooperative backward recovery in systems of loosely coupled processes, but it has one drawback: the worst-case recovery time is difficult to bound. After discussing various possible solution approaches and their limitations, the paper presents a promising approach called the PTC/SL (PTC with session leaders) scheme, which superimposes additional rules for structuring process interactions onto those of the PTC/OR scheme. Under the PTC/SL scheme, various flexible forms of process interaction are still allowed, while ensuring bounded recovery time becomes a simple task. Several research issues related to the PTC/SL scheme, e.g., efficient implementation techniques, remain subjects for future research.