[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium: Latest Papers

A software fault tolerance experiment for space applications
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89363
D. Simon, C. Hourtolle, H. Biondi, J. Bernelas, P. Duverneuil, S. Gallet, P. Vielcanet, S. D. Viguerie, F. Gsell, J. Chelotti
The aim of the experiment described was to implement and assess fault-tolerant software within an industrial framework. Another significant aspect was to adapt the classical software engineering life cycle to this type of project. Two complementary techniques are considered: fault avoidance through the use of a higher-level language and a strict development process; and fault tolerance by using techniques based on design diversity, such as N-version programming and recovery blocks, and exception handling. Starting from the specification of an existing spacecraft orbit and attitude control system, three-version software was developed, coded in Ada, and assessed in a fault-tolerant experimental testbed. The authors describe the experiment development and the main study results (on development efforts, observed diversity, and methodology aspects).
Citations: 7
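The N-version programming idea named in the abstract above can be illustrated with a minimal sketch of a three-version majority voter. This is not the authors' Ada implementation: the function `run_n_versions`, the three toy versions, and the exact-match voting rule are all assumptions made for illustration. Real control software would typically compare floating-point outputs within a tolerance.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def run_n_versions(versions: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every version on x and return the strict-majority result, or None."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            continue  # a version that raises casts no vote
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Require agreement from more than half of ALL versions, not just survivors.
    return value if count > len(versions) / 2 else None


# Hypothetical toy versions of the same specification (square an integer).
def version_a(x: int) -> int:
    return x * x


def version_b(x: int) -> int:
    return x ** 2


def version_c(x: int) -> int:
    # Seeded fault (illustrative only): wrong answer for x == 3.
    return x * x + 1 if x == 3 else x * x


if __name__ == "__main__":
    # Two of three versions agree on 9, so the faulty output 10 is outvoted.
    print(run_n_versions([version_a, version_b, version_c], 3))
```

With three versions the voter masks a single faulty version. If no value reaches a strict majority it returns `None`, leaving the caller to handle the disagreement.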
A highly decentralized implementation model for the programmer-transparent coordination (PTC) scheme for cooperative recovery
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89376
K. Kim, J. You
The authors present an implementation model for the programmer-transparent coordination (PTC) scheme that fits well with local-area-network (LAN) based systems equipped with broadcasting channels. The model is a significant improvement over the earlier formulated implementation guidelines for developing LAN-based fault-tolerant distributed computer systems (DCSs). The model uses a highly decentralized broadcasting-based approach to the execution of PTC functions. The result is a significant reduction in the PTC-related message traffic, and the extent of reduction could be drastic in many application environments. Another major element of the model is a three-layer software structure in which distributed cooperating application processes and PTC-related operating system components are incorporated in modular forms amenable to cost-effective concurrent processing.
Citations: 24
Availability evaluation of MIN-connected multiprocessors using decomposition technique
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89353
C. Das, L. Tien, L. Bhuyan
An analytical technique for the availability evaluation of multiprocessors using a multistage interconnection network (MIN) is presented. The MIN represents a Butterfly-type connection with a 4×4 switching element (SE). The novelty of this approach is that the complexity of constructing a single-level exact Markov chain (MC) is not required. By use of structural decomposition, the system is divided into three subsystems: processors, memories, and the MIN. Two simple MCs are solved by using a software package, called HARP, to find the probability of i working processing elements (PEs) and j working memory modules (MMs) at time t. A second level of decomposition is then used to find the approximate number of SEs (x) required for connecting the i PEs and j MMs. A third MC is then solved to find the probability that the MIN will provide the necessary communication. The model has been validated through simulation for up to a 256-node configuration, the maximum size available for a commercial MIN-connected multiprocessor.
Citations: 2
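The quantity "probability of i working PEs at time t" in the abstract above can be illustrated with a much simpler model than the one the paper solves with HARP. The sketch below assumes that each component is an independent two-state (up/down) Markov chain with constant failure and repair rates and starts in the up state. These assumptions, the function names, and the example rates are illustrative and are not taken from the paper.

```python
import math


def component_availability(t: float, fail_rate: float, repair_rate: float) -> float:
    """P(component is up at time t) for a two-state Markov chain that starts up."""
    total = fail_rate + repair_rate
    # Steady-state availability plus a transient term that decays with rate `total`.
    return repair_rate / total + (fail_rate / total) * math.exp(-total * t)


def prob_working(i: int, n: int, t: float, fail_rate: float, repair_rate: float) -> float:
    """P(exactly i of n independent, identical components are up at time t)."""
    a = component_availability(t, fail_rate, repair_rate)
    return math.comb(n, i) * a ** i * (1.0 - a) ** (n - i)


if __name__ == "__main__":
    # Hypothetical rates: one failure per 100 time units, one repair per time unit.
    for i in range(6, 9):
        print(i, prob_working(i, 8, 5.0, 0.01, 1.0))
```

Under the independence assumption the count of working components is binomial, which is why no joint Markov chain over all components is needed in this toy version.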
Error models for robust storage structures
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89396
David J. Taylor
The error models which have appeared in the literature are described and compared. The comparison includes an informal discussion and comparison of detectability and correctability results obtainable with the various models. The ideal comparison basis would be errors produced by real faults in real systems. No such data are available, and an experiment to obtain such data would be extremely costly. One particular case can be used: the errors resulting from crashes (partially completed updates of storage structures) are easily determined and are used as the final basis of comparison.
Citations: 11
Identifying the cause of detected errors
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89365
C. Walter
The author presents an approach to the consistent diagnosis of error monitoring observations in a distributed fault-tolerant computing system, even when the faulty source produces arbitrary errors. He describes the online algorithm used in the multicomputer architecture for fault tolerance (MAFT) to diagnose faulty system elements. By the use of syndrome information which categorizes detected errors as either symmetric or asymmetric, bounds for correct diagnosis can be deduced. Finally, an interactive consistency algorithm is employed to guarantee consistent diagnosis in a distributed environment and to provide online verification of all diagnostic units.
Citations: 17
Effects of transient gate-level faults on program behavior
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89371
E. W. Czeck, D. Siewiorek
Effects of gate-level faults on program behavior are described and used as a basis for fault models at the program level. A simulation model of the IBM RT PC was developed and injected with 18900 gate-level transient faults. A comparison of the system state of good and faulted runs was made to observe internal propagation of errors, while memory traffic and program flow comparisons detected errors in program behavior. Results show several distinct classes of program-level error behavior, including program flow changes, incorrect memory bus traffic, and undetected but corrupted program state. Additionally, the dependencies of fault location, injection time, and workload on error detection coverage are reported. For the IBM RT PC, the error detection latency was shown to follow a Weibull distribution dependent on the error detection mechanism and the two selected workloads. These results aid in the understanding of the effects of gate-level faults and allow for the generation and validation of new fault models, fault injection methods, and error detection mechanisms.
Citations: 113
On the modeling of workload dependent memory faults
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89388
J. Dunkel
A modeling approach to investigating the interdependencies between memory faults and system performance is presented. Describing the program behavior by an independent reference model, the author develops a fault occurrence model that depends on workload characteristics such as task sojourn times, the number of page references, and the number of page faults. Determining the probability that a fault is detected at a page test, the author quantifies the workload required for fault handling. Using a queuing network in a stationary analysis, he evaluates the average performance decrease caused by memory faults. The interdependencies between performance and reliability quantities are described by a set of nonlinear equations. An iterative method for evaluating the model is given. The results of some experiments demonstrate that the performance decrease caused by memory error depends on system workload and operating system characteristics.
Citations: 14
The transformation approach to the modeling and evaluation of the reliability and availability growth
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89390
J. Laprie, C. Béounes, M. Kaâniche, K. Kanoun
The authors present an approach aimed at modeling and evaluating the reliability and availability of systems from the knowledge of the reliability growth of their components. First, system behavior is characterized with respect to reliability and availability. The hyperexponential model for reliability and availability growth modeling is introduced and applied to multicomponent systems. The possibility of accounting for future reliability growth when performing evaluations during the design of the system is considered.
Citations: 16
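To see why a hyperexponential distribution is a natural fit for reliability growth, the sketch below evaluates the failure (hazard) rate of a textbook two-stage hyperexponential time-to-failure distribution. The parameter names `weight`, `rate_a`, and `rate_b` and the example values are assumptions for illustration; the exact parameterization used in the paper is not given in the abstract and may differ.

```python
import math


def hyperexp_hazard(t: float, weight: float, rate_a: float, rate_b: float) -> float:
    """Hazard rate f(t)/R(t) of a two-stage hyperexponential distribution.

    Survival function: R(t) = weight*exp(-rate_a*t) + (1 - weight)*exp(-rate_b*t).
    """
    term_a = weight * math.exp(-rate_a * t)
    term_b = (1.0 - weight) * math.exp(-rate_b * t)
    density = rate_a * term_a + rate_b * term_b
    survival = term_a + term_b
    return density / survival


if __name__ == "__main__":
    # Hypothetical parameters: the hazard starts at the weighted mean of the two
    # rates and decays toward the smaller rate, i.e. the failure rate decreases.
    for t in (0.0, 1.0, 10.0, 100.0):
        print(t, hyperexp_hazard(t, 0.5, 1.0, 0.1))
```

The hazard rate falls monotonically from `weight*rate_a + (1 - weight)*rate_b` at `t = 0` toward `min(rate_a, rate_b)` as `t` grows. That decreasing failure intensity is the behavior a reliability-growth model has to capture.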
Static allocation of process replicas in fault tolerant computing systems
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89345
L. J. M. Nieuwenhuis
It is proved that there exist allocations that are optimal with respect to reliability. A simple transformation rule that derives an optimal allocation of replicated systems from an allocation of a given nonreplicated system is presented. This transformation preserves performance optimizing properties of the original allocation. Generally, replication gives a large number of processor links. A second transformation rule generates a replicated system with authenticated messages. The reliability of this system is also optimal, with, however, significantly fewer communication links.
Citations: 10
Distributed probabilistic fault diagnosis for multiprocessor systems
Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89383
P. Berman, A. Pelc
A class of n-unit multiprocessor systems with O(n log n) interconnecting links is constructed, and a distributed probabilistic fault diagnosis algorithm whose probability of correctness converges to 1 as n tends to infinity is proposed. For small probability of unit failure, a distributed diagnosis whose probability of correctness also converges to 1 as the size of the system grows is proposed for the hypercube. On the other hand, it is proved that if a class of systems has fewer than kn log n links for a small constant k, the probability of correctness of every fault diagnosis converges to 0 as n tends to infinity. By combining the probabilistic and the distributed approach, the authors' model of fault diagnosis removes the major drawbacks of the PMC (Preparata-Metze-Chien) model: the assumption of tests with complete fault coverage and the assumption of a fault-free central monitoring unit capable of performing diagnosis.
Citations: 45
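The link counts in the abstract above can be checked for the hypercube with a small worked example. A d-dimensional hypercube has n = 2^d units and d·2^(d-1) = (n log2 n)/2 links, which is consistent with the O(n log n) bound. The helper name `hypercube_links` is an assumption for illustration. The sketch only counts links and does not implement the diagnosis algorithm.

```python
from typing import List, Tuple


def hypercube_links(dimension: int) -> List[Tuple[int, int]]:
    """List each undirected link of a `dimension`-dimensional hypercube once."""
    n = 1 << dimension  # number of units, n = 2**dimension
    links = []
    for node in range(n):
        for bit in range(dimension):
            neighbor = node ^ (1 << bit)  # flip one address bit
            if node < neighbor:  # record each undirected link only once
                links.append((node, neighbor))
    return links


if __name__ == "__main__":
    for d in range(1, 6):
        n = 1 << d
        # The enumerated count matches the closed form (n * log2 n) / 2 = n * d / 2.
        print(n, len(hypercube_links(d)), n * d // 2)
```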