Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers: Latest Papers

A class of optimal fixed-byte error protection codes for computer systems

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466968

E. Fujiwara, M. Kitakami

Error control codes are now being successfully applied to computer systems, especially to memory systems. This paper proposes a new class of error control codes that protect the fixed-byte in computer words from errors. The fixed-byte stores valuable and important information, such as control and address information in communication messages or pointer information in database words. 'Fixed-byte' means a cluster of information digits in the word whose position is determined in advance. As a simple class of these unequal error protection codes, this paper proposes two types of optimal fixed-byte error protection codes: single-bit error correction and fixed b-bit byte error correction (SEC-FbEC) codes, and single-bit error correction, double-bit error detection, and fixed b-bit byte error detection (SEC-DED-FbED) codes. For example, the optimal SEC-FbEC codes obtained for byte length b = 7 bits and information length k = 64 bits require only 8 check bits, the same as conventional SEC-DED codes with k = 64 bits.
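
The 8-check-bit figure can be put in context with simple syndrome counting. The Python sketch below assumes binary linear codes and uses hypothetical helper names (none come from the paper). It computes the Hamming-bound check-bit count for SEC and SEC-DED codes and a necessary counting lower bound for SEC-FbEC codes. It does not construct the paper's codes; it only shows why 8 check bits is the smallest length that could possibly work for b = 7 and k = 64.

```python
# Hypothetical helper names; counting bounds only, not the paper's code construction.

def min_sec_check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming bound for single-error correction)."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


def min_sec_ded_check_bits(k: int) -> int:
    """Extended Hamming (SEC-DED) code: one extra overall-parity bit on top of SEC."""
    return min_sec_check_bits(k) + 1


def min_check_bits_sec_fbec(k: int, b: int) -> int:
    """Counting lower bound for SEC-FbEC (a necessary condition, not a construction).

    Every correctable error pattern needs its own nonzero syndrome:
      * a single-bit error in any of the n = k + r code positions, and
      * any nonzero pattern inside the fixed b-bit byte (2**b - 1 patterns),
    where the b single-bit errors inside the byte are counted only once.
    """
    r = 0
    while 2 ** r - 1 < (k + r) + (2 ** b - 1) - b:
        r += 1
    return r


print(min_sec_check_bits(64), min_sec_ded_check_bits(64), min_check_bits_sec_fbec(64, 7))
# prints: 7 8 8
```

With b = 1 the byte condition adds nothing beyond single-bit correction, so the SEC-FbEC bound collapses to the plain SEC bound, which is a quick sanity check on the counting.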

Citations: 6

Software schemes of reconfiguration and recovery in distributed memory multicomputers using the actor model

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466950

M. Peercy, P. Banerjee

Ideally, a multicomputer system should cope with a processor failure by reconstructing itself, and the application running on it, in order to maintain the available computational power of the remaining processors. We discuss how running applications can continue through permanent processor failures. We take advantage of the characteristics of the actor model of parallel computation and dynamically checkpoint the activity of the application. Consequently, the runtime system is able to continue an application through multiple nonconcurrent processor failures. We have implemented our techniques by modifying the runtime system of the parallel language Charm on an Intel iPSC/s hypercube. After discussing the theory and implementation, we give measurements of the overhead due to fault tolerance for a number of applications and demonstrate that the applications continue after one or more faults are injected.
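
A minimal Python sketch of the checkpoint-and-restart idea, under loudly stated assumptions: actors are plain objects whose state changes only through messages, each checkpoint is a deep copy held on a "buddy" processor, and work done after the last checkpoint is simply lost. The abstract does not describe the actual Charm runtime mechanisms (for example, how in-flight messages are handled), and `Actor`, `Runtime`, `checkpoint`, and `fail` are hypothetical names.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Actor:
    """Toy (hypothetical) actor: private state that changes only by processing messages."""
    name: str
    total: int = 0
    mailbox: list = field(default_factory=list)

    def process_all(self) -> None:
        while self.mailbox:
            self.total += self.mailbox.pop(0)


class Runtime:
    """Hypothetical runtime that keeps each actor's checkpoint on a buddy processor."""

    def __init__(self, num_procs: int):
        self.live = set(range(num_procs))
        self.placement = {}    # actor name -> processor it runs on
        self.actors = {}       # actor name -> Actor
        self.checkpoints = {}  # actor name -> (processor holding the copy, snapshot)

    def spawn(self, name: str, proc: int) -> None:
        self.actors[name] = Actor(name)
        self.placement[name] = proc

    def send(self, name: str, value: int) -> None:
        self.actors[name].mailbox.append(value)

    def checkpoint(self) -> None:
        """Snapshot every actor (state plus pending mail) onto another live processor."""
        for name, actor in self.actors.items():
            home = self.placement[name]
            buddy = min(p for p in self.live if p != home)  # needs >= 2 live processors
            self.checkpoints[name] = (buddy, copy.deepcopy(actor))

    def fail(self, proc: int) -> None:
        """Permanent failure: restart the lost actors from their checkpoints."""
        self.live.discard(proc)
        for name, home in list(self.placement.items()):
            if home == proc:
                holder, snapshot = self.checkpoints[name]
                self.actors[name] = copy.deepcopy(snapshot)  # work since the checkpoint is lost
                self.placement[name] = holder
        if len(self.live) >= 2:
            self.checkpoint()  # re-protect before the next (nonconcurrent) failure


rt = Runtime(3)
rt.spawn("a", 0)
rt.spawn("b", 1)
rt.send("a", 5)
rt.actors["a"].process_all()
rt.checkpoint()
rt.fail(0)
print(rt.placement["a"], rt.actors["a"].total)  # prints: 1 5
```

Re-checkpointing right after each failure is what lets this toy survive a later, nonconcurrent failure, since no checkpoint is left stranded on a dead processor.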

Citations: 8

Fault-tolerant clock synchronization for distributed systems using continuous synchronization messages

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466987

A. Olson, K. Shin, B. Jambor

We present a probabilistic synchronization algorithm that sends periodic synchronization messages instead of the periodic bursts of synchronization messages that other algorithms send. Our "continuous" approach therefore avoids the bursty network loads of other algorithms. Nodes always have current estimates of other nodes' clocks, which lets them monitor the state of system synchronization and adjust their clocks as needed. The algorithm is fault-tolerant and can easily be adapted to a wide variety of systems and networks. We analyze and simulate the algorithm's performance on a 64-node hypercube and show that it provides tight synchronization while imposing only a light load on the network.
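
The abstract does not give the estimation or correction rule, so the sketch below swaps in a well-known fault-tolerant averaging step (discard the f largest and f smallest offset estimates and average the rest) to illustrate how continuously refreshed estimates could become a clock adjustment. The round-robin schedule, one message per period, stands in for the "continuous" sending pattern. Both function names are hypothetical, and neither is claimed to be the paper's probabilistic algorithm.

```python
def round_robin_targets(node: int, n: int, periods: int) -> list:
    """Hypothetical schedule: the peer `node` messages in each period, one message
    per period, cycling through the other n - 1 nodes instead of bursting to all."""
    others = [j for j in range(n) if j != node]
    return [others[t % len(others)] for t in range(periods)]


def fault_tolerant_correction(offsets: dict, f: int) -> float:
    """Trimmed mean of the latest offset estimates (peer clock minus own clock).

    The node's own offset (0.0) is included; the f largest and f smallest values
    are discarded so that up to f faulty clocks cannot drag the result arbitrarily.
    """
    values = sorted(list(offsets.values()) + [0.0])
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f estimates")
    kept = values[f:len(values) - f]
    return sum(kept) / len(kept)


print(round_robin_targets(0, 4, 6))  # prints: [1, 2, 3, 1, 2, 3]
estimates = {1: 0.002, 2: -0.001, 3: 5.0}  # node 3's clock is faulty
print(fault_tolerant_correction(estimates, f=1))
```

Here the wildly wrong 5.0 s estimate from node 3 is trimmed away, and the correction comes out at about 1 ms, driven only by the plausible estimates.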

Citations: 5

Self-stabilizing mutual exclusion in the presence of faulty nodes

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466988

R. Buskens, R. Bianchini

The paper presents RatchetFT, a distributed fault-tolerant mutual exclusion algorithm for processor rings. RatchetFT is self-stabilizing: if mutual exclusion is lost due to any sequence of online failures and repairs of processors, mutual exclusion will eventually be regained. This research demonstrates that self-stabilization can be achieved in the presence of faulty processors, provided that these faulty processors always appear to behave incorrectly. Self-stabilization is achievable even if faulty processor behavior is not restricted to transient failures or other simple failure models. The key results of the paper are the specification of RatchetFT and a detailed sketch of its correctness proof.
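
RatchetFT's rules are not given in the abstract. For orientation only, the sketch below simulates Dijkstra's K-state token ring (1974), a plainly swapped-in classical algorithm and not RatchetFT. It shows what self-stabilizing mutual exclusion means on a processor ring: from an arbitrary corrupted state, the ring converges to exactly one privileged processor and stays there. Unlike RatchetFT, it assumes every processor executes its rule correctly; tolerating persistently faulty processors is what the paper adds. The function names are hypothetical.

```python
import random


def privileged(x: list) -> list:
    """Indices of processors holding a privilege (the 'token') in Dijkstra's K-state ring."""
    n = len(x)
    holders = [0] if x[0] == x[n - 1] else []
    holders += [i for i in range(1, n) if x[i] != x[i - 1]]
    return holders


def fire(x: list, i: int, K: int) -> None:
    """Processor i executes its move; only privileged processors may fire."""
    if i == 0:
        x[0] = (x[0] + 1) % K  # the distinguished processor advances its counter
    else:
        x[i] = x[i - 1]        # every other processor copies its left neighbor


def run(x: list, K: int, steps: int, rng: random.Random) -> list:
    """Central daemon: at each step an arbitrary privileged processor fires."""
    for _ in range(steps):
        fire(x, rng.choice(privileged(x)), K)
    return x


state = [3, 1, 4, 1, 0]  # arbitrary (corrupted) start; K > n guarantees convergence
run(state, K=6, steps=1000, rng=random.Random(7))
print(len(privileged(state)))  # prints: 1
```

At least one processor is always privileged (if no processor i > 0 differs from its left neighbor, all values are equal and processor 0 is privileged), so the daemon never stalls.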

Citations: 11

Modeling and testing a critical fault-tolerant multi-process system

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466946

Ronald Riter

The paper discusses modeling and fault-insertion testing of the Boeing 777 "fly-by-wire" Primary Flight Computer (PFC) system. The 777 PFC was modeled to perform a behavior analysis. The simulation model includes all systems that communicate with the PFCs. The simulation environment allows errors to be injected into the communication portion of the model and into selected PFC internal variables. The model is used to test the system response to errors in the PFC input data and to PFC internal errors. The behavior analysis tests were chosen to stress the fault-tolerant design and to investigate PFC anomalies encountered during laboratory tests or flight tests. The effects of both input errors and PFC internal errors were studied, and the effects of asynchronous communication were examined. The paper is organized as follows: 1. Introduction, briefly describing the airplane's "fly-by-wire" features and the simulation; 2. PFC description, giving more details about the PFC; 3. Failure model; 4. Simulation description, covering the simulation environment and facilities; 5. Fault-tolerant testing, with some examples; 6. Summary.
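
A minimal sketch of a fault-injection campaign in the spirit of injecting errors into a model's input data. The system under test is a hypothetical triplex mid-value (median) voter. The abstract does not describe the 777 PFC's voting or redundancy management, so the voter, the fault types in `FAULTS`, and the tolerance are all illustrative assumptions.

```python
def mid_value_select(a: float, b: float, c: float) -> float:
    """Hypothetical triplex voter: output the middle of three redundant inputs."""
    return sorted((a, b, c))[1]


# Hypothetical fault models, each applied to one channel's value.
FAULTS = {
    "stuck_high": lambda v: 1e6,
    "stuck_zero": lambda v: 0.0,
    "sign_flip": lambda v: -v,
    "offset": lambda v: v + 50.0,
}


def run_with_injection(channels: list, injected: dict) -> float:
    """Apply injected faults ({channel index: fault function}) and then vote."""
    corrupted = [injected[i](v) if i in injected else v for i, v in enumerate(channels)]
    return mid_value_select(*corrupted)


def campaign(nominal: list, tolerance: float) -> dict:
    """Inject every fault into every single channel and record whether the voter masked it."""
    reference = mid_value_select(*nominal)
    results = {}
    for name, fault in FAULTS.items():
        for ch in range(len(nominal)):
            out = run_with_injection(nominal, {ch: fault})
            results[(name, ch)] = abs(out - reference) <= tolerance
    return results


nominal = [10.0, 10.1, 9.9]
print(all(campaign(nominal, tolerance=0.2).values()))  # prints: True
print(run_with_injection(nominal, {0: FAULTS["stuck_high"], 1: FAULTS["stuck_high"]}))
```

Every single-channel fault is masked, but the second call shows that two coincident stuck-high channels pass straight through the voter (it returns 1000000.0). Double-fault cases like this are the kind a campaign is meant to expose.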

Citations: 36

LOCSTEP: a logic simulation based test generation procedure

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466992

I. Pomeranz, S. Reddy

We present a method to generate test sequences that detect large numbers of faults (close to or higher than the number of faults detectable by deterministic methods) at a cost significantly lower than that of any existing test generation procedure. The generated sequences can be used alone or as prefixes to deterministic test sequences. To generate the sequences, we study the test sequences generated by several deterministic test generation procedures. We show that when deterministic test sequences are applied, the fault-free circuits go through sequences of state transitions with distinct characteristics that are independent of the specific circuit considered. Test sequences with the same characteristics are generated by using logic simulation of the fault-free circuit only, considering several random patterns as candidates for inclusion in the test sequence at every time unit. By fault simulating these sequences, we find that the fault coverage achieved is very close to, and sometimes even higher than, the fault coverage achieved by deterministic sequences. In most cases the fault coverage is higher than that achieved by nondeterministic procedures based on genetic optimization.
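
The generation loop described above (fault-free logic simulation only, with several random candidate patterns per time unit) can be sketched as follows. The paper's selection criterion comes from state-transition characteristics observed in deterministic sequences, which the abstract does not spell out. This sketch therefore assumes a stand-in criterion ("prefer a candidate that reaches a not-yet-visited state") and uses a hypothetical toy 4-bit circuit. Fault simulation of the resulting sequence, the step that measures coverage, is not shown.

```python
import random


def next_state(state: int, pattern: int) -> int:
    """Hypothetical fault-free 4-bit sequential circuit driven by a 2-bit input pattern."""
    return ((state << 1) ^ pattern ^ (state >> 3)) & 0xF


def generate_sequence(length: int, candidates: int, seed: int = 0):
    """Greedy sequence construction using fault-free logic simulation only.

    At each time unit, several random input patterns are simulated as candidates;
    the first one that drives the circuit into a not-yet-visited state is kept
    (an assumed selection criterion), otherwise the first candidate is kept.
    """
    rng = random.Random(seed)
    state = 0  # assumed reset state
    visited = {state}
    sequence = []
    for _ in range(length):
        options = [rng.randrange(4) for _ in range(candidates)]
        chosen = next((p for p in options if next_state(state, p) not in visited), options[0])
        sequence.append(chosen)
        state = next_state(state, chosen)
        visited.add(state)
    return sequence, visited


seq, reached = generate_sequence(length=30, candidates=4, seed=1)
print(len(seq), len(reached))
```

Only the fault-free circuit is ever simulated during generation, which is where the low cost relative to deterministic test generation comes from. Faults enter the picture only when the finished sequence is graded.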

Citations: 56

Efficient failure recovery in multi-disk multimedia servers

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.467000

H. Vin, P. Shenoy, Sriram Rao

In this paper, we present a novel disk failure recovery method that exploits the inherent redundancy in video streams (rather than error-correcting codes) to ensure that the user-invoked, on-the-fly failure recovery process does not impose any additional load on the disk array. We also present a disk array architecture that enhances the scalability of multimedia servers by (1) integrating the recovery process with the decompression of video streams, thereby distributing the reconstruction process across the clients; and (2) supporting graceful degradation in the quality of recovered images as the number of disk failures increases.
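
A toy Python sketch of recovery through temporal redundancy rather than parity. It assumes a round-robin layout (frame i on disk i % num_disks) and rebuilds a lost frame on the client by interpolating its nearest surviving neighbors. The layout, the interpolation, and the function names are hypothetical stand-ins, and real recovery would work on compressed streams during decompression. The point it illustrates is that reconstruction uses only frames the client is already receiving, so the disk array does no extra reads.

```python
def stripe(frames: list, num_disks: int) -> dict:
    """Assumed round-robin layout: frame i is stored on disk i % num_disks."""
    disks = {d: {} for d in range(num_disks)}
    for i, frame in enumerate(frames):
        disks[i % num_disks][i] = frame
    return disks


def play(disks: dict, num_frames: int, failed: set) -> list:
    """Client-side playback; frames on failed disks are rebuilt from nearby frames."""
    available = {}
    for d, stored in disks.items():
        if d not in failed:
            available.update(stored)
    out = []
    for i in range(num_frames):
        if i in available:
            out.append(available[i])
            continue
        prev = next((available[j] for j in range(i - 1, -1, -1) if j in available), None)
        nxt = next((available[j] for j in range(i + 1, num_frames) if j in available), None)
        if prev is None and nxt is None:
            raise RuntimeError("no surviving frame to reconstruct from")
        if prev is None or nxt is None:
            out.append(list(prev if prev is not None else nxt))  # copy the single neighbor
        else:
            out.append([(a + b) // 2 for a, b in zip(prev, nxt)])  # temporal interpolation
    return out


frames = [[10 * i, 10 * i + 1] for i in range(8)]  # toy two-pixel frames
disks = stripe(frames, num_disks=4)
print(play(disks, 8, failed={1}) == frames)  # prints: True
print(play(disks, 8, failed={1, 2})[1])      # prints: [15, 16]
```

With one failed disk, the toy frames change linearly, so interpolation recovers them exactly. With two failed disks, the neighbors are farther apart and frame 1 comes back as [15, 16] instead of [10, 11], a small-scale picture of quality degrading gracefully as failures accumulate.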

Citations: 32

Componentwise decomposition for an efficient reliability computation of systems with repairable components

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466972

M. Balakrishnan, Kishor S. Trivedi

Fault trees and Markov chains are commonly used for dependability modeling. Markov chains are powerful in that they can easily model various kinds of dependencies that fault-tree models have difficulty capturing, but their state space grows exponentially with the number of components. Fault-tree models are adequate for computing the reliability of nonrepairable systems, but repairable systems require a state-space description because of induced dependencies (even when all failure and repair processes are otherwise independent). We demonstrate that a decomposition approach can be used to avoid a full-system Markov reliability model for repairable systems with independent failure and repair processes. For an n-component system, n 3-state sub-models can replace a full-system monolithic model. The result is an approximation because the parameters used in the sub-models are derived approximately from the monolithic model.
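
To make the scale of the saving concrete: n two-state components give up to 2**n joint states in a monolithic chain (before any system-failure states are lumped), versus 3n states across n sub-models. The sketch below also solves one hypothetical 3-state sub-model by uniformization. The state meanings (Up, Down, and an absorbing Failed state) and the rate `gamma` are assumptions. The abstract does not say what the three states are or how the approximate parameters are derived from the monolithic model.

```python
import math


def transient(Q: list, p0: list, t: float, terms: int = 200) -> list:
    """State probabilities p(t) of a small CTMC via uniformization.

    Q is the generator matrix (rows sum to 0); P = I + Q/q with q >= max exit rate,
    and p(t) = sum_k Poisson(k; q*t) * p0 P^k, truncated after `terms` terms.
    """
    n = len(Q)
    q = max(-Q[i][i] for i in range(n)) or 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    vec = list(p0)
    weight = math.exp(-q * t)  # Poisson probability of k = 0 jumps
    result = [weight * v for v in vec]
    for k in range(1, terms):
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


def submodel_reliability(lam: float, mu: float, gamma: float, t: float) -> float:
    """Hypothetical 3-state sub-model: 0 = Up, 1 = Down (under repair), 2 = Failed (absorbing).

    lam: failure rate, mu: repair rate, gamma: assumed rate at which the system fails
    while this component is down (in a decomposition this would be approximated
    from the rest of the system; here it is simply an input).
    """
    Q = [
        [-lam, lam, 0.0],
        [mu, -(mu + gamma), gamma],
        [0.0, 0.0, 0.0],
    ]
    return 1.0 - transient(Q, [1.0, 0.0, 0.0], t)[2]


n = 20
print(2 ** n, 3 * n)  # prints: 1048576 60
print(submodel_reliability(lam=1e-3, mu=0.1, gamma=0.01, t=100.0))
```

Uniformization is used here only because it needs nothing beyond the standard library and is numerically well behaved for small q*t. Any CTMC transient solver would do.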

Citations: 17

Fault simulation of IDDQ tests for bridging faults in sequential circuits

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466965

P. J. Thadikaran, S. Chakravarty, J. Patel

The notion of indistinguishable pairs is introduced. Two methods to compute such pairs, an explicit scheme and an implicit scheme, are presented. The resulting fault simulation algorithms, a list-based scheme and a tree-based scheme, are compared using a variety of fault lists and test sets. The tree-based scheme is found to perform better than the list-based scheme. Applications in which the list-based scheme performs better are also discussed.
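
The two bookkeeping styles can be sketched under the standard IDDQ view of a bridging fault: a bridge between two nodes is detected when a vector drives them to opposite logic values, raising the quiescent current. In this sketch a pair counts as still "indistinguishable" while no vector has done so. That reading, and both functions, are hypothetical analogues of the explicit and implicit schemes, not the paper's algorithms. They also assume fully specified 0/1 values and ignore the unknown states a sequential circuit can have.

```python
from itertools import combinations


def undetected_pairs_list(nodes: list, vectors: list) -> set:
    """Explicit (list-based) sketch: keep every candidate pair and drop it once a
    vector drives its two nodes to opposite values (the bridge would raise IDDQ)."""
    pairs = set(combinations(sorted(nodes), 2))
    for values in vectors:  # values: node -> 0/1 from fault-free simulation
        pairs = {(u, v) for (u, v) in pairs if values[u] == values[v]}
    return pairs


def undetected_pairs_partition(nodes: list, vectors: list) -> set:
    """Implicit sketch: keep classes of nodes that have agreed on every vector so far;
    each vector splits every class by value, and the undetected pairs are the pairs
    inside a class (they are never stored explicitly until requested)."""
    classes = [sorted(nodes)]
    for values in vectors:
        refined = []
        for cls in classes:
            zeros = [n for n in cls if values[n] == 0]
            ones = [n for n in cls if values[n] == 1]
            refined.extend(part for part in (zeros, ones) if part)
        classes = refined
    return {pair for cls in classes for pair in combinations(cls, 2)}


nodes = ["a", "b", "c", "d"]
vectors = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 1, "c": 1, "d": 1},
]
print(undetected_pairs_list(nodes, vectors))  # prints: {('c', 'd')}
print(undetected_pairs_partition(nodes, vectors) == undetected_pairs_list(nodes, vectors))  # prints: True
```

The explicit version costs memory proportional to the number of candidate pairs, which stays small when the fault list is restricted. The partition version stores only classes, which pays off when every node pair is a candidate.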

Citations: 7

Checking the integrity of trees

Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466959

J. Bright, G. Sullivan, G. Masson

We describe a general approach to checking the integrity of data structures corrupted by memory faults. Our approach is based on a recursive checksum technique. Basic checksum methods have previously been shown to be useful for detecting faults at the bit or word level; among our results is their extension to the node level. The major contributions of our paper are threefold. First, we show how the recursive checksum procedure can be applied to tree data structures that change dynamically, whereas previous work concentrated on trees whose structure was static. This yields an asymptotic improvement in running time for applications in which it is natural to model the underlying data as a tree. Second, we present a C++ implementation of this scheme. Significantly, our software can be used with existing tree-manipulating applications with only minor modifications to the application programs. Finally, we have performed fault injection experiments that confirm the fault detection capability of our integrity checking approach.
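
A minimal Python sketch of a recursive checksum on a dynamically changing tree. It assumes an additive checksum modulo 2**32, where each node stores its own value plus the checksums of its children. The paper's checksum function and its C++ interface are not given in the abstract, and the `Node` class and `verify` function are hypothetical. In this sketch an update touches only the path from the changed node to the root, so it costs work proportional to that path (times node degree) rather than a full rescan.

```python
MOD = 2 ** 32  # assumed checksum width


class Node:
    """Hypothetical tree node carrying a recursive checksum of its whole subtree."""

    def __init__(self, value: int):
        self.value = value
        self.parent = None
        self.children = []
        self.checksum = value % MOD

    def add_child(self, child: "Node") -> None:
        child.parent = self
        self.children.append(child)
        self._refresh_upward()

    def set_value(self, value: int) -> None:
        self.value = value
        self._refresh_upward()

    def _refresh_upward(self) -> None:
        """Recompute checksums along the path from this node to the root."""
        node = self
        while node is not None:
            node.checksum = (node.value + sum(c.checksum for c in node.children)) % MOD
            node = node.parent


def verify(node: Node) -> bool:
    """Check every node's stored checksum against its value and its children's checksums."""
    expected = (node.value + sum(c.checksum for c in node.children)) % MOD
    return node.checksum == expected and all(verify(c) for c in node.children)


root, a, b = Node(1), Node(2), Node(3)
root.add_child(a)
a.add_child(b)
b.set_value(10)                    # legitimate update: checksums refreshed up to the root
print(root.checksum, verify(root))  # prints: 13 True
a.value = 7                        # simulated memory fault that bypasses set_value
print(verify(root))                # prints: False
```

A fault that rewrites a node's value without going through `set_value` leaves its stored checksum stale, so `verify` flags it at that node.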

Citations: 21
