[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium: Latest Articles

Tolerating transient faults in MARS

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89384

H. Kopetz, H. Kantz, G. Grünsteidl, P. Puschner, J. Reisinger

The concepts of transient fault handling in the MARS architecture are discussed. After an overview of the MARS architecture, the mechanisms for the detection of transient faults are discussed in detail. In addition to extensive checks in the hardware and in the operating system, time-redundant execution of application tasks is proposed for the detection of transient faults. The time difference between the effective and the maximum execution time of an application task is used for this purpose. Whenever a transient fault has been detected, the affected component is turned off and reintegrated immediately by retrieving the uncorrupted state of the actively redundant partner component. In order to reduce the probability of spare exhaustion (in the case of permanent faults), 'shadow components' are introduced. The reliability improvement, which can be realized by these techniques, is calculated by detailed reliability models of the architecture, where the parameters are based on experimental results measured on the present MARS prototype implementation.
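The time-redundant detection idea in this abstract can be illustrated with a minimal sketch. It is not the MARS implementation: the function name `run_time_redundant`, the budget parameters, and the policy of returning `None` to model a component that turns itself off are all assumptions made for illustration. The sketch runs a task a second time only when the slack between its effective and maximum execution time is large enough, and treats disagreement between the two results as a detected transient fault.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_time_redundant(
    task: Callable[[], T],
    effective_time: float,
    max_time: float,
) -> Optional[T]:
    """Run `task`, repeating it once if the timing slack allows.

    `effective_time` and `max_time` are hypothetical budget figures
    (same unit) supplied by the caller; nothing is measured here.
    Returns the result, or None when the two runs disagree, which
    models a component shutting down after detecting a fault.
    """
    first = task()
    slack = max_time - effective_time
    if slack < effective_time:
        # Not enough spare time for a second run: no redundancy check.
        return first
    second = task()
    if first != second:
        # Results differ: treat as a detected transient fault.
        return None
    return first
```

A caller would then be responsible for reintegration, which the abstract describes as retrieving state from the actively redundant partner component; that step is outside this sketch.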

Citations: 87

A fault-tolerant strategy for hierarchical control in distributed computing systems

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89343

P. Goyer, Parham Momtahan, B. Selić

The authors describe a practical method for realizing fault-tolerant global control of resources in distributed computing systems. The method is particularly suitable for systems that are based on a centralized arbiter for making control decisions. Many applications in LAN-based computing, online transactions, and telecommunication systems fall into this category. The method exploits the inherent physical separation of distributed computing systems to achieve high reliability in the face of centralized arbiter failures. A significant feature of the method is that the fault-tolerance mechanisms are embedded in the normal control signal flow so that the overhead is practically negligible in the absence of faults. The principles behind the method, its internal structure, and its operations are explained. Also, the experience gained through its application is discussed.

Citations: 4

Burst asymmetric/unidirectional error correcting/detecting codes

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89375

Seungjin Park, B. Bose

Codes capable of correcting burst asymmetric and unidirectional errors are described. The proposed codes need approximately $b + \log_2 k$ check bits to correct a burst of $b$ asymmetric/unidirectional errors, where $k$ is the number of information bits. In most cases, the proposed codes require fewer check bits than the equivalent burst symmetric error-correcting codes. The optimality of the codes is also considered. In addition, efficient codes capable of detecting double burst unidirectional errors are given.
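To get a feel for the check-bit estimate in this abstract, the following sketch evaluates b + log2(k) for concrete parameters. The helper name `approx_check_bits` is hypothetical, and rounding the logarithm up to a whole bit is an assumption made here; the abstract only gives the approximate expression, not an exact count or a construction.

```python
def approx_check_bits(b: int, k: int) -> int:
    """Approximate check bits for a burst of length b over k information bits.

    Evaluates b + ceil(log2(k)). Rounding up is an assumption of this
    sketch; the source states only "approximately b + log2 k".
    """
    if b < 1 or k < 1:
        raise ValueError("b and k must be positive integers")
    # (k - 1).bit_length() equals ceil(log2(k)) for every integer k >= 1.
    return b + (k - 1).bit_length()
```

For example, with k = 1024 information bits and a burst length of b = 4, the estimate is 4 + 10 = 14 check bits.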

Citations: 12

Anomaly detection for diagnosis

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89362

R. Maxion

The author presents a method for detecting anomalous events in communication networks and other similarly characterized environments in which performance anomalies are indicative of failure. The methodology, based on automatically learning the difference between normal and abnormal behavior, has been implemented as part of an automated diagnosis system from which performance results are drawn and presented. The dynamic nature of the model enables a diagnostic system to deal with continuously changing environments without explicit control, reacting to the way the world is now, as opposed to the way the world was planned to be. Results of successful deployment in a noisy, real-time monitoring environment are shown.

Citations: 54

Design of microprocessors with built-in on-line test

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89381

R. Leveugle, T. Michel, G. Saucier

Control flow checking techniques are discussed. Invariant properties of the control flow can be checked at two different levels: verification of the sequencing in the controller of the microprocessor or verification of the control flow in the application program. Control flow checking has been implemented, at the two levels, in different versions of a 32-bit microprocessor designed in a 1.5-μm CMOS technology. Integration of the monitors on silicon is detailed. The silicon overhead due to the different online test devices is precisely discussed. Different versions of this microprocessor have been designed and implemented in order to make real cost comparisons on components with identical functionality but different integrated monitors. Here only the hardware cost of concurrent checking is considered.

Citations: 33

Optimized synthesis of self-testable finite state machines

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89393

B. Eschermann, H. Wunderlich

A synthesis procedure for self-testable finite state machines is presented. Testability comes under consideration when the behavioral description of the circuit is being transformed into a structural description. To this end, a novel state encoding algorithm, as well as a modified self-test architecture, is developed. Experimental results show that this approach leads to a significant reduction of hardware overhead. Self-testing circuits generally employ linear feedback shift registers for pattern generation. The impact of choosing a particular feedback polynomial on the state encoding is discussed.
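As background to the remark on linear feedback shift registers, the sketch below shows how the choice of feedback polynomial changes the pattern sequence an LFSR generates. It is a generic Fibonacci LFSR, not the state encoding algorithm or self-test architecture of the paper; the function name `lfsr_patterns` and the tap convention (1-indexed bit positions, with the highest tap equal to the register width) are assumptions of this illustration.

```python
from typing import Iterator, Tuple


def lfsr_patterns(seed: int, taps: Tuple[int, ...], width: int) -> Iterator[int]:
    """Yield the states of a Fibonacci LFSR until the seed state recurs.

    `taps` lists 1-indexed bit positions XORed to form the feedback bit.
    The top position (`width`) must be a tap so that the state update
    is invertible and the sequence is guaranteed to return to the seed.
    """
    if width not in taps:
        raise ValueError("taps must include the top bit position")
    mask = (1 << width) - 1
    state = seed & mask
    start = state
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        # Shift left by one and insert the feedback bit at position 0.
        state = ((state << 1) | feedback) & mask
        if state == start:
            return
```

With a 4-bit register seeded with 1, taps (4, 3) correspond to the primitive polynomial x^4 + x^3 + 1 and visit all 15 nonzero states, whereas taps (4, 2) correspond to x^4 + x^2 + 1, which is not primitive, and cycle after only 6 states.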

Citations: 74

Fault-tolerance in the Advanced Automation System

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1145/504136.504156

F. Cristian, Bob Dancey, Jonathan Dehn

The Advanced Automation System (AAS), a distributed real-time system intended to replace the present en-route and terminal approach US air traffic control computer systems over the next decade, is discussed. High availability of air traffic control services is an essential requirement of the system. The authors discuss the general approach to fault tolerance adopted in the AAS by reviewing some of the questions asked during the system design, various alternative solutions considered, and the reasons for the design choices made.

Citations: 86

Limits to the fault-tolerance of a feedforward neural network with learning

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89370

J. Nijhuis, B. Höfflinger, A. V. Schaik, L. Spaanenburg

Input data and hardware fault tolerance of neural networks are discussed. It is shown that fault-tolerant behavior is not self-evident but must be activated by an appropriate learning scheme. Practical limitations are demonstrated by an example of neural character recognition. The results show that the effects of learning and synapse weight decay on fault tolerance largely influence the practicality of large-scale silicon implementations. It is anticipated that, owing to implementation issues, such as the use of volatile memories, some neural VLSI architectures will not be sufficiently fault tolerant.

Citations: 41

Fault detection and diagnosis of k-UCP circuits under totally observable condition

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89392

X. Wen, K. Kinoshita

A method is presented for detecting stuck-open faults, as well as stuck-at faults, in CMOS combinational circuits by short test sequences of fixed length. The discussion is based on the assumption that outputs of all the gates in a circuit are observable. This assumption will become reasonable when a new testability solution called CrossCheck, or new test equipment called an electron-beam tester, is used. The concept of k-UCP (uniform, having a (k+1)-color solution and compatible polarity) circuits is introduced, and it is shown that 2(k+1) kinds of test sequences of length k(k+1)+1 are sufficient to detect stuck-open faults, as well as stuck-at faults, in a k-UCP circuit. Furthermore, it is shown that single stuck-open faults can be located by using a fault diagnosis table. A method which can speed up the generation of a fault diagnosis table is also proposed.
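The two counts quoted in this abstract are easy to tabulate for small k. The helper below only evaluates the stated expressions, 2(k+1) kinds of sequences of length k(k+1)+1; the name `kucp_test_budget` is hypothetical, and the sketch says nothing about how the sequences themselves are built.

```python
from typing import Tuple


def kucp_test_budget(k: int) -> Tuple[int, int]:
    """Return (kinds of test sequences, length of each) for a k-UCP circuit.

    Evaluates the expressions quoted in the abstract:
    kinds = 2(k+1) and length = k(k+1)+1.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    kinds = 2 * (k + 1)
    length = k * (k + 1) + 1
    return kinds, length
```

For k = 2, for instance, this gives 6 kinds of sequences, each of length 7.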

Citations: 4

Failure analysis and modeling of a VAXcluster system

[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium

Pub Date : 1990-06-26 DOI: 10.1109/FTCS.1990.89372

D. Tang, R. Iyer, Sujatha S. Subramani

The authors discuss the results of a measurement-based analysis of real error data collected from a DEC VAXcluster multicomputer system. In addition to evaluating basic system dependability characteristics, such as error and failure distributions and hazard rates for both individual machines and the VAXcluster, they develop reward models to analyze the impact of failures on the system as a whole. The results show that more than 46% of all failures were due to errors in shared resources. This is despite the fact that these errors have a recovery probability greater than 0.99. The hazard rate calculations show that not only errors but also failures occur in bursts. Approximately 40% of all failures occur in bursts and involve multiple machines. This result indicates that correlated failures are significant. Analysis of rewards shows that software errors have the lowest reward (0.05 versus 0.74 for disk errors). The expected reward rate (reliability measure) of the VAXcluster drops to 0.5 in 18 hours for the 7-out-of-7 model and in 80 days for the 3-out-of-7 model. The VAXcluster system availability is evaluated to be 0.993 over 250 days of operation.
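The "7-out-of-7" and "3-out-of-7" labels refer to k-out-of-n models, in which the system counts as working while at least k of its n machines are up. The sketch below computes that probability under the textbook assumption of independent machines with identical availability p. This is a deliberate simplification, not the paper's measurement-based reward model: the abstract itself reports that correlated failures are significant, so the independence assumption does not hold for the measured system. The function name `k_out_of_n` is hypothetical.

```python
from math import comb


def k_out_of_n(k: int, n: int, p: float) -> float:
    """Probability that at least k of n machines are up.

    Assumes independent machines, each up with probability p.
    This is an illustrative simplification only.
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Sum the binomial probabilities of exactly i machines being up.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))
```

Even in this idealized form, the comparison shows why the 7-out-of-7 model degrades so much faster than the 3-out-of-7 model: requiring every machine to be up multiplies the individual probabilities, while tolerating four failures keeps the system probability close to 1.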

Citations: 63
