[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers: Latest Articles

Recoverable distributed shared virtual memory: memory coherence and storage structures

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105629

Kun-Lung Wu, W. Fuchs

An examination is made of the problem of implementing rollback recovery in multicomputer distributed shared virtual memory environments, in which the shared memory is implemented in software and exists only virtually. A user-transparent checkpointing recovery scheme and a twin-page disk storage management are presented to implement a recoverable distributed shared virtual memory. The checkpointing scheme is integrated with the shared virtual memory management. The twin-page disk approach allows incremental checkpointing without an explicit 'undo' at the time of recovery. A single consistent checkpoint state is maintained on stable disk storage. The recoverable distributed shared virtual memory allows the system to restart computation from a previous checkpoint after a processor failure without a global restart.

Citations: 13
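As a rough illustration of the twin-page idea in the abstract above, the Python sketch below keeps two slots per page and tags each write with a checkpoint number. All names (`TwinPageStore`, `recover`, the tag scheme) are assumptions made for this example and are not taken from the paper; the only point is that an aborted checkpoint is ignored rather than undone.

```python
class TwinPageStore:
    """Toy in-memory model of twin-page storage (hypothetical, for illustration)."""

    def __init__(self):
        self.slots = {}         # page id -> two (tag, data) slots
        self.committed = set()  # tags of committed checkpoints (assumed stable)
        self.current = 1        # tag of the checkpoint in progress

    def _valid(self, page):
        """Index of the slot holding the newest committed copy, or None."""
        best = None
        for i, (tag, _) in enumerate(self.slots.get(page, ())):
            if tag in self.committed and (
                best is None or tag > self.slots[page][best][0]
            ):
                best = i
        return best

    def write(self, page, data):
        """Write into the slot that does NOT hold the committed copy."""
        slots = self.slots.setdefault(page, [(None, None), (None, None)])
        valid = self._valid(page)
        slots[0 if valid is None else 1 - valid] = (self.current, data)

    def read(self, page):
        """Return the newest committed copy of the page, or None."""
        valid = self._valid(page)
        return None if valid is None else self.slots[page][valid][1]

    def commit(self):
        """Make every write tagged with the current checkpoint visible."""
        self.committed.add(self.current)
        self.current += 1

    def recover(self):
        """Abandon the in-progress checkpoint; its tag is never committed,
        so its slots are simply ignored and no page has to be rewritten."""
        self.current += 1
```

In this sketch, a write made after the last commit never touches the committed copy, so `recover()` only has to skip to a fresh tag.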

Evaluation of fault-tolerant systems with nonhomogeneous workloads

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105560

B. Aupperle, J. F. Meyer, Lu Wei

A methodology is presented for evaluating fault-tolerant systems when workloads and fault arrivals are not time-homogeneous. Of particular interest are systems whose environments vary considerably between different utilization phases of random duration. In such cases, evaluations of overall system performability must account for the corresponding differences in workload effects, especially with regard to fault recovery. The proposed methodology uses analytic techniques based on Markov processes and stochastic activity networks. Examples of evaluation studies using this approach are presented. These include evaluation of a system wherein self-exercising is varied between phases of passive and active use.

Citations: 26

Detailed modeling of fault-tolerant processor arrays

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105633

N. Lopez-Benitez, J. Fortes

Detailed modeling of fault-tolerant processor arrays entails not only an explosive growth in the model state space but also a difficult model construction process. The latter problem is addressed, and a systematic method to construct Markov models for evaluating the reliability of processor arrays is proposed. This method is based on the premise that the fault behavior of a processor array can be modeled by a stochastic Petri net. However, in order to obtain a more compact representation, a set of attributes is associated with each transition in the Petri net model. This set of attributes allows the construction of the corresponding Markov model as the generation of the reachability graph takes place. Included in these attributes is a discrete probability distribution such that the effect of faulty spares in the reconfiguration algorithm is captured each time a configuration change occurs. This distribution includes the probabilities of survival given that a number of components required by the reconfiguration process are faulty. Depending on the type of component and the reconfiguration scheme, probabilities of survival are determined using simulation or closed-form expressions.

Citations: 4

A theoretical investigation of generalized voters for redundant systems

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105617

Paul R. Lorczak, A. Caglayan, D. Eckhardt

The authors generalize several commonly used voting techniques to arbitrary N-version systems with arbitrary output types using a metric space framework. In particular, they introduce the generalized median voter, which extends the thresholdless midvalue selection technique to arbitrary metric spaces and obviates most of the problems associated with inexact voting. They also introduce the formalized majority voter, which allows an inexact notion of equality between version outputs using a threshold. The authors then show that the median output determined by the generalized median voter will always be contained in the set of consensus outputs produced by the formalized majority voter. In addition, the authors introduce the formalized plurality voter, which generalizes two-out-of-N type voters, and the weighted averaging voter, which generalizes dynamic voting. The performance of these voters under different postulated failure scenarios is compared.

Citations: 157
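To make the generalized median voter from the abstract above more concrete, here is a minimal Python sketch of one common formulation: repeatedly discard the pair of outputs that are farthest apart until a single output is left. The function name `generalized_median` and the `distance` callback are assumptions for this example, and the handling of an even number of versions (returning the first of the last two outputs) is an arbitrary choice, not something stated in the abstract.

```python
from itertools import combinations


def generalized_median(outputs, distance):
    """Pick a median-like output from N version outputs in a metric space.

    outputs:  list of version outputs (any type)
    distance: function (a, b) -> non-negative number, assumed to be a metric
    """
    remaining = list(outputs)
    while len(remaining) > 2:
        # Find the indices (i < j) of the two outputs that are farthest apart.
        i, j = max(
            combinations(range(len(remaining)), 2),
            key=lambda p: distance(remaining[p[0]], remaining[p[1]]),
        )
        # Discard both; delete the higher index first so i stays valid.
        del remaining[j]
        del remaining[i]
    return remaining[0]


if __name__ == "__main__":
    readings = [1.0, 1.1, 0.9, 5.0, 1.05]
    print(generalized_median(readings, lambda a, b: abs(a - b)))
```

No threshold appears anywhere in the sketch, which is the property the abstract highlights; on real numbers with an odd number of versions the procedure reduces to ordinary mid-value selection.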

The fault tolerance approach of the Advanced Architecture Onboard Processor

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105535

M. Iacoponi, D. Vail

The Advanced Architecture Onboard Processor is a fault-tolerant multiprocessor for space applications that is based on a fault-tolerant chordal skip-link ring interconnect network. Low-power self-checking circuits in each processor node are combined with distributed reconfiguration control and local rollback recovery to achieve robust fault tolerance within spacecraft weight and power constraints. A ten-processor-node breadboard has been completed. The approach to fault tolerance and the tradeoff analysis leading to the selected implementation are covered. Analytical trade study results such as redundancy overhead as a function of system partitioning for the chordal skip-link ring are discussed.

Citations: 7

Dependable onboard computer systems with a new method-stepwise negotiating voting

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105536

N. Kanekawa, H. Maejima, H. Kato, H. Ihara

An algorithm for software voting, called stepwise negotiating voting, which can tolerate faults in up to N-1 subsystems is introduced. The voter behaves as if it were a majority voter if the number of remaining subsystems is sufficient for majority voting, and standby redundancy is realized if the number of remaining subsystems becomes insufficient. With this voting method, the system can survive if more than one subsystem remains. The authors introduce a method for evaluating the dependability of systems. It is based on the viewpoint that not only the hardware reliability but also the reliability of data processing is important. It is assumed that only transient faults take place in the software behavior. The authors' concept can be applied to computers in critical application fields, such as space development or engine control.

Citations: 33

Replication within atomic actions and conversations: a case study in fault-tolerance duality

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105619

L. Mancini, S. Shrivastava

Recently a duality mapping for fault-tolerant system structures was proposed by the authors (1985). Two canonical models of distributed fault-tolerant systems have been constructed and shown to be duals of each other. One model incorporates objects and atomic actions as the entities for program construction, whereas the second model uses communicating processes with conversations. As a consequence of the duality, techniques and mechanisms which have been developed within the domain of just one of the models can be mapped and applied to the other model. This point is illustrated by mapping some well-known object replication techniques developed within the context of an object and actions model to the communicating process model, thereby revealing some interesting process replication techniques.

Citations: 16

An economical scan design for sequential logic test generation

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105539

K. Cheng, V. Agrawal

A method of partial scan design in which the selection of scan flip-flops is aimed at breaking up the cyclic structure of the circuit is presented. Experimental data are given to show that the test generation complexity may grow exponentially with the length of the cycles in the circuit. This complexity grows only linearly with sequential depth. Graph-theoretic algorithms are presented to select a minimal set of flip-flops for eliminating cycles to reduce sequential depth. Tests for the resulting circuit can be efficiently generated by a sequential logic test generator. An independent control of the scan clock allows the insertion of scan sequences within the vector sequence produced by the test generator. Experimental results on a 5000-gate circuit show that a test coverage above 98% could be obtained by scanning just 5% of the flip-flops. In addition, the authors give the design of a scan flip-flop to reduce the input pin and signal routing overheads in a single-clock design.

Citations: 95
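The cycle-breaking selection described in the abstract above can be pictured as a feedback-vertex-set problem on the flip-flop dependency graph. The Python sketch below is a simple greedy heuristic written for illustration only; it is not the paper's algorithm, it does not guarantee a minimum set, and it makes no special case for self-loops. The function name `select_scan_flip_flops` and the edge-list input format are assumptions.

```python
def select_scan_flip_flops(edges):
    """Greedily choose flip-flops whose removal leaves the graph acyclic.

    edges: iterable of (u, v) pairs meaning flip-flop u feeds flip-flop v
           through combinational logic.
    Returns the list of flip-flops chosen for scan, in selection order.
    """
    nodes = {n for e in edges for n in e}
    succ = {n: set() for n in nodes}
    pred = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    def remove(n):
        # Detach n from its neighbours and drop it from both maps.
        for s in succ.pop(n):
            if s != n:
                pred[s].discard(n)
        for p in pred.pop(n):
            if p != n:
                succ[p].discard(n)

    scan = []
    while succ:
        # Prune nodes with no fan-in or no fan-out: they cannot be on a cycle.
        trimmed = True
        while trimmed:
            trimmed = False
            for n in list(succ):
                if not succ[n] or not pred[n]:
                    remove(n)
                    trimmed = True
        if not succ:
            break
        # Heuristic: scan the node with the largest fan-in * fan-out product.
        best = max(sorted(succ), key=lambda n: len(succ[n]) * len(pred[n]))
        scan.append(best)
        remove(best)
    return scan


if __name__ == "__main__":
    graph = [("a", "b"), ("b", "c"), ("c", "a"),
             ("c", "d"), ("d", "c"), ("d", "e")]
    print(select_scan_flip_flops(graph))
```

In the example graph, flip-flop `c` sits on both cycles, so scanning it alone leaves an acyclic circuit.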

Distributed syndrome decoding for regular interconnected structures

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105545

Arun Kumar Somani, V. Agarwal

Distributed syndrome decoding algorithms to locate faulty PEs (processing elements) in large-scale regular interconnected structures based on the concepts of system-level diagnosis are developed. These algorithms operate in a systolic manner to locate the faulty processors. The computational complexities of these algorithms are either linear or sublinear, depending on the architecture of the system. Their implementation complexities and diagnosis capabilities differ substantially. The conditions that a fault pattern should satisfy for correct and complete diagnosis and the maximum global size of fault sets which can be diagnosed successfully using these algorithms are also identified.

Citations: 23

Clock synchronization in MAFT

[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers

Pub Date : 1989-06-21 DOI: 10.1109/FTCS.1989.105557

Philip M. Thambidurai, A. Finn, R. Kieckhafer, C. Walter

The steady-state clock synchronization algorithm of MAFT (multicomputer architecture for fault tolerance), an extremely reliable system for real-time applications, is discussed. The synchronization algorithm has been implemented in hardware and a system prototype constructed. The algorithm uses an interactive convergence approach, based on synchronized rounds of message transmission. The authors derive the maximum skew between nonfaulty clocks in terms of basic system parameters. The problem of detecting clock faults is also addressed, with attention to the minimum amount of synchronization error guaranteed to be unambiguously detected. The authors discuss the various practicalities which arise in the implementation of the algorithm as an integrated part of the whole system. Relationships between the synchronization subsystem and the total system are discussed.

Citations: 27
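For readers unfamiliar with the interactive convergence approach mentioned in the abstract above, the Python sketch below shows the textbook averaging step from the general clock synchronization literature: each node measures the skew to every clock, treats implausibly large skews as zero, and applies the mean as its correction. This is an illustration of the general technique only, not MAFT's hardware algorithm; the function name `convergence_correction` and the `max_skew` parameter are assumptions.

```python
def convergence_correction(skews, max_skew):
    """Compute one node's clock correction for a synchronization round.

    skews:    measured differences (other clock minus own clock), one entry
              per node in the system, including 0.0 for the node itself
    max_skew: largest difference a nonfaulty clock is assumed to show;
              readings beyond it are replaced by 0.0 so that a faulty
              clock cannot drag the average arbitrarily far
    """
    clipped = [d if abs(d) <= max_skew else 0.0 for d in skews]
    return sum(clipped) / len(clipped)


if __name__ == "__main__":
    # Four nodes; the last reading comes from a wildly wrong clock.
    print(convergence_correction([0.0, 2.0, -1.0, 50.0], max_skew=5.0))
```

Because every nonfaulty node clips and averages in the same way, the corrections pull the nonfaulty clocks toward one another each round, which is why a bound on the maximum skew can be derived from basic system parameters.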
