Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers: Latest Articles

Checkpointing and its applications

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466999

Yi-Min Wang, Yennun Huang, Kiem-Phong Vo, Pi-Yu Chung, C. Kintala

The paper describes our experience with the implementation and applications of the Unix checkpointing library libckp, and identifies two concepts that have proven to be the key to making checkpointing a powerful tool. First, including all persistent states, i.e., user files, as part of the process state that can be checkpointed and recovered provides a truly transparent and consistent rollback. Second, excluding part of the persistent state from the process state allows user programs to process future inputs from a desirable state, which leads to interesting new applications of checkpointing. We use real-life examples to demonstrate the use of libckp for bypassing premature software exits, for fast initialization and for memory rejuvenation.

Citations: 209
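As a rough illustration of the first concept in the libckp abstract above (treating user files as part of the checkpointed state), the Python sketch below saves an in-memory dictionary together with the length of an append-only output file, and restores both on rollback. It is not libckp's interface, which the abstract does not describe. The class `FileAwareProcess`, its methods, and the file name `out.log` are hypothetical, and truncating to a saved length only covers append-only files.

```python
import copy
import os
import tempfile


class FileAwareProcess:
    """Hypothetical process whose checkpoint covers memory and one output file."""

    def __init__(self, path):
        self.path = path
        self.memory = {"processed": 0}
        self._saved = None
        open(path, "w").close()  # start with an empty output file

    def append_output(self, line):
        # Volatile state (counter) and persistent state (file) change together.
        with open(self.path, "a") as f:
            f.write(line + "\n")
        self.memory["processed"] += 1

    def checkpoint(self):
        # Record the memory image and the current file length in bytes.
        self._saved = (copy.deepcopy(self.memory), os.path.getsize(self.path))

    def rollback(self):
        if self._saved is None:
            raise RuntimeError("no checkpoint taken")
        memory, size = self._saved
        self.memory = copy.deepcopy(memory)
        # Undo file appends made after the checkpoint so both states agree.
        with open(self.path, "r+") as f:
            f.truncate(size)


def demo():
    with tempfile.TemporaryDirectory() as d:
        p = FileAwareProcess(os.path.join(d, "out.log"))
        p.append_output("record 1")
        p.checkpoint()
        p.append_output("record 2")
        p.rollback()
        with open(p.path) as f:
            return p.memory["processed"], f.read()
```

Rolling back only `memory` would leave "record 2" in the file while the counter says one record was processed; restoring the file length as well is what keeps the two views consistent.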

Algorithm-based diskless checkpointing for fault tolerant matrix operations

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466964

J. Plank, Youngbae Kim, J. Dongarra

The paper is an exploration of diskless checkpointing for distributed scientific computations. With the widespread use of the "network of workstations" (NOW) platform for distributed computing, long-running scientific computations need to tolerate the changing and often faulty nature of NOW environments. We present high-performance implementations of several algorithms for distributed scientific computing, including Cholesky factorization, LU factorization, QR factorization, and preconditioned conjugate gradient. These implementations are able to run on PVM networks of at least N processors, and can complete with low overhead as long as any N processors remain functional. We discuss the details of how the algorithms are tuned for fault-tolerance, and present the performance results on a PVM network of SUN workstations, and on the IBM SP2.

Citations: 79
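One common way to checkpoint without a disk is to keep an encoding of all processors' checkpoints in the memory of a spare processor. The Python sketch below assumes a bytewise XOR parity over equal-length checkpoints and shows how one lost checkpoint is rebuilt from the survivors. The abstract does not state which encoding the authors use, so the functions `parity_checkpoint` and `recover` are illustrative names, not the paper's implementation.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_checkpoint(states: list[bytes]) -> bytes:
    """Parity held in the spare processor's memory instead of on disk."""
    return reduce(xor_bytes, states)


def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing checkpoint from the survivors and the parity."""
    return reduce(xor_bytes, survivors, parity)


# Three compute processors, each holding a 4-byte in-memory checkpoint.
states = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = parity_checkpoint(states)

lost = 1  # index of the processor that failed
survivors = [s for i, s in enumerate(states) if i != lost]
recovered = recover(survivors, parity)
```

With N compute processors plus one parity holder, any N of the N+1 memories are enough to restore the full state, which matches the "any N processors remain functional" condition in the abstract for the single-failure case.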

Synthesis for testability by sequential redundancy removal using retiming

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466981

H. Yotsuyanagi, S. Kajihara, K. Kinoshita

The existence of sequential redundancy degrades testability of sequential circuits. By using retiming which rearranges flip-flops, some sequential redundancy is converted into combinational redundancy, which can be easily identified and removed by a combinational test generation technique. Retiming is utilized for two purposes: one is for finding sequential redundancy and another is for reducing the number of flip-flops. Applying retiming and redundancy removal techniques concurrently, testability of sequential circuits is enhanced. Experimental results for ISCAS'89 benchmark circuits show the effectiveness of this method for optimizing circuits.

Citations: 6

A switch-level algorithm for simulation of transients in combinational logic

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466977

P. Dahlgren, P. Lidén

A two-step switch-level algorithm for fault simulation of transients in CMOS networks is presented. The first step models the fault propagation locally from the fault injection site to the subsequent CMOS blocks. It is shown that the pulse width of a transient is a vital parameter in the propagation process. A first-order RC network model for the prediction of the width of transients is used. The second step consists of a set of rules for the propagation of fully developed transients through basic CMOS blocks. The fact that transients may fade out during propagation is efficiently modeled by taking into account their pulse widths. The proposed algorithm shows good agreement with electrical-level simulations in predicting the effects of device-level transients.

Citations: 42
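To see why pulse width decides whether a transient survives, consider a textbook first-order RC stage. The Python sketch below assumes a rectangular input pulse, a node that starts at 0 V, a single time constant `tau`, and a switching threshold of half the supply. These are illustrative assumptions, not the model parameters from the paper, and `output_pulse_width` is a hypothetical helper.

```python
import math


def output_pulse_width(width_in: float, tau: float,
                       vdd: float = 1.0, vth: float = 0.5) -> float:
    """Width of the pulse seen above vth at the output of one RC stage.

    The node charges as vdd*(1 - exp(-t/tau)) while the input pulse is
    present and discharges exponentially afterwards.
    """
    v_peak = vdd * (1.0 - math.exp(-width_in / tau))
    if v_peak <= vth:
        return 0.0  # the transient never crosses the threshold: it fades out
    t_rise = -tau * math.log(1.0 - vth / vdd)          # upward threshold crossing
    t_fall = width_in + tau * math.log(v_peak / vth)   # downward threshold crossing
    return t_fall - t_rise


narrow = output_pulse_width(0.5, 1.0)   # shorter than the stage can follow
medium = output_pulse_width(1.0, 1.0)   # attenuated but still visible
wide = output_pulse_width(10.0, 1.0)    # passes almost unchanged
```

Under these assumptions a pulse much shorter than `tau` disappears, one comparable to `tau` is narrowed, and one much longer than `tau` keeps nearly its full width, which is the fade-out behaviour the abstract refers to.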

A new diagnosis approach for short faults in interconnects

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466966

C. Feng, Wei-Kang Huang, F. Lombardi

Existing one-step diagnosis approaches for faults in interconnects either yield a long test sequence, or use a non-generalized procedure to generate a shorter test sequence. We propose a new diagnosis approach for short faults in interconnects. The pin-adjacency fault model is assumed. By using a divide-and-conquer strategy, our approach can generate a very compact test vector sequence which can diagnose an unrestricted number of short faults. Our experiments for three benchmarks as well as large random interconnects (up to 50,000 nets) show that our approach can achieve more than 50% savings in the length of the generated test sequence. This can significantly save the diagnosis cost for boundary-scan testing. An adaptive diagnosis approach is further proposed to dynamically truncate the originally generated test sequence based on the current information of faulty nets. The performance of our adaptive approach in terms of the on-line test generation time and the resulting test sequence length is better than for existing adaptive diagnosis approaches when the fault rate is not very small, such as in a new product line. If a low complexity for the ATE is of major importance, then the proposed one-step approach is the best choice.

Citations: 23
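For context on what a test sequence for interconnect shorts looks like, the Python sketch below simulates the classical counting-sequence test, in which net i is driven with the binary code of i. This is a well-known baseline, not the authors' divide-and-conquer approach, which the abstract does not detail. The wired-OR behaviour of shorted nets is an assumption of this sketch, and all function names are hypothetical.

```python
def num_vectors(num_nets: int) -> int:
    """Parallel test vectors needed to give every net a distinct code."""
    return max(1, (num_nets - 1).bit_length())


def counting_codes(num_nets: int) -> list[int]:
    """Net i is driven with the bits of i, one bit per parallel test vector."""
    return list(range(num_nets))


def responses(codes: list[int], shorts: list[set[int]]) -> list[int]:
    """Observed codes when each group of shorted nets behaves as a wired-OR."""
    observed = list(codes)
    for group in shorts:
        merged = 0
        for net in group:
            merged |= codes[net]
        for net in group:
            observed[net] = merged
    return observed


def suspect_nets(codes: list[int], observed: list[int]) -> set[int]:
    """Nets whose observed response differs from the code that was driven."""
    return {n for n, (c, o) in enumerate(zip(codes, observed)) if c != o}


codes = counting_codes(8)
observed = responses(codes, [{2, 5}])   # nets 2 (010) and 5 (101) are shorted
suspects = suspect_nets(codes, observed)
```

Here both shorted nets read 111, which is also the fault-free response of net 7, so the test detects the short but cannot tell whether net 7 is involved. That ambiguity is one reason diagnosis needs longer or better-structured sequences than detection, which is the trade-off the abstract addresses.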

The Totem system

Pub Date : 1995-06-27 DOI: 10.1109/FTCS.1995.466998

L. Moser, P. Melliar-Smith, D. Agarwal, R. K. Budhia, C. Lingley-Papadopoulos, T. P. Archambault

The Totem system supports fault-tolerant applications in which distributed processes cooperate to perform a common task and in which replicated data must be updated consistently in the presence of asynchrony and faults. Reliable totally ordered delivery of messages to processes within process groups is provided on a single local-area network or over multiple local-area networks interconnected by gateways. Message ordering is consistent across the entire network, despite processor and communication faults, without requiring all processes to deliver all messages. The Totem system handles processor failure and recovery, as well as network partitioning and remerging, and provides membership and topology maintenance services.

Citations: 65
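Totally ordered delivery can be pictured with global sequence numbers: every message receives a unique number when it is sent, and each receiver delivers strictly in number order, buffering anything that arrives early. The Python sketch below is a deliberately simplified illustration of that idea, assuming a single shared `Token` that hands out numbers. It omits failure handling, membership, and retransmission, and is not the Totem protocol itself.

```python
class Token:
    """Hypothetical circulating token that carries the last sequence number used."""

    def __init__(self):
        self.seq = 0


def stamp(token: Token, payload: str) -> tuple[int, str]:
    """A sender holding the token assigns the next global sequence number."""
    token.seq += 1
    return (token.seq, payload)


class Receiver:
    def __init__(self):
        self.next_seq = 1
        self.pending = {}    # early messages, keyed by sequence number
        self.delivered = []  # payloads in delivery order

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate of something already delivered or buffered
        self.pending[seq] = payload
        # Deliver every message that is now contiguous with what came before.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


token = Token()
a, b, c = stamp(token, "a"), stamp(token, "b"), stamp(token, "c")

r1, r2 = Receiver(), Receiver()
for msg in (c, a, b):        # r1 sees the messages out of order
    r1.receive(*msg)
for msg in (b, c, a, a):     # r2 sees a different order plus a duplicate
    r2.receive(*msg)
```

Both receivers end up with the same delivery order even though the network presented the messages differently, which is the property replicated data needs in order to be updated consistently.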

Implementing fault tolerant applications using reflective object-oriented programming

Pub Date : 1995-06-01 DOI: 10.1109/FTCS.1995.466949

J. Fabre, V. Nicomette, T. Pérennou, R. Stroud, Zhixue Wu

This paper shows how reflection and object-oriented programming can be used to ease the implementation of classical fault tolerance mechanisms in distributed applications. When the underlying runtime system does not provide fault tolerance transparently, classical approaches to implementing fault tolerance mechanisms often imply mixing functional programming with non-functional programming (e.g. error processing mechanisms). The use of reflection improves the transparency of fault tolerance mechanisms to the programmer and more generally provides a clearer separation between functional and non-functional programming. The implementations of some classical replication techniques using a reflective approach are presented in detail and illustrated by several examples, which have been prototyped on a network of Unix workstations. Lessons learnt from our experiments are drawn and future work is discussed.

Citations: 98
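The separation described in the abstract above can be imitated in Python by intercepting method calls. In the sketch below, the application class `Account` contains only functional code, while a hypothetical `ReplicatedProxy` forwards every call to several in-process replicas. The abstract does not show the authors' implementation, so this uses Python's `__getattr__` hook purely as an analogy, with local objects standing in for remote replicas.

```python
class Account:
    """Functional code only: no knowledge of replication."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance


class ReplicatedProxy:
    """Intercepts method calls and applies each one to every replica."""

    def __init__(self, factory, copies: int = 2):
        self._replicas = [factory() for _ in range(copies)]

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. application methods.
        if name.startswith("_"):
            raise AttributeError(name)

        def invoke(*args, **kwargs):
            results = [getattr(r, name)(*args, **kwargs) for r in self._replicas]
            return results[0]  # all replicas are deterministic, so any result will do

        return invoke


acct = ReplicatedProxy(Account, copies=3)
acct.deposit(10)
total = acct.deposit(5)
balances = [r.balance for r in acct._replicas]
```

The caller uses `acct` as if it were a plain `Account`; swapping the replication strategy would mean changing `ReplicatedProxy` only, which is the kind of transparency the abstract attributes to the reflective approach.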
