Benchmarking the dependability of Windows NT4, 2000 and XP
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311938
A. Kalakech, K. Kanoun, Y. Crouzet, J. Arlat
The aim of this paper is to compare the dependability of three operating systems (Windows NT4, Windows 2000 and Windows XP) with respect to erroneous behavior of the application layer. The results show a similar behavior of the three OSs with respect to robustness and a noticeable difference in OS reaction and restart times. They also show that the application state (mainly the hang and abort states) significantly impacts the restart time for the three OSs.
{"title":"Benchmarking the dependability of Windows NT4, 2000 and XP","authors":"A. Kalakech, K. Kanoun, Y. Crouzet, J. Arlat","doi":"10.1109/DSN.2004.1311938","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311938","url":null,"abstract":"The aim of this paper is to compare the dependability of three operating systems (Windows NT4, Windows 2000 and Windows XP) with respect to erroneous behavior of the application layer. The results show a similar behavior of the three OSs with respect to robustness and a noticeable difference in OS reaction and restart times. They also show that the application state (mainly the hang and abort states) significantly impacts the restart time for the three OSs.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"215 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114745424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
QoS of timeout-based self-tuned failure detectors: the effects of the communication delay predictor and the safety margin
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311946
Raul Ceretta Nunes, Ingrid Jansch-Pôrto
Unreliable failure detectors have been an important abstraction for building dependable distributed applications on asynchronous distributed systems subject to faults. Their implementations are commonly based on timeouts to ensure algorithm termination. However, for systems built on the Internet, this timeout value is hard to estimate due to traffic variations. Thus, different types of predictors have been used to model this behavior and predict delays. To increase the quality of service (QoS), self-tuned failure detectors dynamically adapt their timeouts to the observed communication delay behavior plus a safety margin. In this paper, we evaluate the QoS of a failure detector for different combinations of communication delay predictors and safety margins. As the results show, to improve the QoS one must consider the predictor and the safety margin as a pair, rather than each one separately. Furthermore, both performance and accuracy requirements should be taken into account when choosing a suitable combination.
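As a rough illustration of the timeout adaptation evaluated here, the sketch below pairs one possible delay predictor (a Jacobson-style exponentially weighted moving average) with a variation-based safety margin. The class name, parameters, and this particular predictor/margin choice are illustrative assumptions, not the specific combinations studied in the paper.

```python
class SelfTunedFailureDetector:
    """Toy timeout-based failure detector: timeout = predicted delay + safety margin.

    The predictor is a Jacobson-style EWMA of heartbeat delays; the safety
    margin is a multiple of the observed delay variation. Both are just one
    possible predictor/margin pair among those such a study could compare.
    """

    def __init__(self, alpha=0.125, beta=0.25, k=4.0, initial_delay=0.1):
        self.alpha = alpha                  # smoothing factor for the mean delay
        self.beta = beta                    # smoothing factor for the delay variation
        self.k = k                          # safety-margin multiplier
        self.srtt = initial_delay           # smoothed delay estimate
        self.rttvar = initial_delay / 2.0   # smoothed delay variation

    def observe(self, delay):
        """Feed a measured heartbeat delay into the predictor."""
        self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - delay)
        self.srtt = (1 - self.alpha) * self.srtt + self.alpha * delay

    def timeout(self):
        """Current timeout: predicted delay plus safety margin."""
        return self.srtt + self.k * self.rttvar

    def suspect(self, elapsed_since_last_heartbeat):
        """Suspect the monitored process if its heartbeat is overdue."""
        return elapsed_since_last_heartbeat > self.timeout()


# Example: measured delays drift upward, so the timeout adapts instead of staying fixed.
fd = SelfTunedFailureDetector()
for d in [0.10, 0.12, 0.30, 0.28, 0.35]:
    fd.observe(d)
print(round(fd.timeout(), 3), fd.suspect(1.0))
```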
{"title":"QoS of timeout-based self-tuned failure detectors: the effects of the communication delay predictor and the safety margin","authors":"Raul Ceretta Nunes, Ingrid Jansch-Pôrto","doi":"10.1109/DSN.2004.1311946","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311946","url":null,"abstract":"Unreliable failure detectors have been an important abstraction to build dependable distributed applications over asynchronous distributed systems subject to faults. Their implementations are commonly based on timeouts to ensure algorithm termination. However, for systems built on the Internet, it is hard to estimate this time value due to traffic variations. Thus, different types of predictors have been used to model this behavior and make predictions of delays. In order to increase the quality of service (QoS), self-tuned failure detectors dynamically adapt their timeouts to the communication delay behavior added of a safety margin. In this paper, we evaluate the QoS of a failure detector for different combinations of communication delay predictors and safety margins. As the results show, to improve the QoS, one must consider the relation between the pair predictor/margin, instead of each one separately. Furthermore, performance and accuracy requirements should be considered for a suitable relationship.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133631634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A bi-criteria scheduling heuristic for distributed embedded systems under reliability and real-time constraints
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311904
I. Assayad, A. Girault, Hamoudi Kalla
Multi-criteria scheduling problems, which involve optimizing more than one criterion, are attracting growing interest. In this paper, we present a new bi-criteria heuristic for scheduling data-flow graphs of operations onto parallel heterogeneous architectures according to two criteria: first, minimizing the schedule length, and second, maximizing the system reliability. Reliability is defined as the probability that none of the system components fails while processing. The proposed algorithm is a list scheduling heuristic based on a bi-criteria compromise function that assigns priorities to the operations to be scheduled and chooses the subset of processors on which they should be scheduled. It uses active replication of operations to improve reliability. If the system reliability or schedule length requirements are not met, a parameter of the compromise function can be changed and the algorithm re-executed. This process is iterated until both requirements are met.
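The snippet below is only a hedged sketch of how a scalarized compromise score could trade schedule length against reliability when choosing a processor subset for an operation. The function form, the theta parameter, and the candidate numbers are assumptions for illustration, not the authors' actual compromise function or heuristic.

```python
import math

def compromise(length_increase, failure_prob, theta):
    """Scalarize the two criteria into one score (lower is better).

    theta in [0, 1] weights schedule length against unreliability; re-running
    the scheduler with a different theta is how the two requirements get
    re-balanced. -log(1 - p) is one simple unreliability term; illustrative only.
    """
    return theta * length_increase + (1 - theta) * (-math.log(1 - failure_prob))

def pick_processor_subset(candidates, theta):
    """candidates: (processor_subset, length_increase, failure_prob) tuples,
    e.g. subsets an operation could be actively replicated on."""
    return min(candidates, key=lambda c: compromise(c[1], c[2], theta))

# Replicating on {P1, P2} costs a bit more time but fails far less often than P1 alone.
candidates = [
    (("P1",), 1.0, 0.05),
    (("P1", "P2"), 1.1, 0.0025),
]
print(pick_processor_subset(candidates, theta=0.2)[0])   # favors reliability: ('P1', 'P2')
print(pick_processor_subset(candidates, theta=0.95)[0])  # favors schedule length: ('P1',)
```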
{"title":"A bi-criteria scheduling heuristic for distributed embedded systems under reliability and real-time constraints","authors":"I. Assayad, A. Girault, Hamoudi Kalla","doi":"10.1109/DSN.2004.1311904","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311904","url":null,"abstract":"Multi-criteria scheduling problems, involving optimization of more than one criterion, are subject to a growing interest. In this paper, we present a new bi-criteria scheduling heuristic for scheduling data-flow graphs of operations onto parallel heterogeneous architectures according to two criteria: first the minimization of the schedule length, and second the maximization of the system reliability. Reliability is defined as the probability that none of the system components will fail while processing. The proposed algorithm is a list scheduling heuristics, based on a bi-criteria compromise function that introduces priority between the operations to be scheduled, and that chooses on what subset of processors they should be scheduled. It uses the active replication of operations to improve the reliability. If the system reliability or the schedule length requirements are not met, then a parameter of the compromise function can be changed and the algorithm re-executed. This process is iterated until both requirements are met.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134138078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model checking a fault-tolerant startup algorithm: from design exploration to exhaustive fault simulation
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311889
W. Steiner, J. Rushby, M. Sorea, H. Pfeifer
The increasing performance of modern model-checking tools offers high potential for the computer-aided design of fault-tolerant algorithms. Instead of relying on human imagination to generate taxing failure scenarios to probe a fault-tolerant algorithm during development, we define the fault behavior of a faulty process at its interfaces to the remaining system and use model checking to automatically examine all possible failure scenarios. We call this approach "exhaustive fault simulation". In this paper we illustrate exhaustive fault simulation using a new startup algorithm for the time-triggered architecture (TTA) and show that this approach is fast enough to be deployed in the design loop. We use the SAL toolset from SRI for our experiments and describe an approach to modeling and analyzing fault-tolerant algorithms that exploits the capabilities of tools such as this.
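The paper carries out this exploration symbolically with SRI's SAL toolset on the TTA startup algorithm; the toy sketch below only illustrates the underlying idea of exhaustive fault simulation by brute force, namely constraining the faulty component solely at its interface and then checking every sequence of values it could emit against an invariant. The function names and the toy voting system are illustrative assumptions, not the authors' SAL models.

```python
from itertools import product

def exhaustive_fault_simulation(step, init_state, faulty_outputs, rounds, invariant):
    """Enumerate every sequence of values a faulty component could emit at its
    interface over `rounds` steps, run the otherwise deterministic system against
    each sequence, and return the scenarios that violate the invariant. A symbolic
    model checker does this exploration far more efficiently; the explicit loop
    here only illustrates what 'all possible failure scenarios' means."""
    violations = []
    for scenario in product(faulty_outputs, repeat=rounds):
        state = init_state
        for value in scenario:
            state = step(state, value)
        if not invariant(state):
            violations.append(scenario)
    return violations


# Toy system: three nodes vote each round; the two correct nodes always vote 1,
# the faulty node votes arbitrarily. Invariant: the majority of all votes is 1.
def step(state, faulty_vote):
    ones, total = state
    return (ones + 2 + faulty_vote, total + 3)   # two correct votes of 1 plus the faulty vote

bad = exhaustive_fault_simulation(step, (0, 0), faulty_outputs=[0, 1],
                                  rounds=3, invariant=lambda s: s[0] * 2 > s[1])
print(bad)   # [] -- no fault scenario breaks the majority with a single faulty voter
```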
{"title":"Model checking a fault-tolerant startup algorithm: from design exploration to exhaustive fault simulation","authors":"W. Steiner, J. Rushby, M. Sorea, H. Pfeifer","doi":"10.1109/DSN.2004.1311889","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311889","url":null,"abstract":"The increasing performance of modern model-checking tools offers high potential for the computer-aided design of fault-tolerant algorithms. Instead of relying on human imagination to generate taxing failure scenarios to probe a fault-tolerant algorithm during development, we define the fault behavior of a faulty process at its interfaces to the remaining system and use model checking to automatically examine all possible failure scenarios. We call this approach \"exhaustive fault simulation\". In this paper we illustrate exhaustive fault simulation using a new startup algorithm for the time-triggered architecture (TTA) and show that this approach is fast enough to be deployed in the design loop. We use the SAL toolset from SRI for our experiments and describe an approach to modeling and analyzing fault-tolerant algorithms that exploits the capabilities of tools such as this.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"398 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131758398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal object state transfer - recovery policies for fault tolerant distributed systems
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311947
P. Katsaros, C. Lazos
Recent developments in the field of object-based fault tolerance and the advent of the first OMG FT-CORBA compliant middleware raise new requirements for the design process of distributed fault-tolerant systems. In this work, we introduce a simulation-based design approach built on the optimum effectiveness of the compared fault tolerance schemes. Each scheme is defined as a set of fault tolerance properties for the objects that compose the system; its optimum effectiveness is determined by the tightest effective checkpoint intervals for the passively replicated objects. Our approach allows mixing miscellaneous fault tolerance policies, in contrast to published analytic models, which are best suited to the evaluation of single-server process replication schemes. Special emphasis has been given to the accuracy of the generated estimates, using an appropriate simulation output analysis procedure. We provide showcase results and compare two characteristic warm passive replication schemes: one with periodic and one with load-dependent object state checkpoints. Finally, a trade-off analysis is applied to determine appropriate checkpoint properties with respect to a specified design goal.
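As a hedged illustration of the checkpoint-interval trade-off that such a simulation-based comparison explores, the sketch below estimates the combined cost of periodic state checkpoints and of the work a backup must replay after a failover. The cost model, parameter values, and function names are assumptions, and the model is far simpler than the FT-CORBA warm passive replication schemes compared in the paper.

```python
import random

def simulate(interval, ckpt_cost, mtbf, horizon, seed=0):
    """Rough estimate of total overhead (periodic checkpoints plus work the
    backup must replay after failovers) for a passively replicated object."""
    rng = random.Random(seed)
    t, overhead, last_ckpt = 0.0, 0.0, 0.0
    next_failure = rng.expovariate(1.0 / mtbf)
    while t < horizon:
        t += interval
        while next_failure < t:
            overhead += next_failure - last_ckpt   # lost work replayed on the backup
            last_ckpt = next_failure
            next_failure += rng.expovariate(1.0 / mtbf)
        overhead += ckpt_cost                      # periodic state transfer to the backup
        last_ckpt = t
    return overhead

# Sweep the interval: short intervals waste time checkpointing, long ones lose more work.
for interval in (0.5, 2.0, 8.0, 32.0):
    print(interval, round(simulate(interval, ckpt_cost=0.1, mtbf=100.0, horizon=10_000.0), 1))
```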
{"title":"Optimal object state transfer - recovery policies for fault tolerant distributed systems","authors":"P. Katsaros, C. Lazos","doi":"10.1109/DSN.2004.1311947","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311947","url":null,"abstract":"Recent developments in the field of object-based fault tolerance and the advent of the first OMG FT-CORBA compliant middleware raise new requirements for the design process of distributed fault-tolerant systems. In this work, we introduce a simulation-based design approach based on the optimum effectiveness of the compared fault tolerance schemes. Each scheme is defined as a set of fault tolerance properties for the objects that compose the system. Its optimum effectiveness is determined by the tightest effective checkpoint intervals, for the passively replicated objects. Our approach allows mixing miscellaneous fault tolerance policies, as opposed to the published analytic models, which are best suited in the evaluation of single-server process replication schemes. Special emphasis has been given to the accuracy of the generated estimates using an appropriate simulation output analysis procedure. We provide showcase results and compare two characteristic warm passive replication schemes: one with periodic and another one with load-dependent object state checkpoints. Finally, a trade-off analysis is applied, for determining appropriate checkpoint properties, in respect to a specified design goal.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"97 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129998517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Min-max checkpoint placement under incomplete failure information
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311943
T. Ozaki, T. Dohi, H. Okamura, N. Kaio
In this paper we consider two kinds of sequential checkpoint placement problems, with infinite and finite time horizons. For these problems, we apply approximation methods based on the variational principle and develop computational algorithms to derive the optimal checkpoint sequence approximately. Next, we focus on the situation where knowledge of system failures is incomplete, i.e., the system failure time distribution is unknown. We develop min-max checkpoint placement methods to determine the optimal checkpoint sequence under this uncertainty about the failure time distribution. In numerical examples, we quantitatively investigate the min-max checkpoint placement methods and discuss their potential applicability in practice.
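A minimal sketch of the min-max idea follows, under the assumption that the unknown failure-time distribution is replaced by a small set of candidate models: each candidate checkpoint sequence is scored by its worst-case expected cost over those models, and the sequence with the smallest worst case is kept. The Monte-Carlo cost evaluation and the numbers are illustrative, not the paper's variational computation.

```python
import random

def expected_cost(checkpoints, sample_failure, ckpt_cost, n=20_000, seed=1):
    """Monte-Carlo estimate of checkpointing overhead plus rollback (lost work
    since the last checkpoint) for a single failure."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        f = sample_failure(rng)
        done = [c for c in checkpoints if c <= f]
        total += len(done) * ckpt_cost + (f - (done[-1] if done else 0.0))
    return total / n

def min_max_placement(candidate_sequences, failure_models, ckpt_cost):
    """Pick the sequence whose worst-case expected cost over all candidate
    failure models is smallest (the true distribution is assumed unknown)."""
    return min(candidate_sequences,
               key=lambda seq: max(expected_cost(seq, m, ckpt_cost)
                                   for m in failure_models))

# Two equally plausible failure models, since the true one is unknown.
models = [lambda r: r.expovariate(1 / 50.0), lambda r: r.uniform(0.0, 200.0)]
candidates = [
    [i * 5.0 for i in range(1, 41)],    # dense, evenly spaced checkpoints
    [i * 20.0 for i in range(1, 11)],   # sparse, evenly spaced checkpoints
]
best = min_max_placement(candidates, models, ckpt_cost=1.0)
print(len(best), best[:3])
```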
{"title":"Min-max checkpoint placement under incomplete failure information","authors":"T. Ozaki, T. Dohi, H. Okamura, N. Kaio","doi":"10.1109/DSN.2004.1311943","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311943","url":null,"abstract":"In this paper we consider two kinds of sequential checkpoint placement problems with infinite/finite time horizon. For these problems, we apply the approximation methods based on the variational principle and develop the computation algorithms to derive the optimal checkpoint sequence approximately. Next, we focus on the situation where the knowledge on system failure is incomplete, i.e. the system failure time distribution is unknown. We develop the so-called min-max checkpoint placement methods to determine the optimal checkpoint sequence under the uncertain circumstance in terms of the system failure time distribution. In numerical examples, we investigate quantitatively the min-max checkpoint placement methods, and refer to their potential applicability in practice.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132036766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Markov reward model for reliable synchronous dataflow system design
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311952
Vinu Vijay Kumar, Rashi Verma, J. Lach, J. Dugan
The design of quality digital systems depends on models that accurately evaluate various options in the design space against a set of prioritized metrics. While individual models for evaluating area, performance, reliability, power, etc. are well established, models combining multiple metrics are less mature. This paper introduces a formal methodology for comprehensively analyzing performance, area and reliability in the design of synchronous dataflow systems using a novel Markov Reward Model. A Markov chain system reliability model is constructed for various design options in the presence of possible component failures, and high-level synthesis techniques are used to associate performance and area rewards with each state in the chain. The cumulative reward for a chain is then used to evaluate the corresponding design option with respect to the metrics of interest. Application of the model to a benchmark DSP circuit provides insights into reliable synchronous dataflow system design.
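A hedged sketch of the reward computation itself: a discrete-time Markov chain over failure configurations, a per-state reward (e.g., normalized throughput obtained from high-level synthesis of each configuration), and the expected reward accumulated over a mission. The transition probabilities and reward values below are made-up placeholders, not results from the paper.

```python
import numpy as np

# States: 0 = all units working, 1 = one unit failed (degraded), 2 = system down.
P = np.array([[0.98, 0.02, 0.00],     # per-step transition probabilities (illustrative)
              [0.00, 0.95, 0.05],
              [0.00, 0.00, 1.00]])
reward = np.array([1.00, 0.60, 0.00])  # e.g. normalized throughput of each configuration,
                                       # as a high-level synthesis estimate might provide

def cumulative_expected_reward(P, reward, initial, steps):
    """Expected accumulated reward of a discrete-time Markov reward model."""
    dist, total = np.asarray(initial, dtype=float), 0.0
    for _ in range(steps):
        total += dist @ reward   # reward earned while occupying each state
        dist = dist @ P          # then take one failure/transition step
    return total

# Compare design options by their cumulative reward over a fixed mission length.
print(cumulative_expected_reward(P, reward, initial=[1.0, 0.0, 0.0], steps=200))
```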
{"title":"A Markov reward model for reliable synchronous dataflow system design","authors":"Vinu Vijay Kumar, Rashi Verma, J. Lach, J. Dugan","doi":"10.1109/DSN.2004.1311952","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311952","url":null,"abstract":"The design of quality digital systems depends on models that accurately evaluate various options in the design space against a set of prioritized metrics. While individual models for evaluating area, performance, reliability, power, etc. are well established, models combining multiple metrics are less mature. This paper introduces a formal methodology for comprehensively analyzing performance, area and reliability in the design of synchronous dataflow systems using a novel Markov Reward Model. A Markov chain system reliability model is constructed for various design options in the presence of possible component failures, and high-level synthesis techniques are used to associate performance and area rewards with each state in the chain. The cumulative reward for a chain is then used to evaluate the corresponding design option with respect to the metrics of interest. Application of the model to a benchmark DSP circuit provides insights into reliable synchronous dataflow system design.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132560822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Delivering packets during the routing convergence latency interval through highly connected detours
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311919
E. P. Duarte, Rogério Santini, Jaime Cohen
After a fault occurs and the network topology changes, routing protocols exhibit a convergence latency during which all routers update their tables. During this interval, which in the Internet has been shown to last up to minutes, packets may be lost before reaching their destinations. To allow nodes to keep communicating during the convergence latency interval, we propose the use of alternative routes called detours. In this work we introduce new criteria for selecting detours based on network connectivity. Detours are chosen without knowing which node or link is faulty; highly connected components offer a larger number of distinct paths, increasing the probability that a detour works correctly. Experimental results were obtained by simulation on random Internet-like graphs generated with the Waxman method. The results show that the fault coverage obtained by using the single best detour is up to 90%; when the three best detours are considered, fault coverage reaches up to 98%.
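Below is a small sketch of connectivity-based detour ranking, assuming the networkx library and using edge connectivity (the number of edge-disjoint paths) to the destination as one simple connectivity score. The paper's exact selection criteria and its Waxman-generated topologies are not reproduced here; the toy graph and function name are illustrative.

```python
import networkx as nx

def rank_detours(G, source, dest, next_hop):
    """Rank the source's neighbors (other than the suspect next hop) as detours
    toward `dest`, preferring highly connected candidates: more edge-disjoint
    paths to the destination means more ways around the unknown faulty element."""
    candidates = [n for n in G.neighbors(source) if n not in (next_hop, dest)]
    scored = [(nx.edge_connectivity(G, n, dest), n) for n in candidates]
    return [n for score, n in sorted(scored, reverse=True)]

# Toy topology; the route s -> a -> d stops working, so s looks for a detour.
G = nx.Graph([("s", "a"), ("s", "b"), ("s", "c"), ("a", "d"),
              ("b", "d"), ("b", "e"), ("c", "e"), ("e", "d")])
print(rank_detours(G, source="s", dest="d", next_hop="a"))   # ['b', 'c']
```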
{"title":"Delivering packets during the routing convergence latency interval through highly connected detours","authors":"E. P. Duarte, Rogério Santini, Jaime Cohen","doi":"10.1109/DSN.2004.1311919","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311919","url":null,"abstract":"Routing protocols present a convergence latency for all routers to update their tables after a fault occurs and the network topology changes. During this time interval, which in the Internet has been shown to be of up to minutes, packets may be lost before reaching their destinations. In order to allow nodes to continue communicating during the convergence latency interval, we propose the use of alternative routes called detours. In this work we introduce new criteria for selecting detours based on network connectivity. Detours are chosen without the knowledge of which node or link is faulty. Highly connected components present a larger number of distinct paths, thus increasing the probability that the detour will work correctly. Experimental results were obtained with simulation on random Internet-like graphs generated with the Waxman method. Results show that the fault coverage obtained through the usage of the best detour is up to 90%. When the three best detours are considered, the fault coverage is up to 98%.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133734577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experience with evaluating human-assisted recovery processes
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311910
Aaron B. Brown, Leonard Chung, William Kakes, C. Ling, D. Patterson
We describe an approach to quantitatively evaluating human-assisted failure-recovery tools and processes in the environment of modern Internet- and enterprise-class server systems. Our approach can quantify the dependability impact of a single recovery system, and also enables comparisons between different recovery approaches. The approach combines aspects of dependability benchmarking with human user studies, incorporating human participants in the system evaluations yet still producing typical dependability-related metrics as results. We illustrate our methodology via a case study of a system-wide undo/redo recovery tool for e-mail services; our approach is able to expose the dependability benefits of the tool as well as point out areas where its behavior could use improvement.
{"title":"Experience with evaluating human-assisted recovery processes","authors":"Aaron B. Brown, Leonard Chung, William Kakes, C. Ling, D. Patterson","doi":"10.1109/DSN.2004.1311910","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311910","url":null,"abstract":"We describe an approach to quantitatively evaluating human-assisted failure-recovery tools and processes in the environment of modern Internetand enterprise-class server systems. Our approach can quantify the dependability impact of a single recovery system, and also enables comparisons between different recovery approaches. The approach combines aspects of dependability benchmarking with human user studies, incorporating human participants in the system evaluations yet still producing typical dependability-related metrics as results. We illustrate our methodology via a case study of a system-wide undo/redo recovery tool for e-mail services; our approach is able to expose the dependability benefits of the tool as well as point out areas where its behavior could use improvement.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126660240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Customizing dependability attributes for mobile service platforms
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311932
Jun He, M. Hiltunen, R. Schlichting
Mobile service platforms are used to facilitate access to enterprise services such as email, product inventory, or design drawing databases by a wide range of mobile devices using a variety of access protocols. This paper presents a quality of service (QoS) architecture that allows flexible combinations of dependability attributes such as reliability, timeliness, and security to be enforced on a per service request basis. In addition to components that implement the underlying dependability techniques, the architecture includes policy components that evaluate a request's requirements and dynamically determine an appropriate execution strategy. The architecture has been integrated into an experimental version of iMobile, a mobile service platform being developed at AT&T. This paper describes the design and implementation of the architecture, and gives initial experimental results for the iMobile prototype.
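As a hedged sketch of what such a policy component might look like, the snippet below maps a request's dependability attributes to an execution strategy chosen at request time. The attribute names, strategy labels, and data types are illustrative assumptions, not iMobile's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    service: str
    attributes: dict = field(default_factory=dict)   # e.g. {"reliability": "high"}

def choose_strategy(req):
    """Map a request's dependability attributes to an execution strategy.
    Attribute names and strategy labels are illustrative, not iMobile's."""
    strategy = []
    if req.attributes.get("reliability") == "high":
        strategy.append("replicate-request")          # forward to multiple gateways
    elif req.attributes.get("reliability") == "medium":
        strategy.append("retry-on-timeout")
    if "deadline_ms" in req.attributes:               # timeliness requirement
        strategy.append(f"abort-after-{req.attributes['deadline_ms']}ms")
    if req.attributes.get("confidential"):            # security requirement
        strategy.append("encrypt-payload")
    return strategy or ["best-effort"]

print(choose_strategy(Request("email", {"reliability": "high", "deadline_ms": 500})))
```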
{"title":"Customizing dependability attributes for mobile service platforms","authors":"Jun He, M. Hiltunen, R. Schlichting","doi":"10.1109/DSN.2004.1311932","DOIUrl":"https://doi.org/10.1109/DSN.2004.1311932","url":null,"abstract":"Mobile service platforms are used to facilitate access to enterprise services such as email, product inventory, or design drawing databases by a wide range of mobile devices using a variety of access protocols. This paper presents a quality of service (QoS) architecture that allows flexible combinations of dependability attributes such as reliability, timeliness, and security to be enforced on a per service request basis. In addition to components that implement the underlying dependability techniques, the architecture includes policy components that evaluate a request's requirements and dynamically determine an appropriate execution strategy. The architecture has been integrated into an experimental version of iMobile, a mobile service platform being developed at AT&T. This paper describes the design and implementation of the architecture, and gives initial experimental results for the iMobile prototype.","PeriodicalId":436323,"journal":{"name":"International Conference on Dependable Systems and Networks, 2004","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116219317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}