International Conference on Dependable Systems and Networks, 2004: Latest Papers

Collective endorsement and the dissemination problem in malicious environments
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311922
Subramanian Lakshmanan, D. J. Manohar, M. Ahamad, H. Venkateswaran
We consider the problem of disseminating an update known to a set of servers to the other servers in the system via a gossip protocol. Some of the servers can exhibit malicious behavior. We require that non-malicious servers accept only updates introduced by authorized clients; spurious updates, in particular those generated by compromised nodes, must not be accepted. We take the approach of collective endorsement, in which each server endorses an accepted update by computing a list of message authentication codes with the symmetric keys allocated to it. We use a novel key allocation scheme that allocates a set of symmetric keys to each participating server while minimizing the total number of keys. Our protocol is designed to minimize update diffusion time. In the absence of faulty nodes, its diffusion time is O(log n), which is the best possible time achieved when nodes suffer only from benign faults. If the actual number of Byzantine faults experienced during an update's dissemination is f, diffusion time increases to O(log n) + f. This is better than the latency of previously known protocols, which take O(log n) + b time, where b is the assumed threshold on the number of malicious servers that can be tolerated rather than f, the actual number of failures. The buffer requirements and message sizes are higher in our protocol than in other known protocols; it thus trades memory and bandwidth for improved latency.
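The endorsement step described in the abstract can be illustrated with a minimal Python sketch. It assumes HMAC-SHA256 as the message authentication code and uses a made-up key allocation (`keys_a`, `keys_b`); the abstract does not specify the MAC algorithm, the key allocation scheme, or the rule a server uses to accept an update, so none of those are reproduced here.

```python
import hashlib
import hmac


def endorse(update: bytes, keys: dict[str, bytes]) -> dict[str, bytes]:
    """Return one MAC per symmetric key held by the endorsing server."""
    return {key_id: hmac.new(key, update, hashlib.sha256).digest()
            for key_id, key in keys.items()}


def count_verified(update: bytes, endorsement: dict[str, bytes],
                   my_keys: dict[str, bytes]) -> int:
    """Count the MACs in an endorsement that this server can check and that match."""
    verified = 0
    for key_id, mac in endorsement.items():
        key = my_keys.get(key_id)
        if key is None:
            continue  # this server does not hold the key, so it cannot check the MAC
        expected = hmac.new(key, update, hashlib.sha256).digest()
        if hmac.compare_digest(mac, expected):
            verified += 1
    return verified


# Hypothetical key allocation: servers A and B share keys k1 and k2 only.
keys_a = {"k1": b"secret-1", "k2": b"secret-2", "k3": b"secret-3"}
keys_b = {"k1": b"secret-1", "k2": b"secret-2", "k4": b"secret-4"}

update = b"set x = 42"
endorsement = endorse(update, keys_a)
valid_count = count_verified(update, endorsement, keys_b)
forged_count = count_verified(b"set x = 99", endorsement, keys_b)
```

Server B can check only the MACs computed with keys it shares with A, which is why the way keys are allocated across servers matters for the protocol.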
Citations: 2
Timed uniform consensus resilient to crash and timing faults
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311894
Taisuke Izumi, Akinori Saitoh, T. Masuzawa
Δ-timed uniform consensus is a stronger variant of traditional consensus that satisfies the following additional properties: every correct process terminates its execution within a constant time Δ (Δ-timeliness), and no two processes decide differently (uniformity). In this paper, we consider the Δ-timed uniform consensus problem in the presence of f_t crash processes and f_c timing-faulty processes, and we propose a Δ-timed uniform consensus algorithm. The proposed algorithm is adaptive in the following sense: it solves Δ-timed uniform consensus when at least f_t + 1 correct processes exist in the system. If the system has fewer than f_t + 1 correct processes, the algorithm cannot solve Δ-timed uniform consensus. However, as long as f_t + 1 processes are non-crashed, the algorithm solves (non-timed) uniform consensus. We also investigate the maximum number of faulty processes that can be tolerated. We show that any Δ-timed uniform consensus algorithm tolerating up to f_t timing-faulty processes requires that the system has at least f_t + 1 correct processes. This impossibility result implies that the proposed algorithm attains the maximum resilience with respect to the number of faulty processes. We also show that any Δ-timed uniform consensus algorithm tolerating up to f_t timing-faulty processes cannot solve (non-timed) uniform consensus when the system has fewer than f_t + 1 non-crashed processes. This impossibility result implies that our algorithm attains the maximum adaptiveness.
Citations: 3
Support for mobility and fault tolerance in Mykil
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311923
Jyh-How Huang, Shivakant Mishra
This paper describes the support provided for mobility and fault tolerance in Mykil, which is a key distribution protocol for large, secure group multicast. Mykil is based on a combination of group-based hierarchy and key-based hierarchy systems. Important advantages of Mykil include a fast and efficient rekeying operation for large group sizes, continuous availability of the key management service in a disconnected network environment, an ability to map group structure to the underlying network infrastructure, fault tolerance, and support for member mobility and smaller hand-held devices.
Citations: 5
Implementing simple replication protocols using CORBA portable interceptors and Java serialization
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311924
M. Bennani, L. Blain, Ludovic Courtès, J. Fabre, M. Killijian, E. Marsden, François Taïani
The goal of this paper is to assess the value of simple features that are widely available in off-the-shelf CORBA and Java platforms for the implementation of fault-tolerance mechanisms in industry-grade systems. This work builds on knowledge gained at LAAS from previous work on the prototyping of reflective fault-tolerant frameworks. We describe how we used the interception and state capture mechanisms that are available in CORBA and Java to implement a simple replication strategy on a small middleware-based system built upon GNU/Linux and JOrbacus. We discuss the benefits and the limits of the resulting system from a practical point of view.
Citations: 23
The recursive nanobox processor grid: a reliable system architecture for unreliable nanotechnology devices
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311887
A. KleinOsowski, K. KleinOsowski, V. Rangarajan, P. Ranganath, D. Lilja
Advanced molecular nanotechnology devices are expected to have exceedingly high transient fault rates and large numbers of inherent device defects compared to conventional CMOS devices. We introduce the recursive nanobox processor grid as an application-specific, fault-tolerant, parallel computing system designed for fabrication with unreliable nanotechnology devices. In this initial study we construct VHDL models of the nanobox processor cell ALU and evaluate the effectiveness of our recursive fault masking approach in the presence of random transient errors. Our analysis shows that the ALU can calculate correctly 100 percent of the time with raw FIT (failures in time) rates as high as 10^23. We achieve this error correction with an area overhead on the order of 9x, which is quite reasonable given the high integration densities expected with nanodevices.
Citations: 21
Repairable fault tree for the automatic evaluation of repair policies
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311936
D. Raiteri, M. Iacono, G. Franceschinis, V. Vittorini
Fault trees are a well-known means of evaluating the dependability of complex systems. Many extensions to the original formalism have been proposed in order to enhance the advantages of fault tree analysis for the design and assessment of systems. In this paper we propose an extension, repairable fault trees, which allows the designer to evaluate the effects of different repair policies on a repairable system. This extended formalism has been integrated in a multi-formalism, multi-solution framework, and it is supported by a solution technique that transparently exploits generalized stochastic Petri nets (GSPN) for modelling the repair process. The modelling technique and the solution process are illustrated through an example.
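For readers unfamiliar with the base formalism, the following Python sketch evaluates a plain fault tree with AND and OR gates. It is not the repairable fault tree technique of the paper: it assumes statistically independent basic events and, as a strong simplification, that every component is repaired independently at a constant rate, so that its steady-state unavailability is failure_rate / (failure_rate + repair_rate). The example system and its rates are hypothetical.

```python
from math import prod


def unavailability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state unavailability of one component that is repaired independently."""
    return failure_rate / (failure_rate + repair_rate)


def and_gate(probs):
    """Output fails only if all inputs fail (independent events)."""
    return prod(probs)


def or_gate(probs):
    """Output fails if at least one input fails (independent events)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical system: two redundant disks (AND) combined with one controller (OR).
disk = unavailability(failure_rate=1e-3, repair_rate=1e-1)
controller = unavailability(failure_rate=1e-4, repair_rate=1e-1)
top_event = or_gate([and_gate([disk, disk]), controller])
```

Repair policies that share repair resources or trigger repairs on subsystem failure make components dependent, which this independence-based calculation cannot capture; that is the gap a state-based model such as a GSPN is meant to fill.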
Citations: 74
Safety optimization: a combination of fault tree analysis and optimization techniques
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311935
F. Ortmeier, W. Reif
We present a new form of quantitative safety analysis: safety optimization. This method is a combination of fault tree analysis (FTA) and mathematical optimization techniques. Using the results of FTA, statistics, and a quantification of the costs of hazards, it makes it possible to find the optimal configuration of a given system with respect to opposing safety requirements. Furthermore, the system may be examined not only for safety but for usability as well. We illustrate this method on a real-world case study: the height control system of the Elbtunnel in Hamburg. Safety optimization revealed some significant problems in the trustworthiness of the system, yielded optimal values for the configuration of free parameters, and showed possible modifications to improve the system.
Citations: 19
Automated system design for availability
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311911
G. Janakiraman, J. R. Santos, Yoshio Turner
Large-scale systems experience frequent failures, which can result in unacceptably high service downtime or application execution time. To meet performance and availability requirements, the user must perform a complex design task including the selection and configuration of hardware and software components and mechanisms for handling failures. We believe users should be relieved of this burden by automating the design process in order to generate cost-effective solutions from high-level application requirements. In this paper, we present Aved, a proof-of-concept design automation engine that is a first step toward this goal. We describe how infrastructure choices, application models, and user requirements are represented with Aved to automate design space search and reason about design alternatives. We additionally present examples to illustrate how Aved can generate a complete picture of the cost-availability and cost-performance tradeoffs for the infrastructure design.
Citations: 21
In advance activation of backup channels for real-time transmission
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311925
Enrique Hernández-Orallo, Joan Vila i Carbó
Real-time transmission implies guaranteeing a given quality of service (QoS), which requires heavy use of network resources. Backup channels introduce the notion of availability to real-time transmission at the cost of increasing the use of network resources. However, this over-provisioning of resources is potentially wasted, since the fault rate is very low. This paper introduces a new failure detection scheme for real-time transmission called proactive backup channel. This scheme is based on activating the backup channel before a failure occurs. As proven in the paper, this scheme reduces the use of network resources and is suitable for integrated and differentiated services.
Citations: 2
Hierarchical computation of interval availability and related metrics
Pub Date : 2004-06-28 DOI: 10.1109/DSN.2004.1311940
D. Tang, Kishor S. Trivedi
As new-generation high-availability commercial computer systems incorporate deferred repair service strategies, steady-state availability metrics may no longer reflect reality. A transient solution of the availability models for such systems, which calculates interval availability over a shorter time horizon, is desirable. While many solution methods for transient analysis have been proposed, how to apply these methods to hierarchical models has not been well addressed. This paper describes an approach to computing interval availability and related metrics for hierarchical Markov models. The approach divides the time interval of interest into small subintervals such that the input parameters can be treated as constants in each subinterval, so that the model satisfies the homogeneous Markov property, and then passes the output interval availability metrics as constants from the sub-model to its parent model. Finally, these quantities are integrated to obtain the expected interval availability for the entire interval. The study also addresses methods of passing parameters across levels for generating multiple metrics from a hierarchical model. The approach is illustrated with an example model and has been implemented in RAScad. All computations for the example model have also been carried out using the SHARPE textual language interface.
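The subinterval idea can be sketched in Python for the simplest possible case: a single two-state (up/down) Markov model whose failure and repair rates are constant within each subinterval. This is only an illustration under those assumptions; it does not model the hierarchy or the parameter passing between sub-model and parent model that the paper addresses, and the function name and input format are hypothetical. Within a subinterval, the up-state probability follows the standard closed form A(t) = a_ss + (p0 - a_ss) * exp(-(lam + mu) * t) with a_ss = mu / (lam + mu), which the code integrates analytically.

```python
from math import exp


def interval_availability(subintervals, p_up=1.0):
    """Expected fraction of time spent in the up state over all subintervals.

    subintervals: list of (length, failure_rate, repair_rate) tuples, with
    rates treated as constant inside each subinterval (lam + mu must be > 0).
    p_up: probability that the system is up at the start of the first subinterval.
    """
    uptime = 0.0
    total = 0.0
    for length, lam, mu in subintervals:
        s = lam + mu
        a_ss = mu / s                 # steady-state availability for these rates
        decay = exp(-s * length)
        # Integral of A(t) over the subinterval, starting from p_up.
        uptime += a_ss * length + (p_up - a_ss) * (1.0 - decay) / s
        # Up-state probability carried into the next subinterval.
        p_up = a_ss + (p_up - a_ss) * decay
        total += length
    return uptime / total


# Hypothetical example: the repair rate drops in the second subinterval (deferred repair).
example = interval_availability([(100.0, 0.001, 0.5), (100.0, 0.001, 0.05)])
```

Carrying the end-of-subinterval state probability forward is what lets each piece be solved as a homogeneous model while the overall result still reflects time-varying parameters.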
Citations: 12