Availability Assessment of HA Standby Redundant Clusters

S. Distefano, F. Longo, M. Scarpa
2010 29th IEEE Symposium on Reliable Distributed Systems, published 2010-10-31. DOI: 10.1109/SRDS.2010.37 (https://doi.org/10.1109/SRDS.2010.37)
Citations: 31

Abstract

Computing systems are becoming the heart of modern technology, carrying out critical tasks that are usually delegated to them and that involve human interaction. This highlights the problem of dependability in computing. High availability computing, in the form of high availability clusters, is a possible solution in such cases, implementing standby redundancy as a trade-off between dependability and cost. From an engineering perspective, this calls for specific techniques and tools to adequately evaluate the reliability and availability of high availability clusters, taking into account both the dependencies among nodes (standby, repair, etc.) and the effect of wear and tear on those nodes, especially when failure and repair times are not exponentially distributed. The solution proposed in this paper is based on phase type distributions and Kronecker algebra. We represent the reliability and maintainability of each component by specific phase type distributions, whose interactions describe the system availability. The latter is thus modeled by an expanded Markov chain expressed in terms of Kronecker algebra, in order to address the state space explosion problem of expansion techniques and to represent the memory policies related to the aging process. More specifically, the paper first details the technique and then applies it to a standby redundant system representing an example high availability cluster, with the aim of demonstrating its effectiveness. Moreover, to show the potential of the technique, different maintenance strategies are evaluated and compared.
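As a rough illustration of how Kronecker algebra composes per-component models, the sketch below builds the joint generator of two components with the Kronecker sum. It is not the paper's model. It assumes two independent components, each with an Erlang-2 time to failure (a simple phase type distribution) and an exponential repair, and it ignores the standby dependencies and memory policies the abstract refers to. The function names and the rates `lam` and `mu` are hypothetical.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(cols_b)]
        for i in range(len(a))
        for k in range(rows_b)
    ]


def kron_sum(a, b):
    """Kronecker sum A (+) B = A (x) I + I (x) B for square A and B."""
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


def component_generator(lam, mu):
    """Generator of one component (illustrative assumption, not from the paper).

    States: 0 = up, phase 1; 1 = up, phase 2; 2 = down (under repair).
    The two up phases give an Erlang-2 time to failure with rate lam per phase;
    repair is exponential with rate mu and restarts the component in phase 1.
    """
    return [
        [-lam, lam, 0.0],
        [0.0, -lam, lam],
        [mu, 0.0, -mu],
    ]


# Hypothetical rates, chosen only for the example.
q_a = component_generator(lam=0.02, mu=0.5)
q_b = component_generator(lam=0.01, mu=0.25)

# Joint generator of the two independent components: 3 * 3 = 9 states,
# where joint state (i, k) is stored at index 3 * i + k.
q_joint = kron_sum(q_a, q_b)
```

The joint matrix is never written out by hand: it follows from the two 3 x 3 blocks, which is what makes a Kronecker representation attractive when the expanded state space grows. Each row of `q_joint` sums to zero, as required of a Markov chain generator.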