Proceedings. International Conference on Dependable Systems and Networks: Latest Publications

Generic timing fault tolerance using a timely computing base
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028883
A. Casimiro, P. Veríssimo
Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper we follow the perspective of timing fault tolerance: timing errors occur and they are processed using redundancy, e.g., component replication, to recover and deliver timely service. We introduce a paradigm for generic timing fault tolerance with replicated state machines. The paradigm is based on the existence of Timing Failure Detection with timed completeness and accuracy properties. Generic timing fault tolerance implies the ability to dependably observe the system and to timely notify timing failures, which we discuss in the paper. On the other hand, it ensures replica determinism with respect to time (temporal consistency), and safety in case of spare exhaustion. We show that the paradigm can be addressed and realized in the framework of the timely computing base (TCB) model and architecture. Furthermore, we illustrate the generality of our approach by reviewing existing solutions and by showing that, in contrast with ours, they only secure a restricted semantics, or simply provide ad-hoc solutions.
Pages: 27-36
Citations: 17
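As a rough illustration of masking timing failures with replicas, the sketch below picks the first replica reply that met its deadline and raises an error when every replica was late (spare exhaustion). It is not taken from the paper: the names `select_timely` and `TimingFailure` are hypothetical, and the elapsed times are simply assumed to come from a trustworthy timing failure detector.

```python
class TimingFailure(Exception):
    """Raised when no replica delivered a result within the deadline."""


def select_timely(replies, deadline):
    """Return the value of the first reply whose elapsed time met the deadline.

    `replies` is a list of (value, elapsed_seconds) pairs, one per replica.
    The elapsed times are assumed to be reported by a trusted detector.
    """
    for value, elapsed in replies:
        if elapsed <= deadline:
            return value
    # Every replica was late: switch to a fail-safe path instead of
    # delivering an untimely result.
    raise TimingFailure("no replica met the deadline")
```

A caller would treat `TimingFailure` as the signal to enter a safe state, which is one plausible reading of "safety in case of spare exhaustion".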
Resource management policies in GPRS wireless internet access systems
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1029016
M. Meo, M. Marsan, Cecilia Batetta
In this paper we consider the problem of resource management in GSM/GPRS cellular networks offering not only mobile telephony services, but also data services for wireless access to the Internet. In particular we investigate channel allocation policies that can provide a good tradeoff between the QoS guaranteed to end users of voice and data services, considering three different alternatives, and developing analytical techniques for the assessment of their relative merits. The first channel allocation policy is called voice priority, since it gives priority to voice in the access to radio channels; we show that this policy cannot provide acceptable performance to data services, and we discuss the reasons for this shortcoming. The second channel allocation policy is called R-reservation; it statically reserves a fixed number of channels to data services, thus drastically improving their performance, but subtracting resources from voice users, even when these are not needed for data, thus inducing an unnecessary performance degradation for voice services. The third channel allocation policy is called dynamic reservation; as the name implies, it dynamically allocates channels to data when necessary, using the information about the queue length of GPRS data units within the base station. A threshold on the queue length is used in order to decide when channels must be allocated to data. Numerical results show that the dynamic reservation channel allocation policy can provide very effective performance tradeoffs for data and voice services, with the additional advantage of being easily managed through the setting of the threshold value.
Pages: 707-716
Citations: 8
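The three policies in this abstract can be compared with a small admission rule for voice calls. The sketch below is only one reading of the abstract: it assumes that dynamic reservation sets aside the same number of channels as R-reservation, but only while the data queue exceeds the threshold. The function names and policy strings are hypothetical.

```python
def voice_channel_limit(policy, total, reserved, queue_len=0, threshold=0):
    """Return how many channels voice calls may occupy under a policy.

    total     -- channels in the cell
    reserved  -- channels kept for data when a reservation is active
    queue_len -- current number of queued GPRS data units
    threshold -- queue length above which dynamic reservation kicks in
    """
    if policy == "voice_priority":
        return total                      # voice may take every channel
    if policy == "r_reservation":
        return total - reserved           # static reservation for data
    if policy == "dynamic_reservation":
        # Assumption: reserve only while the data backlog is above threshold.
        return total - reserved if queue_len > threshold else total
    raise ValueError(f"unknown policy: {policy}")


def admit_voice_call(active_voice, policy, total, reserved,
                     queue_len=0, threshold=0):
    """Admit a new voice call only if it stays within the policy's limit."""
    limit = voice_channel_limit(policy, total, reserved, queue_len, threshold)
    return active_voice < limit
```

With 8 channels and 2 reserved, a ninth-to-be call is never admitted, a seventh call is refused under R-reservation, and under dynamic reservation it is refused only while the data queue is above the threshold.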
CLAIRE: an event-driven simulation tool for test and validation of software programs
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028954
A. Carloganu, J. Raguideau
Malfunctions of systems in domains such as medicine, avionics, traffic control, defense and nuclear applications can cause human injuries. Test and validation of such systems is a difficult task, because many situations cannot be safely reproduced. Simulation makes it possible to assess the correctness of a safety-critical system, even in dangerous situations. This paper presents CLAIRE, a purely software simulation tool with graphic facilities for system modelling, designed for test, validation and non-intrusive dynamic analysis of real-time applications.
Pages: 538-
Citations: 4
Developing a heterogeneous intrusion tolerant CORBA system
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028905
D. Sames, B. Matt, B. Niebuhr, G. Tally, B. Whitmore, D. Bakken
Intrusion tolerant systems provide high-integrity and high-availability services to their clients in the face of successful attacks from an adversary. The Intrusion Tolerant Distributed Object Systems (ITDOS) research project is developing an architecture for a heterogeneous intrusion tolerant distributed object system. ITDOS integrates a Byzantine Fault Tolerant multicast protocol into an open-source CORBA ORB to provide intrusion tolerant middleware. This foundation allows up to f simultaneous Byzantine failures of replicated servers in a system of at least 3f+1 replicas. Voting on unmarshalled CORBA messages allows heterogeneous application implementations for a given service, allowing for greater diversity in implementation and greater survivability. Symmetric encryption session keys generated by distributed pseudo-random function techniques provide confidential client-server communications. This paper overviews the ITDOS architecture, discusses some of the challenging technical issues related to intrusion tolerance in heterogeneous middleware systems, and offers views on future areas of work.
Pages: 239-248
Citations: 36
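The 3f+1 bound quoted in this abstract can be turned into a short calculation, together with a simple vote over decoded reply values. The sketch below is illustrative only: the function names are hypothetical, and the acceptance threshold of f+1 matching replies is the usual convention in Byzantine fault tolerant replication, not something the abstract states.

```python
from collections import Counter


def max_byzantine_faults(replicas):
    """Largest f such that replicas >= 3f + 1."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return (replicas - 1) // 3


def vote(replies, f):
    """Return the value reported by at least f + 1 replicas, else None.

    `replies` holds decoded (unmarshalled) values, so replicas built on
    different platforms can agree even if their wire encodings differ.
    Values must be hashable.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None
```

For example, four replicas tolerate one Byzantine failure, and a value is accepted once two of them report it.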
Robust software - no more excuses
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028895
John DeVale, P. Koopman
Software developers identify two main reasons why software systems are not made robust: performance and practicality. We demonstrate the effectiveness of general techniques to improve robustness that are practical and yield high performance. We present data from treating three systems to improve robustness by a factor of 5 or more, with a measured performance penalty of under 5% in nearly every case, and usually under 2%. We identify a third possible reason why software systems are not made robust: developer awareness. A case study on three professional development groups evaluated their ability to estimate the robustness of their software. Two groups were able to estimate their software's robustness to some extent, while one group had more divergent results. Although we can overcome the technical challenges, it appears that even experienced developers can benefit from tools to locate robustness failures and training in robustness issues.
Pages: 145-154
Citations: 27
Detecting processor hardware faults by means of automatically generated virtual duplex systems
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028925
M. Jochim
A virtual duplex system (VDS) can be used to increase safety without the use of structural redundancy on a single machine. If a deterministic program $P$ is calculating a given function $f$, then a VDS contains two variants $P_a$ and $P_b$ of $P$ which are calculating the diverse functions $f_a$ and $f_b$ in sequence. If no error occurs in the process of designing and executing $P_a$ and $P_b$, then $f = f_a = f_b$ holds. A fault in the underlying processor hardware is likely to be detected by the deviation of the results, i.e. $f_a(i) \neq f_b(i)$ for input $i$. Normally, VDSs are generated by manually applying different diversity techniques. This paper, in contrast, presents a new method and a tool for the automated generation of VDSs with a high detection probability for hardware faults. Moreover, for the first time the diversity techniques are selected by an optimization algorithm rather than chosen intuitively. The generated VDSs are investigated extensively by means of software implemented processor fault injection.
Pages: 399-408
Citations: 12
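The compare-two-variants idea behind a virtual duplex system can be sketched in a few lines. The example below is a hand-written toy, not output of the paper's tool: the two summation variants, `run_vds` and `ResultMismatch` are hypothetical names, and the only diversity applied is a reversed iteration order.

```python
class ResultMismatch(Exception):
    """Raised when the two variants disagree, hinting at a hardware fault."""


def sum_forward(values):
    """Variant a: add the integers front to back."""
    total = 0
    for v in values:
        total += v
    return total


def sum_backward(values):
    """Variant b: add the same integers back to front."""
    total = 0
    for v in reversed(values):
        total += v
    return total


def run_vds(variant_a, variant_b, data):
    """Run both variants in sequence and compare their results."""
    result_a = variant_a(data)
    result_b = variant_b(data)
    if result_a != result_b:
        raise ResultMismatch(f"{result_a!r} != {result_b!r}")
    return result_a
```

Integer inputs are used on purpose: with floating-point data the two summation orders could legitimately differ by rounding, which would need a tolerance rather than an exact comparison.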
SWIM: scalable weakly-consistent infection-style process group membership protocol
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028914
Abhinandan Das, Indranil Gupta, Ashish Motivala
Several distributed peer-to-peer applications require weakly-consistent knowledge of process group membership information at all participating processes. SWIM is a generic software module that offers this service for large scale process groups. The SWIM effort is motivated by the unscalability of traditional heart-beating protocols, which either impose network loads that grow quadratically with group size, or compromise response times or false positive frequency w.r.t. detecting process crashes. This paper reports on the design, implementation and performance of the SWIM sub-system on a large cluster of commodity PCs. Unlike traditional heart beating protocols, SWIM separates the failure detection and membership update dissemination functionalities of the membership protocol. Processes are monitored through an efficient peer-to-peer periodic randomized probing protocol. Both the expected time to first detection of each process failure, and the expected message load per member do not vary with group size. Information about membership changes, such as process joins, drop-outs and failures, is propagated via piggybacking on ping messages and acknowledgments. This results in a robust and fast infection style (also epidemic or gossip-style) of dissemination. The rate of false failure detections in the SWIM system is reduced by modifying the protocol to allow group members to suspect a process before declaring it as failed - this allows the system to discover and rectify false failure detections. Finally, the protocol guarantees a deterministic time bound to detect failures. Experimental results from the SWIM prototype are presented. We discuss the extensibility of the design to a WAN-wide scale.
Pages: 303-312
Citations: 176
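The suspect-before-failed idea described in this abstract can be sketched as a small local membership table. This is a simplified illustration and not the SWIM implementation: the class name, the state strings and the rule of expiring a suspicion after a fixed number of protocol periods are assumptions, and message piggybacking is left out.

```python
import random

ALIVE, SUSPECT, FAILED = "alive", "suspect", "failed"


class MembershipView:
    """One member's local view of the group, with a suspicion stage."""

    def __init__(self, members, suspicion_periods=3):
        self.state = {m: ALIVE for m in members}
        self.suspected_for = {}              # member -> periods under suspicion
        self.suspicion_periods = suspicion_periods

    def pick_probe_target(self, rng=random):
        """Choose a random member that is not yet declared failed."""
        candidates = [m for m, s in self.state.items() if s != FAILED]
        return rng.choice(candidates) if candidates else None

    def record_probe(self, target, acked):
        """Update the view after probing `target` in this period."""
        if self.state[target] == FAILED:
            return
        if acked:
            # Any acknowledgment clears an earlier (possibly false) suspicion.
            self.state[target] = ALIVE
            self.suspected_for.pop(target, None)
        elif self.state[target] == ALIVE:
            self.state[target] = SUSPECT
            self.suspected_for[target] = 0

    def end_period(self):
        """Age suspicions; declare members failed once they expire."""
        for member in list(self.suspected_for):
            self.suspected_for[member] += 1
            if self.suspected_for[member] >= self.suspicion_periods:
                self.state[member] = FAILED
                del self.suspected_for[member]
```

Each member probes one target per period, so the per-member load in this sketch does not depend on group size, in line with the claim in the abstract.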
An adaptive architecture for monitoring and failure analysis of high-speed networks
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028888
Benjamin Floering, B. Brothers, Z. Kalbarczyk, R. Iyer
Describes the design of a reconfigurable device using an FPGA (field programmable gate array) whose primary function is high-speed (several Gb/s) network data monitoring and run-time adaptive fault injection and statistics gathering for failure analysis. The device is designed for two types of media: Myrinet SAN and Fibre Channel, and failure analysis can be performed simultaneously over both of these networks. Although the device intercepts and retransmits signals on the network, no impact on the data transfer rate is observed and the latency caused by inserting the device in the network is negligible. The fault injection capabilities are demonstrated on a Myrinet LAN. Fault injection experiments are conducted on data transmitted across the network, including control packets previously inaccessible to software-based techniques.
Pages: 69-78
Citations: 5
Model checking safety properties of servo-loop control systems
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1028885
Paul Ammann, Wei Ding, Daling Xu
Presents the experiences of using a symbolic model checker to check the safety properties of a servo-loop control system. Symbolic model checking has been shown to be beneficial when the system under analysis can be modeled as a finite state machine. Servo-loop control systems are typically represented by differential equations (Laplace transforms), not as finite state machines. However, the control loop is only a part of the software system needed to properly and safely operate the system. The paper first validates the safety of the servo loop using control theory and simulation. Then, a simple state model of a servo loop is combined with the state model of the entire system. This model is then entered into a model checker (SMV) along with safety predicates. The model checker is used to validate the safety predicates. The paper shows via an example (an antenna tracking system) that safety issues can be discovered and defined for control systems using a model checker. Furthermore, it demonstrates that effective hazard analysis may require multiple techniques.
介绍了用符号模型检查器检查伺服环控制系统安全性能的经验。当所分析的系统可以建模为有限状态机时,符号模型检查已被证明是有益的。伺服环控制系统通常由微分方程(拉普拉斯变换)表示,而不是有限状态机。然而,控制回路只是正确和安全操作系统所需的软件系统的一部分。本文首先通过控制理论和仿真验证了伺服回路的安全性。然后,将伺服回路的简单状态模型与整个系统的状态模型相结合。然后将该模型与安全谓词一起输入到模型检查器(SMV)中。模型检查器用于验证安全谓词。本文以天线跟踪系统为例,说明了利用模型检查器可以发现和定义控制系统的安全问题。此外,它表明有效的危害分析可能需要多种技术。
{"title":"Model checking safety properties of servo-loop control systems","authors":"Paul Ammann, Wei Ding, Daling Xu","doi":"10.1109/DSN.2002.1028885","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028885","url":null,"abstract":"Presents the experiences of using a symbolic model checker to check the safety properties of a servo-loop control system. Symbolic model checking has been shown to be beneficial when the system under analysis can be modeled as a finite state machine. Servo-loop control systems are typically represented by differential equations (Laplace transforms)-not as finite state machines. However, the control loop is only apart of the software system needed to properly and safely operate the system. The paper first validates the safety of the servo loop using control theory and simulation. Then, a simple state model of a servo loop is combined with the state model of the entire system. This model is then entered into a model checker (SMV) along with safety predicates. The model checker is used to validate the safety predicates. The paper shows via an example-an antenna tracking system-that safety issues can be discovered and defined for control systems using a model checker. Furthermore, it demonstrates that effective hazard analysis may require multiple techniques.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"44 1","pages":"45-50"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86318718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Evaluation of the maximum level reached by a queue over a finite period
Pub Date : 2002-06-23 DOI: 10.1109/DSN.2002.1029019
G. Rubino
This paper deals with the performance analysis of a system modeled by a queue. If we are interested in occupation problems and if we look at the transient phase, then it makes sense to study the maximum backlog observed in the queue over a finite period. This paper proposes an efficient algorithmic scheme to evaluate the distribution of this maximum backlog level, based on the uniformization technique. The approach is illustrated using the classical M/M/1 model, but it can be extended to more complex ones.
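The absorbing-barrier idea behind this kind of computation can be sketched in a few lines. The Python below is a minimal illustration and not necessarily the paper's scheme: it assumes an M/M/1 queue with arrival rate `lam` and service rate `mu`, makes the target level absorbing, and applies standard uniformization with rate `lam + mu` to obtain the probability that the backlog reaches that level within `[0, t]`. The function name `prob_max_reaches`, its parameters, and the truncation rule are all assumptions made for this sketch.

```python
import math


def prob_max_reaches(lam, mu, t, level, start=0, eps=1e-12):
    """Probability that an M/M/1 queue length reaches `level` during [0, t].

    Hypothetical helper for illustration: `level` is made absorbing and the
    transient distribution is computed by uniformization.
    """
    if start >= level:
        return 1.0
    rate = lam + mu                      # uniformization rate
    p_up, p_down = lam / rate, mu / rate
    # Probability vector over states 0..level of the uniformized chain.
    v = [0.0] * (level + 1)
    v[start] = 1.0
    mean = rate * t                      # mean of the Poisson jump count
    weight = math.exp(-mean)             # Poisson pmf at n = 0
    acc_weight = weight
    result = weight * v[level]
    # Safety cap on the number of terms, in case rounding stalls the loop.
    n_cap = int(mean + 10.0 * math.sqrt(mean) + 50)
    n = 0
    while 1.0 - acc_weight > eps and n < n_cap:
        n += 1
        nxt = [0.0] * (level + 1)
        nxt[level] = v[level]            # absorbed mass stays absorbed
        for i in range(level):
            if v[i] == 0.0:
                continue
            nxt[i + 1] += v[i] * p_up
            # In state 0 a "service" event is a self-loop.
            nxt[max(i - 1, 0)] += v[i] * p_down
        v = nxt
        weight *= mean / n               # Poisson pmf at n from pmf at n - 1
        acc_weight += weight
        result += weight * v[level]
    return result


if __name__ == "__main__":
    # Distribution of the maximum: P(max <= k) = 1 - P(max reaches k + 1).
    for k in range(4):
        print(k, 1.0 - prob_max_reaches(1.0, 2.0, 5.0, k + 1))
```

For `level=1` and an empty initial queue the result reduces to `1 - exp(-lam * t)`, the probability of at least one arrival, which gives a quick sanity check. The direct use of `exp(-rate * t)` underflows when `(lam + mu) * t` is very large, so a more careful computation of the Poisson weights would be needed in that regime.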
Citations: 1