Self checking network protocols: a monitor based approach

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI:10.1109/RELDIS.2004.1353000

G. Khanna, Padma Varadharajan, S. Bagchi


Citations: 19

Abstract

The wide deployment of high-speed computer networks has made distributed systems ubiquitous in today's connected world. The machines that host distributed applications are heterogeneous, the applications often run legacy code whose source is unavailable, and the systems are very large in scale and often have soft real-time guarantees. In this paper, we target the problem of online detection of disruptions through a generic external entity called the Monitor, which observes the messages exchanged between the protocol participants and deduces any ongoing disruption by matching them against a rule base composed of combinatorial and temporal rules. The Monitor architecture is application neutral; the rule base makes it specific to a protocol. To make the detection infrastructure scalable and dependable, we extend it to a hierarchical Monitor structure. The infrastructure is applied to a streaming video application running on a reliable multicast protocol called TRAM, installed on a campus-wide network. The evaluation shows the scalability of the Monitor infrastructure and its detection coverage under different kinds of faults, for both the single-level and the hierarchical arrangements.
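The abstract does not give the rule syntax, so the following is only a minimal sketch of the general idea: an external monitor observes messages and flags a disruption when a temporal rule is violated. The names `Message`, `TemporalRule`, and `Monitor`, the message kinds `DATA` and `ACK`, and the one-second deadline are all hypothetical and do not come from the paper. Combinatorial rules and the hierarchical arrangement are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Message:
    """One observed protocol message (hypothetical fields)."""
    timestamp: float  # seconds, as seen by the monitor
    sender: str
    receiver: str
    kind: str         # e.g. "DATA" or "ACK" (illustrative kinds only)


@dataclass(frozen=True)
class TemporalRule:
    """Hypothetical rule: each `trigger` message must be answered by a
    `response` message in the reverse direction within `deadline` seconds."""
    name: str
    trigger: str
    response: str
    deadline: float


class Monitor:
    """Observes messages only; it never touches the participants' code."""

    def __init__(self, rules: List[TemporalRule]) -> None:
        self.rules = list(rules)
        self.pending: List[Tuple[TemporalRule, Message]] = []
        self.violations: List[str] = []

    def _expire(self, now: float) -> None:
        # Any trigger still unanswered past its deadline is a violation.
        still_waiting = []
        for rule, trig in self.pending:
            if now - trig.timestamp > rule.deadline:
                self.violations.append(
                    f"{rule.name}: {trig.sender}->{trig.receiver} "
                    f"sent at t={trig.timestamp} was not answered in time"
                )
            else:
                still_waiting.append((rule, trig))
        self.pending = still_waiting

    def observe(self, msg: Message) -> None:
        # Time advances with each observed message in this sketch.
        self._expire(msg.timestamp)
        # A response discharges every outstanding trigger for that pair.
        self.pending = [
            (rule, trig)
            for rule, trig in self.pending
            if not (
                msg.kind == rule.response
                and msg.sender == trig.receiver
                and msg.receiver == trig.sender
            )
        ]
        # A trigger message opens a new obligation for each matching rule.
        for rule in self.rules:
            if msg.kind == rule.trigger:
                self.pending.append((rule, msg))


if __name__ == "__main__":
    monitor = Monitor([TemporalRule("ack-timeout", "DATA", "ACK", 1.0)])
    monitor.observe(Message(0.0, "sender", "receiver", "DATA"))
    monitor.observe(Message(0.5, "receiver", "sender", "ACK"))   # in time
    monitor.observe(Message(1.0, "sender", "receiver", "DATA"))
    monitor.observe(Message(3.0, "sender", "receiver", "DATA"))  # ACK missed
    for violation in monitor.violations:
        print(violation)
```

Keeping the rules as data separate from the `Monitor` class mirrors the abstract's point that the architecture is application neutral and only the rule base is protocol specific. A real deployment would also need a timer to detect violations when no further messages arrive.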

