{"title":"Practical application and implementation of distributed system-level diagnosis theory","authors":"R. Bianchini, K. Goodwin, D. S. Nydick","doi":"10.1109/FTCS.1990.89380","DOIUrl":null,"url":null,"abstract":"A DSD (distributed self-diagnosing) project that consists of the implementation of a distributed self-diagnosis algorithm and its application to distributed computer networks is presented. The EVENT-SELF algorithm presented combines the rigor associated with theoretical results with the resource limitations associated with actual systems. Resource limitations identified in real systems include available message capacity for the communication network and limited processor execution speed. The EVENT-SELF algorithm differs from previously published algorithms by adopting an event-driven approach to self-diagnosability. Algorithm messages are reduced to those messages required to indicate changes in system those messages required to indicate changes in system state. Practical issues regarding the CMU-ECE DSD implementation are considered. These issues include the reconfiguration of the testing subnetwork for environments in which processors can be added and removed. One of the goals of this work is to utilize the developed CMU-ECE DSD system as an experimental test-bed environment for distributed applications.<<ETX>>","PeriodicalId":174189,"journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1990-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"64","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FTCS.1990.89380","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 64
Abstract
A DSD (distributed self-diagnosing) project that consists of the implementation of a distributed self-diagnosis algorithm and its application to distributed computer networks is presented. The EVENT-SELF algorithm presented combines the rigor associated with theoretical results with the resource limitations associated with actual systems. Resource limitations identified in real systems include available message capacity for the communication network and limited processor execution speed. The EVENT-SELF algorithm differs from previously published algorithms by adopting an event-driven approach to self-diagnosability. Algorithm messages are reduced to those messages required to indicate changes in system those messages required to indicate changes in system state. Practical issues regarding the CMU-ECE DSD implementation are considered. These issues include the reconfiguration of the testing subnetwork for environments in which processors can be added and removed. One of the goals of this work is to utilize the developed CMU-ECE DSD system as an experimental test-bed environment for distributed applications.<>