稀疏互连多处理机系统的故障诊断

D. Blough, G. Sullivan, G. Masson
{"title":"稀疏互连多处理机系统的故障诊断","authors":"D. Blough, G. Sullivan, G. Masson","doi":"10.1109/FTCS.1989.105544","DOIUrl":null,"url":null,"abstract":"The authors present a general approach to fault diagnosis that is widely applicable and requires only a limited number of connections among units. Each unit in the system forms a private opinion on the status of each of its neighboring units based on duplication of jobs and comparison of job results over time. A diagnosis algorithm that consists of simply taking a majority vote among the neighbors of a unit to determine the status of that unit is then executed. The performance of this simple majority-vote diagnosis algorithm is analyzed using a probabilistic model for the faults in the system. It is shown that with high probability, for systems composed of n units, the algorithm will correctly identify the status of all units when each unit is connected to O(log n) other units. It is also shown that the algorithm works with high probability in a class of systems in which the average number of neighbors of a unit is constant. The results indicate that fault diagnosis can in fact be achieved quite simply in multiprocessor systems containing a low to moderate number of testing conditions.<<ETX>>","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":"{\"title\":\"Fault diagnosis for sparsely interconnected multiprocessor systems\",\"authors\":\"D. Blough, G. Sullivan, G. Masson\",\"doi\":\"10.1109/FTCS.1989.105544\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The authors present a general approach to fault diagnosis that is widely applicable and requires only a limited number of connections among units. Each unit in the system forms a private opinion on the status of each of its neighboring units based on duplication of jobs and comparison of job results over time. A diagnosis algorithm that consists of simply taking a majority vote among the neighbors of a unit to determine the status of that unit is then executed. The performance of this simple majority-vote diagnosis algorithm is analyzed using a probabilistic model for the faults in the system. It is shown that with high probability, for systems composed of n units, the algorithm will correctly identify the status of all units when each unit is connected to O(log n) other units. It is also shown that the algorithm works with high probability in a class of systems in which the average number of neighbors of a unit is constant. The results indicate that fault diagnosis can in fact be achieved quite simply in multiprocessor systems containing a low to moderate number of testing conditions.<<ETX>>\",\"PeriodicalId\":230363,\"journal\":{\"name\":\"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1989-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"48\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FTCS.1989.105544\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FTCS.1989.105544","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48

摘要

作者提出了一种通用的故障诊断方法,该方法广泛适用,并且只需要单元之间有限数量的连接。系统中的每个单位都会根据工作的重复和工作结果的长期比较,对相邻单位的状态形成自己的意见。然后执行一种诊断算法,该算法由简单地在单元的邻居中进行多数投票来确定该单元的状态组成。利用系统故障的概率模型分析了简单多数投票诊断算法的性能。结果表明,对于由n个单元组成的系统,当每个单元与O(log n)个其他单元连接时,该算法将以高概率正确识别所有单元的状态。在一类单元的平均邻居数为常数的系统中,该算法具有较高的工作概率。结果表明,在包含少量或中等数量测试条件的多处理器系统中,实际上可以很容易地实现故障诊断。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Fault diagnosis for sparsely interconnected multiprocessor systems
The authors present a general approach to fault diagnosis that is widely applicable and requires only a limited number of connections among units. Each unit in the system forms a private opinion on the status of each of its neighboring units based on duplication of jobs and comparison of job results over time. A diagnosis algorithm that consists of simply taking a majority vote among the neighbors of a unit to determine the status of that unit is then executed. The performance of this simple majority-vote diagnosis algorithm is analyzed using a probabilistic model for the faults in the system. It is shown that with high probability, for systems composed of n units, the algorithm will correctly identify the status of all units when each unit is connected to O(log n) other units. It is also shown that the algorithm works with high probability in a class of systems in which the average number of neighbors of a unit is constant. The results indicate that fault diagnosis can in fact be achieved quite simply in multiprocessor systems containing a low to moderate number of testing conditions.<>
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Replication within atomic actions and conversations: a case study in fault-tolerance duality Byte unidirectional error correcting codes F-T in telecommunications networks: state, perspectives, trends Evaluation of fault-tolerant systems with nonhomogeneous workloads Control-flow checking using watchdog assists and extended-precision checksums
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1