Title: Adaptive System-Level Fault Diagnosis of Bijective Connection Networks
Authors: Yanze Huang; Limei Lin; Li Xu; Sun-Yuan Hsieh
Journal: IEEE Transactions on Reliability, vol. 74, no. 2, pp. 2916-2926 (JCR Q1, Computer Science, Hardware & Architecture; impact factor 5.7)
DOI: 10.1109/TR.2024.3425759
Published: 2024-07-23
URL: https://ieeexplore.ieee.org/document/10608002/
Citations: 0
Abstract
As multiprocessor systems grow in scale, fault diagnosis becomes crucial to ensuring their reliability. To improve the self-diagnosis capability of a multiprocessor system, a pessimistic fault diagnosis scheme such as $t/s$-diagnosis allows some fault-free processors to be mistakenly identified as faulty. In a $t/s$-diagnosable multiprocessor system ($t\leq s$), all faulty processors can be isolated within a set of size at most $s$, provided that the total number of faulty processors in the system does not exceed $t$. This article focuses on $t/s$-diagnosis for the $n$-dimensional bijective connection network $X_{n}$. An adaptive $t/s$-diagnosis algorithm, APDMM*$t/s$, of complexity $O(M(\log_{2} M)^{2})$ under the comparison model is proposed, where $M$ is the total number of nodes in $X_{n}$. The correctness of APDMM*$t/s$ is then proved using the fault-tolerant properties of the network itself. Moreover, we derive the $t/s$-diagnosability of $X_{n}$ analytically; it is $-\frac{1}{2}y^{2}+(n-\frac{1}{2})y+1$ for $2 \leq y \leq n$ under the comparison model, where $s=-\frac{1}{2}y^{2}+(n-\frac{1}{2})y+y-1$. Furthermore, we apply APDMM*$t/s$ to the hypercube and to the real-world network WSN-DS to verify our main results, and analyze the experimental outcomes in terms of true positive rate, false positive rate, accuracy, and precision. The experimental results demonstrate the advantages and high performance of APDMM*$t/s$. Finally, we compare the $t/s$-diagnosability of $X_{n}$ with the traditional accurate diagnosability and find that, as $n$ grows, the $t/s$-diagnosability of $X_{n}$ is significantly better than the traditional accurate diagnosability.
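The two closed-form expressions in the abstract can be evaluated directly. The following is a minimal sketch, not code from the paper: it assumes the first expression is the fault bound $t$ and the second is the set-size bound $s$, and the function name `ts_diagnosability` is hypothetical. It uses the identity $-\frac{1}{2}y^{2}+(n-\frac{1}{2})y = ny-\frac{y(y+1)}{2}$ so that all arithmetic stays in integers.

```python
def ts_diagnosability(n: int, y: int) -> tuple[int, int]:
    """Evaluate the abstract's expressions for t and s (hypothetical helper).

    Assumes t = -y^2/2 + (n - 1/2)*y + 1 and
            s = -y^2/2 + (n - 1/2)*y + y - 1, for 2 <= y <= n.
    """
    if not 2 <= y <= n:
        raise ValueError("the abstract states the formula only for 2 <= y <= n")
    # -y^2/2 + (n - 1/2)*y == n*y - y*(y + 1)/2, and y*(y + 1) is always even,
    # so the shared part of both expressions is an integer.
    shared = n * y - y * (y + 1) // 2
    t = shared + 1
    s = shared + y - 1
    return t, s


if __name__ == "__main__":
    # Tabulate (t, s) for a few dimensions, taking the largest allowed y = n.
    for n in range(4, 9):
        t, s = ts_diagnosability(n, n)
        print(f"n={n}, y={n}: t={t}, s={s}")
```

For example, $n=5$ and $y=3$ give $t=10$ and $s=11$. Since $s-t=y-2$, the condition $t\leq s$ holds for every $y\geq 2$, which is consistent with the range stated in the abstract.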
About the journal:
IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.