The authors propose a built-in concurrent self-test (BICST) technique for testing combinational logic circuits concurrently with their normal operation. They also introduce the concept of sharing the test hardware among identical circuits to reduce the overall area overhead. They implemented the technique in a CMOS ALU (arithmetic logic unit) with online test capability. For a 12-bit ALU, the additional hardware occupied 19% of the total chip area and imposed no timing overhead on the operation of the ALU; the overhead decreases as the size of the ALU increases. The authors define measures for evaluating the performance of the BICST technique, discuss methods for computing them, and report both simulation and analytical results. Besides permanent faults, the BICST technique can also detect intermittent and transient faults; the authors propose methods for detecting intermittent faults and for computing the transient fault coverage.
{"title":"An implementation and analysis of a concurrent built-in self-test technique","authors":"Rajiv Sharma, K. Saluja","doi":"10.1109/FTCS.1988.5315","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5315","url":null,"abstract":"The authors propose a built-in concurrent self-test (BICST) technique for testing combinational logic circuits concurrently with their normal operation. They also introduce a concept of sharing the test hardware between identical circuits to reduce the overall area overhead. They implemented this technique in the design of an ALU (arithmetic logic unit) with online test capability in CMOS technology. The additional hardware used for a 12-bit ALU was 19% of the total chip area, and it did not impose any timing overhead on the operation of the ALU. The overhead decreases with an increase in the size of the ALU. The authors define some measures for evaluating the performance of the BICST technique and discuss methods for their computation and include both simulation and analytical results. In addition to detecting permanent faults, the BICST technique can also be used for detecting intermittent and transient faults. The authors propose some methods for detecting intermittent faults and for computing the transient fault coverage.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127280766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The author proposes an open layered architecture for dependability analysis in which the layers correspond to levels of abstraction. The motivation for this reference architecture is to support structuring, reusability, and variability of methods and tools. Each of the seven layers is discussed in detail, and examples show how currently available dependability-analysis tools map onto the layers. To demonstrate the feasibility of the approach, the layered architecture is used as the basis for the design and implementation of MARPLE, a tool for dependability analysis of distributed systems. MARPLE concentrates mainly on the application layer and the model-generation layer. It is embedded in a system-design environment and bridges the gap between the design tool and dependability analysis.
{"title":"An open layered architecture for dependability analysis and its application","authors":"M. Mulazzani","doi":"10.1109/FTCS.1988.5303","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5303","url":null,"abstract":"The author presents a proposal for an open layered architecture for dependability analysis, corresponding to the respective levels of abstraction. The motivation for this reference architecture is to support structuring, reusability, and variability of methods and tools. Each of the seven layers is discussed in detail, and the correspondence with currently available tools for dependability analysis is shown by examples. To demonstrate the feasibility of the approach, the layered architecture is used as a basis for design and implementation of MARPLE, a tool for dependability analysis of distributed systems. MARPLE mainly concentrates on the application layer and the model-generation layer. It is embedded in a system-design environment and bridges the gap between the design tool and dependability analysis.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132056598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The authors previously proposed (1984) the basic concept of the distributed recovery block (DRB) scheme as an approach to uniform treatment of hardware and software faults in real-time applications. Here they discuss design issues that arise in implementing the DRB scheme, together with some promising approaches. They also discuss issues in extending the DRB scheme so that a repaired node can be reincorporated without disrupting the real-time computing service. An experimental implementation of the repairable DRB scheme on a real-time distributed computer system (DCS) testbed, followed by measurement of the system performance, demonstrated the fast forward recovery capability and the logical soundness of the scheme.
{"title":"Approaches to implementation of a repairable distributed recovery block scheme","authors":"K. Kim, J. Yoon","doi":"10.1109/FTCS.1988.5296","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5296","url":null,"abstract":"The authors previously proposed (1984) the basic concept of the distributed recovery block (DRB) scheme as an approach to uniform treatment of hardware and software faults in real-time applications. Design issues that arise in implementing the DRB scheme are discussed together with some promising approaches. Issues in extending the DRB scheme with the capability of reincorporating a repaired node without disrupting the real-time computing service are also discussed. An experimental implementation of the repairable DRB scheme into a real-time distributed computer system (DCS) testbed and subsequent measurement of the system performance demonstrated the fast forward recovery capability and the logical soundness of the scheme.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"54 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132565707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
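The recovery-block idea behind the DRB abstract above can be sketched in a few lines. This is a minimal single-process illustration, not the authors' implementation: the names `run_try_block` and `drb_step`, the square-root workload, and the acceptance test are all hypothetical, and the two try blocks run one after the other here, whereas in a DRB they would run concurrently on separate nodes.

```python
import math

def run_try_block(block, acceptance_test, x):
    """Run one try block and report (accepted, result)."""
    try:
        result = block(x)
    except Exception:
        return (False, None)          # a crash counts as a failed try block
    return (acceptance_test(x, result), result)

def drb_step(primary, alternate, acceptance_test, x):
    """One input through a primary/shadow pair (hypothetical sketch).

    Both try blocks process the same input, so the shadow's result is
    already available if the primary fails its acceptance test. That is
    the forward-recovery aspect: no re-execution is needed.
    """
    primary_ok, primary_out = run_try_block(primary, acceptance_test, x)
    shadow_ok, shadow_out = run_try_block(alternate, acceptance_test, x)
    if primary_ok:
        return ("primary", primary_out)
    if shadow_ok:
        return ("shadow", shadow_out)
    raise RuntimeError("both try blocks failed the acceptance test")

def accept_sqrt(x, r):
    """Hypothetical acceptance test: r must be a non-negative root of x."""
    return r >= 0 and abs(r * r - x) <= 1e-9

if __name__ == "__main__":
    faulty_primary = lambda x: -1.0   # stands in for a software or hardware fault
    print(drb_step(faulty_primary, math.sqrt, accept_sqrt, 4.0))
```

The same acceptance test guards both try blocks, which is how one mechanism covers hardware and software faults alike in this simplified picture.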
The authors present a technique, called micro rollback, which allows most of the performance penalty for concurrent error detection to be eliminated. Detection is performed in parallel with the transmission of information between modules, thus removing the delay for detection from the critical path. Erroneous information may thus reach its destination module several clock cycles before an error indication. Operations performed on this erroneous information are undone using a hardware mechanism for fast rollback of a few cycles. The authors discuss the implementation of a VLSI processor capable of micro rollback as well as several critical issues related to its use in a complete system.
{"title":"The implementation and application of micro rollback in fault-tolerant VLSI systems","authors":"Y. Tamir, M. Tremblay, D. Rennels","doi":"10.1109/FTCS.1988.5325","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5325","url":null,"abstract":"The authors present a technique, called micro rollback, which allows most of the performance penalty for concurrent error detection to be eliminated. Detection is performed in parallel with the transmission of information between modules, thus removing the delay for detection from the critical path. Erroneous information may thus reach its destination module several clock cycles before an error indication. Operations performed on this erroneous information are undone using a hardware mechanism for fast rollback of a few cycles. The authors discuss the implementation of a VLSI processor capable of micro rollback as well as several critical issues related to its use in a complete system.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117000496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
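The rollback mechanism described in the micro rollback abstract above can be modelled in software as a register that remembers its last few values. The sketch below is only an illustration under that assumption: the class name `RollbackRegister` and the `depth` parameter are hypothetical, and the actual hardware design in the paper may differ.

```python
from collections import deque

class RollbackRegister:
    """Software model of a register that can undo its last few updates."""

    def __init__(self, initial, depth):
        self.value = initial
        # Old values for at most `depth` recent cycles; older ones are dropped.
        self.history = deque(maxlen=depth)

    def clock(self, new_value):
        """Advance one cycle: save the current value, then load the new one."""
        self.history.append(self.value)
        self.value = new_value

    def rollback(self, cycles):
        """Undo the last `cycles` updates, e.g. after a late error indication."""
        if not 1 <= cycles <= len(self.history):
            raise ValueError("cannot roll back that many cycles")
        for _ in range(cycles):
            self.value = self.history.pop()

if __name__ == "__main__":
    reg = RollbackRegister(initial=0, depth=4)
    for v in (1, 2, 3):
        reg.clock(v)
    reg.rollback(2)       # the error signal arrived two cycles late
    print(reg.value)
```

The depth bounds how late an error indication may arrive and still be recoverable, which is why detection latency of "a few cycles" matters in the abstract.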
The authors consider a general, analytic approach to the study of workload effects on computer system dependability, where the faults considered are transient and the dependability measure in question is the time to failure, T_f. Under these conditions, workload plays two roles with opposing effects: it can help detect or correct a correctable fault, or it can cause the system to fail by activating an uncorrectable fault. As a consequence, the overall influence of workload on T_f is difficult to evaluate intuitively. To examine this in more formal terms, the authors establish a Markov renewal process model that represents the interaction between workload and fault accumulation in systems for which fault tolerance can be characterized by fault margins. Using this model, they consider some specific examples and show how the probabilistic nature of T_f can be formulated directly in terms of parameters regarding workload, fault arrivals, and fault margins.
{"title":"Analysis of workload influence on dependability","authors":"J. F. Meyer, Lu Wei","doi":"10.1109/FTCS.1988.5301","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5301","url":null,"abstract":"The authors consider a general, analytic approach to the study of workload effects on computer system dependability, where the faults considered are transient and the dependability measure in question is the time to failure, T/sub f/. Under these conditions, workload plays two roles with opposing effects: it can help detect/correct a correctable fault, or it can cause the system to fail by activating an uncorrectable fault. As a consequence, the overall influence of workload on T/sub f/ is difficult to evaluate intuitively. To examine this in more formal terms, the authors establish a Markov renewal process model that represents the interaction among workload and fault accumulation ins systems for which fault tolerance can be characterized by fault margins. Using this model, they consider some specific examples and show how the probabilistic nature of T/sub f/ can be formulated directly in terms of parameters regarding workload, fault arrivals, and fault margins.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114258469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The computational complexity of fault detection problems and of various controllability and observability problems for combinational logic circuits is analyzed. It is shown that the fault detection problem remains NP-complete even for monotone circuits with limited fanout, i.e. circuits in which the number of signal lines that fan out from any signal line is limited to three. It is also shown that the observability problem for unate circuits is NP-complete, but that the controllability problem for unate circuits can be solved in O(m) time, where m is the number of lines in the circuit. Furthermore, two classes of circuits, called k-binate-bounded circuits and k-bounded circuits, are introduced. For k-binate-bounded circuits, the controllability problem is solvable in polynomial time, and for k-bounded circuits, the fault detection problem is solvable in polynomial time, when k ≤ log p(m) for some polynomial p(m). The class of k-bounded circuits includes many practical circuits such as decoders, adders, one-dimensional cellular arrays, two-dimensional cellular arrays, etc.
{"title":"Computational complexity of controllability/observability problems for combinational circuits","authors":"H. Fujiwara","doi":"10.1109/FTCS.1988.5298","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5298","url":null,"abstract":"The computational complexity of fault detection problems and various controllability and observability problems for combinational logic circuits are analyzed. It is shown that the fault detection problem is still NP-complete even for monotone circuits limited in fanout, i.e. the number of signal lines which fanouts from a signal line is limited to three. It is also shown that the observability problem for unate circuits is NP-complete, but that the controllability problem for unate circuits can be solved in time complexity O(m), where m is the number of lines in a circuit. Furthermore, two classes of circuits, called k-binate-bounded circuits and k-bounded circuits, are introduced. For k-binate-bounded circuits, the controllability problem is solvable in polynomial time, and for k-bounded circuits, the fault detection problem is solvable in polynomial time, when k<or=log p(m) for some polynomial p(m). The class of k-bounded circuits includes many practical circuits such as decoders, adders, one-dimensional cellular arrays, two-dimensional cellular arrays, etc.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"130 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133877271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
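To give some intuition for the linear-time controllability result in the abstract above, consider the monotone special case: a circuit built only from AND and OR gates, possibly with some lines tied to constants. Because raising an input can never lower any line, a line can take the value 1 exactly when it evaluates to 1 with every free input at 1, and it can take the value 0 exactly when it evaluates to 0 with every free input at 0. Two evaluation passes therefore suffice. The sketch below is an illustration of that special case only, not the paper's algorithm for unate circuits; the netlist format and all names are assumptions.

```python
def evaluate(inputs, constants, gates, value):
    """Evaluate an AND/OR netlist with every free primary input set to `value`.

    `gates` is a list of (name, op, fanin) tuples in topological order.
    """
    lines = {name: value for name in inputs}
    lines.update(constants)                    # lines tied to a fixed 0 or 1
    for name, op, fanin in gates:
        vals = [lines[f] for f in fanin]
        lines[name] = all(vals) if op == "AND" else any(vals)
    return lines

def controllability(inputs, constants, gates):
    """For each line, report whether some input assignment sets it to 1 / to 0."""
    at_ones = evaluate(inputs, constants, gates, True)
    at_zeros = evaluate(inputs, constants, gates, False)
    return {n: (at_ones[n], not at_zeros[n]) for n in at_ones}

if __name__ == "__main__":
    inputs = ["a", "b"]
    constants = {"zero": False}                # hypothetical tied-low line
    gates = [("g1", "AND", ["a", "zero"]),     # can never be driven to 1
             ("g2", "OR", ["g1", "b"])]
    for line, (can_be_1, can_be_0) in controllability(inputs, constants, gates).items():
        print(line, can_be_1, can_be_0)
```

Each pass touches every gate input once, so the work is linear in the number of lines in the netlist.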
The authors present the fundamental concepts for realizing a fault-tolerant parallel processor modeled by a linear cellular automaton, and they give a reconfiguration scheme under this model. The processing elements of the processor are treated as cells of the cellular automaton, and the operating states of the elements as states of the cells. When faults are detected, the processor can be reconfigured easily and quickly by changing the states of its processing elements. The reconfiguration scheme utilizes the characteristics of polynomial rings over GF(q), where q is a power of a prime number.
{"title":"A fault-tolerant parallel processor modeled by a linear cellular automaton","authors":"M. Tsunoyama, S. Naito","doi":"10.1109/FTCS.1988.5340","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5340","url":null,"abstract":"The authors present the fundamental concepts for realizing a fault-tolerant parallel processor modeled by a linear cellular automaton. They give the reconfiguration scheme under this model. They treat the processing elements in the processor as cells of the cellular automaton. They regard the operating states of the elements as states of the cells. The processor can be reconfigured easily and quickly by changing the states of its processing elements when faults are detected. The reconfiguration scheme for the processor utilizes the characteristics of polynomial rings over GF(q), where q is a power of a prime number.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124960183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited by potential roundoff and overflow errors, and numerical errors may be misconstrued as errors due to physical faults in the system. The authors identify a set of linear codes that can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition with minimum numerical error. Encoding schemes are given for some example codes that fall under the general set of codes. With the help of experiments, the authors derive a rule of thumb for selecting a particular code for a given application. Since the overall error in the code also depends on how the coding scheme is implemented, they suggest specific algorithms and special hardware realizations for the check element computation.
{"title":"General linear codes for fault-tolerant matrix operations on processor arrays","authors":"S. Nair, J. Abraham","doi":"10.1109/FTCS.1988.5317","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5317","url":null,"abstract":"Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. The authors identify a set of linear codes which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minium numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, the authors derive a rule of thumb for the selection of a particular code for a given application. Since the overall error in the code will also depend on the method of implementation of the coding scheme, they suggest the use of specific algorithms and special hardware realizations for the check element computation.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127900522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
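As background for the abstract above, the plain checksum scheme that such codes generalize can be sketched as follows. This is a minimal illustration of ordinary sum checksums for matrix multiplication, not the general linear codes the authors identify; the function names and the tolerance value are assumptions. The tolerance in the check is where the roundoff issue raised in the abstract shows up: too tight and roundoff looks like a fault, too loose and small faults escape.

```python
import math

def matmul(a, b):
    """Plain triple-loop matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def with_column_checksum(a):
    """Append a row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b):
    """Append to each row the sum of that row."""
    return [row + [sum(row)] for row in b]

def checksums_consistent(c_full, tol=1e-9):
    """Check every row and column of a full-checksum matrix within `tol`."""
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    for row in c_full:
        if not math.isclose(sum(row[:n_cols]), row[n_cols], rel_tol=tol, abs_tol=tol):
            return False
    for j in range(n_cols + 1):
        col_sum = sum(c_full[i][j] for i in range(n_rows))
        if not math.isclose(col_sum, c_full[n_rows][j], rel_tol=tol, abs_tol=tol):
            return False
    return True

if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[0.5, 1.5], [2.5, 3.5]]
    c_full = matmul(with_column_checksum(a), with_row_checksum(b))
    print(checksums_consistent(c_full))
    c_full[0][1] += 1.0                  # inject a fault into one data element
    print(checksums_consistent(c_full))
```

Multiplying the column-checksum form of the first operand by the row-checksum form of the second yields the product together with its own row and column checksums, so the check needs no second multiplication.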
The authors propose search methods for B-trees that perform the search correctly despite the presence of corrupted indices. If a search fails to find the record with a given key, the robust search methods determine whether the failure was caused by a corrupted index. The corrupted index is then located and corrected so that its value falls within a valid range. The authors first present a method that handles a single error in the tree and then generalize it to cope with multiple errors. Unlike previous work on robust data structures, their methods do not require redundancy to be added to the data structure; instead they exploit the constraints by which the indices are organized.
{"title":"Robust search methods for B-trees","authors":"K. Fujimura, P. Jalote","doi":"10.1109/FTCS.1988.5319","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5319","url":null,"abstract":"The authors propose search methods for B-trees that correctly perform the search despite the presence of corrupted indices. If in a search the desired record with a given key is not found, the robust search methods determine if this is caused by a corrupted index. The corrupted index is then detected and corrected so that the value of the index is in a right range. The authors first present a method to handle a single error in the tree and then generalize the method to cope with multiple errors. Unlike the previous attempts for robust data structures, their methods do not require redundancy to be added to the data structure and make use of the constraints by which the indices are organized.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116033317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
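The idea of using ordering constraints instead of added redundancy, as described in the abstract above, can be illustrated on a single node. The sketch below is a simplified assumption, not the authors' method: it looks at one node's key array plus the bounds inherited from its parent, assumes integer keys and at most one corrupted key, and leaves the node unchanged when the culprit is ambiguous. The names `find_suspects` and `repair` are hypothetical.

```python
def find_suspects(keys, low, high):
    """Indices whose removal leaves the keys strictly increasing in (low, high)."""
    def ordered(seq):
        bounded = [low] + seq + [high]
        return all(x < y for x, y in zip(bounded, bounded[1:]))
    if ordered(keys):
        return []                         # no constraint violated
    return [i for i in range(len(keys)) if ordered(keys[:i] + keys[i + 1:])]

def repair(keys, low, high):
    """Move a single out-of-order key back into the range its neighbours allow.

    The original value is not recovered; the key is only placed in a valid
    range (here the integer midpoint, assuming a gap exists between neighbours).
    """
    suspects = find_suspects(keys, low, high)
    if len(suspects) != 1:
        return keys[:]                    # consistent or ambiguous: leave as is
    i = suspects[0]
    left = keys[i - 1] if i > 0 else low
    right = keys[i + 1] if i + 1 < len(keys) else high
    fixed = keys[:]
    fixed[i] = (left + right) // 2
    return fixed

if __name__ == "__main__":
    corrupted = [10, 20, 99, 40, 50]      # 99 violates the ordering constraint
    print(find_suspects(corrupted, 0, 100))
    print(repair(corrupted, 0, 100))
```

A full tree would offer more constraints than a single node does, for example the keys stored in child nodes, which is what would be needed to resolve the ambiguous cases this sketch skips.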
The authors propose an integrated approach to the design of combinational logic circuits in which stuck-open faults and path delay faults are detectable by robust tests that detect modeled faults independent of the delays in the circuit under test. They demonstrate that the proposed designs and tests guarantee the design of CMOS logic circuits in which all path delay faults are locatable.
{"title":"On the design of robust testable CMOS combinational logic circuits","authors":"S. Kundu, S. Reddy","doi":"10.1109/FTCS.1988.5323","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5323","url":null,"abstract":"The authors propose an integrated approach to the design of combinational logic circuits in which stuck-open faults and path delay faults are detectable by robust tests that detect modeled faults independent of the delays in the circuit under test. They demonstrate that the proposed designs and tests guarantee the design of CMOS logic circuits in which all path delay faults are locatable.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130701791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}