"The minimal test set for sorting networks and the use of sorting networks in self-testing checkers for unordered codes," S. Piestrak, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89382

It is shown that an n-input sorting network (SN) can be used to implement all n-variable symmetric threshold functions using the least amount of hardware. A procedure for generating the minimal test set for K.E. Batcher's SNs is presented. An upper bound is determined for the number of tests required to detect all stuck-at faults in an n-input SN; it is lower than for similar designs used to date. Finally, it is shown that SNs can be used to realize easily testable self-testing checkers (STCs) for m-out-of-2m codes and all J.M. Berger codes. The new STCs for m/2m codes (m > 3) have the lowest gate count and require the fewest tests. Upper bounds are also found for the number of tests required by the new STCs for Berger codes with I information bits; for I >= 14 they require fewer gates than similar designs known to date.
"A software based approach to achieving optimal performance for signature control flow checking," N. Warter, Wen-mei W. Hwu, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89399

The authors present a software-based approach that uses run-time program behavior to minimize the performance overhead of signature control flow checking. In general, for both RISC (reduced-instruction-set-computer) and CISC (complex-instruction-set-computer) architectures, it is found that using run-time information can reduce the performance overhead by 50%. For the MC68000, the performance overhead of adding justifying and reference signatures to the program code is approximately 2.8%. In addition to optimizing performance, the authors' approach does not increase the hardware complexity of the monitor. Furthermore, an O(N^2) algorithm is presented that inserts justifying signatures on the arcs of a program control flow graph with N nodes. It is shown that the algorithmic complexity of previous schemes, which insert justifying signatures in the program nodes, is exponential.
"A dependence graph-based approach to the design of algorithm-based fault tolerant systems," B. Vinnakota, N. Jha, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89347

A two-stage approach to the design of algorithm-based fault-tolerant (ABFT) systems is proposed. In the first stage a code is chosen to encode the data used in the algorithm. In the second stage the optimal architecture for implementing the scheme is chosen through the use of dependence graphs. Dependence graphs are a graph-theoretic form of algorithm representation. It is demonstrated that not all architectures are ideal for the implementation of a particular ABFT scheme. The authors propose new measures for characterizing the fault-tolerance capability of a system in order to better exploit the proposed design method. Dependence graphs can also be used for the synthesis of ABFT schemes for nonlinear problems. An example of a fault-tolerant median filter is provided to illustrate the usefulness of the dependence graph as a design tool for nonlinear system synthesis.
"Practical application and implementation of distributed system-level diagnosis theory," R. Bianchini, K. Goodwin, D. S. Nydick, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89380

A DSD (distributed self-diagnosing) project that consists of the implementation of a distributed self-diagnosis algorithm and its application to distributed computer networks is presented. The EVENT-SELF algorithm presented combines the rigor associated with theoretical results with the resource limitations associated with actual systems. Resource limitations identified in real systems include available message capacity for the communication network and limited processor execution speed. The EVENT-SELF algorithm differs from previously published algorithms by adopting an event-driven approach to self-diagnosability. Algorithm messages are reduced to those messages required to indicate changes in system state. Practical issues regarding the CMU-ECE DSD implementation are considered. These issues include the reconfiguration of the testing subnetwork for environments in which processors can be added and removed. One of the goals of this work is to utilize the developed CMU-ECE DSD system as an experimental test-bed environment for distributed applications.
"On the modelling and testing of recovery block structures," G. Pucci, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89389

The author proposes a reliability model for recovery block structures based on error events that can be observed and distinguished during testing. Strategies are described for the collection of failure histories, which are needed to estimate the model parameters and obtain dependability predictions. Given that the software goes through different testing stages, the model can be employed at different points of the development cycle to assess or forecast the quality of project choices and the resulting product.
"A novel concurrent error detection scheme for FFT networks," D. Tao, C. Hartmann, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89346

A novel algorithm-based fault tolerance scheme is proposed for fast Fourier transform (FFT) networks. It is shown that the proposed scheme achieves 100% fault coverage theoretically. An accurate measure of the fault coverage for FFT networks is provided by taking the roundoff error into account. It is shown that the proposed scheme maintains the low hardware overhead and high throughput of J.Y. Jou and J.A. Abraham's scheme and, at the same time, increases the fault coverage significantly.
"Hierarchical design and analysis of fault-tolerant multiprocessor systems using concurrent error detection," S. Nair, J. Abraham, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89348

A composition technique for building large fault-tolerant systems hierarchically using the concept of checks at different levels in the hierarchy is described. A small system of known fault detectability and locatability is replicated several times, and new checks are added at the next higher level. Such checks at different levels can be introduced into most of the existing multiprocessor systems. An analysis technique based on a matrix model is developed. Relationships between the fault detectability and locatability of a basic system are derived, and the corresponding values of the complete system are computed hierarchically. Finally, the techniques are extended to complex systems in which individual processors produce multiple sets of data elements.
"CATCH-compiler-assisted techniques for checkpointing," C. Li, W. Fuchs, in Digest of Papers, Fault-Tolerant Computing: 20th International Symposium (FTCS-20), 1990. doi: 10.1109/FTCS.1990.89337

A compiler-based approach to generating efficient checkpoints for process recovery is described. The presented approach to checkpointing is transparent to the programmer, the operating system, and the hardware. Compile-time information is exploited to maintain the desired checkpoint interval and to reduce the size of checkpoints. Compiler-generated sparse potential checkpoint code is used to maintain the desired checkpoint interval. Adaptive checkpointing has been developed to reduce the size of checkpoints by exploiting potentially large variations in memory usage. A training technique is used in selecting the low-cost, high-coverage potential checkpoints. Since the potential checkpoint selection problem is NP-complete, a heuristic algorithm has been developed to obtain a quick suboptimal solution. These compiler-assisted checkpointing techniques have been implemented in a modified version of the GNU C compiler (GCC), version 1.34. Experiments utilizing the CATCH GCC compiler on SUN workstations are described.