An iterative technique is proposed for computing the exact probability of aliasing for any linear feedback signature register, i.e. for any feedback polynomial, any constant probability of error, and any test length. The technique also applies to a more general model of the aliasing problem in which the probability of error may vary with each output bit. The complexity of the technique is low enough that registers of lengths of practical interest, e.g. 16, can be analyzed readily.
{"title":"An iterative technique for calculating aliasing probability of linear feedback signature registers","authors":"A. Ivanov, V. Agarwal","doi":"10.1109/FTCS.1988.5299","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5299","url":null,"abstract":"An iterative technique for computing the exact probability of aliasing for any linear feedback signature register (i.e. characterized by any feedback polynomial, for any constant probability of error, and for any test length) is proposed. The technique is also applicable to a more general model of the aliasing problem wherein the probability of error may vary with each output bit. The complexity of the technique enables registers of lengths of interest in practice, e.g. 16, to be analyzed readily.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127782918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
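The abstract above does not spell out the iteration itself. As a rough illustration of the quantity being computed, the following Python sketch propagates a probability distribution over the 2^k register states one output bit at a time and reads off the aliasing probability at the end. It is a brute-force state-distribution iteration under assumed conventions (a serial-input register that computes the polynomial remainder of the error sequence, with independent bit errors), not the authors' technique. The function name `aliasing_probability` and its parameters are hypothetical.

```python
def aliasing_probability(poly, k, error_probs):
    """Exact aliasing probability for a k-stage serial signature register.

    poly        -- feedback polynomial as an int, including the x^k term
                   (e.g. 0b1011 for x^3 + x + 1); an assumed encoding
    k           -- degree of the polynomial (register length)
    error_probs -- per-bit error probabilities, one entry per output bit;
                   use [p] * n for a constant error probability p
    """
    size = 1 << k
    # dist[s] = probability that the error signature equals s so far.
    dist = [0.0] * size
    dist[0] = 1.0
    no_error = 1.0  # probability that no output bit has been in error

    for p in error_probs:
        new_dist = [0.0] * size
        for state, prob in enumerate(dist):
            if prob == 0.0:
                continue
            for bit, bit_prob in ((0, 1.0 - p), (1, p)):
                # Shift the error bit in and reduce modulo the polynomial.
                nxt = (state << 1) | bit
                if nxt >> k:
                    nxt ^= poly
                new_dist[nxt] += prob * bit_prob
        dist = new_dist
        no_error *= 1.0 - p

    # Aliasing: a nonzero error sequence that leaves a zero error signature.
    return dist[0] - no_error
```

For example, `aliasing_probability(0b1011, 3, [0.1] * 4)` covers the polynomial x^3 + x + 1 over a 4-bit test. The only nonzero error pattern divisible by the polynomial is 1011, so the result is 0.1^3 x 0.9 = 0.0009. Memory and per-step work grow as 2^k in this naive version, which is why it is only a sketch.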
An original BIST (built-in self-test) scheme is proposed to cover some shortcomings of self-checking circuits and to provide all tests needed for integrated circuits. In the BIST scheme, self-checking techniques and built-in self-test techniques are combined in an original way so that each benefits from the other. This results in a unified BIST scheme (UBIST), allowing high fault coverage for all tests needed for integrated circuits, e.g. offline test (design verification, manufacturing test, and maintenance test) and online concurrent error detection.
{"title":"A unified built-in-test scheme: UBIST","authors":"M. Nicolaidis","doi":"10.1109/FTCS.1988.5314","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5314","url":null,"abstract":"An original BIST (built-in self-test) scheme is proposed to cover some shortcomings of self-checking circuits and to ensure all tests needed for integrated circuits. In the BIST scheme, self-checking techniques and built-in self-test techniques are combined in an original way and take advantage one from the other. This results in a unified BIST scheme (UBIST), allowing a high fault coverage for all tests needed for integrated circuits, e.g. offline test (design verification, manufacturing test, and maintenance test) and online concurrent error detection.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123645801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The authors introduce two enhancements to optimistic recovery which allow messages to be logged without performing any I/O to stable storage. The first permits messages to be instantaneously logged in volatile storage, as in the sender-based message logging technique of D.B. Johnson and W. Zwaenepoel (1987), but without their restriction of single-fault-tolerance. The second permits message data and/or message arrival orders not to be logged in circumstances where this information can be reconstructed in other ways. They show that the combination of these two optimizations yields a transparent n-fault-tolerant system which logs to stable storage only those messages received from the outside world and a very small number of additional messages.
{"title":"Volatile logging in n-fault-tolerant distributed systems","authors":"R. Strom, D. F. Bacon, S. Yemini","doi":"10.1109/FTCS.1988.5295","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5295","url":null,"abstract":"The authors introduce two enhancements to optimistic recovery which allow messages to be logged without performing any I/O to stable storage. The first permits messages to be instantaneously logged in volatile storage, as in the sender-based message logging technique of D.B. Johnson and W. Zwaenepoel (1987), but without their restriction of single-fault-tolerance. The second permits message data and/or message arrival orders not to be logged in circumstances where this information can be reconstructed in other ways. They show that the combination of these two optimizations yields a transparent n-fault-tolerant system which logs to stable storage only those messages received from the outside world and a very small number of additional messages.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126506743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of multiple buses can improve both the fault tolerance and performance of local area computer networks. Existing schemes either depend on active components for full connectivity or can experience decreased performance as many hosts attempt to access one bus. An architecture class based on balanced incomplete block designs (BIBDs) is proposed to address these problems. A BIBD architecture uses redundant communication channels and exhibits degradable performance as faults occur. The performability of such networks is evaluated using stochastic activity network models. The results are used to compare BIBD network performability with that of conventional multibus networks.
{"title":"Fault-tolerant BIBD networks","authors":"B. Aupperle, J. F. Meyer","doi":"10.1109/FTCS.1988.5336","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5336","url":null,"abstract":"The use of multiple buses can improve both the fault tolerance and performance of local area computer networks. Existing schemes either depend on active components for full connectivity or can experience decreased performance as many hosts attempt to access one bus. An architecture class based on balanced incomplete block designs (BIBDs) is proposed to address these problems. A BIBD architecture uses redundant communication channels and exhibits degradable performance as faults occur. The performability of such networks is evaluated, where evaluation is based on stochastic activity network models. The results obtained are provided for comparison of BIBD network performability with that of conventional multibus networks.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116550001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
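To make the combinatorial object behind the abstract above concrete, the following Python sketch checks the defining property of a balanced incomplete block design on the smallest nontrivial example, the Fano plane (7 points, blocks of size 3, every pair of points in exactly one block). Treating points as hosts and blocks as buses is an assumption made here for illustration; the abstract does not state the mapping, and the helper names `is_bibd` and `disconnected_pairs` are hypothetical.

```python
from itertools import combinations

# Fano plane: a (v=7, k=3, lambda=1) balanced incomplete block design.
# Assumed reading: points 1..7 are hosts, each block is one bus.
FANO_BLOCKS = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]


def is_bibd(blocks, v, k, lam):
    """True if every block has k of the v points and every pair of
    points appears together in exactly lam blocks."""
    points = set(range(1, v + 1))
    if any(len(b) != k or not b <= points for b in blocks):
        return False
    return all(
        sum(1 for b in blocks if x in b and y in b) == lam
        for x, y in combinations(sorted(points), 2)
    )


def disconnected_pairs(blocks, v, failed):
    """Host pairs that share no surviving bus once the buses whose
    indices are in `failed` have gone down."""
    alive = [b for i, b in enumerate(blocks) if i not in failed]
    return [
        (x, y)
        for x, y in combinations(range(1, v + 1), 2)
        if not any(x in b and y in b for b in alive)
    ]
```

With these definitions, `is_bibd(FANO_BLOCKS, 7, 3, 1)` holds, and `disconnected_pairs(FANO_BLOCKS, 7, {0})` lists the three host pairs that lose their only shared bus when the first bus fails. A design with a larger pair count (lambda of 2 or more) would keep every pair directly connected after a single bus failure.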
The authors present and analyze a uniformly probabilistic model for the self-diagnosis capabilities of a multiprocessor system. In this model an individual processor fails with probability p and a fault-free processor testing a faulty processor detects a fault with probability q, modeling the situation in which processors can be intermittently faulty or the situation where tests are not capable of detecting all possible faults within a processor. They present an efficient algorithm which utilizes a relatively small number of tests (given by any function dominating n log n where n is the number of processors) and achieves correct diagnosis with high probability. They obtain a nearly matching lower bound which shows that no algorithm can achieve correct diagnosis with high probability in systems which conduct a number of tests dominated by n log n. Examples of systems which perform a modest number of tests are given in which the probability of correct diagnosis for the authors' algorithm is very nearly one.
{"title":"Almost certain diagnosis for intermittently faulty systems","authors":"D. Blough, G. Sullivan, G. Masson","doi":"10.1109/FTCS.1988.5329","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5329","url":null,"abstract":"The authors present and analyze a uniformly probabilistic model for the self-diagnosis capabilities of a multiprocessor system. In this model an individual processor fails with probability p and a fault-free processor testing a faulty processor detects a fault with probability q, modeling the situation in which processors can be intermittently faulty or the situation where tests are not capable of detecting all possible faults within a processor. They present an efficient algorithm which utilizes a relatively small number of tests (given by any function dominating n log n where n is the number of processors) and achieves correct diagnosis with high probability. They obtain a nearly matching lower bound which shows that no algorithm can achieve correct diagnosis with high probability in systems which conduct a number of tests dominated by n log n. Examples of systems which perform a modest number of tests are given in which the probability of correct diagnosis for the authors' algorithm is very nearly one.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130813813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The authors describe an efficient method for fault diagnosis in digital systems based on the technique of reasoning. The methodology operates on the observed erroneous behavior and the structure of the system. The behavior consists of the error(s) observed on the circuit's output lines and specific values on the circuit's input lines. The technique described improves on previously published research on diagnostic reasoning in two ways. Previous work has stressed system-independent techniques which could be used to diagnose any faulty system whose structure can be represented. By concentrating their efforts on the specific case of diagnosing faulty digital circuits, the authors have been able to simplify the representation of the structure of the system. This representation, in the form of an AND/OR fault tree, efficiently abstracts the structure of a faulty digital system. More importantly, a method for partitioning the digital system is introduced which is shown to greatly reduce the complexity of the diagnosis.
{"title":"Diagnostic reasoning in digital systems","authors":"Kurt H. Thearling, R. Iyer","doi":"10.1109/FTCS.1988.5333","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5333","url":null,"abstract":"The authors describe an efficient method for fault diagnosis in digital systems based on the technique of reasoning. The methodology operates on the observed erroneous behavior and the structure of the system. The behavior consists of the error(s) observed on the circuit's output lines and specific values on the circuit's input lines. The technique described improves on previously published research on diagnostic reasoning in two ways. Previous work has stressed system independent techniques which could be used to diagnose any fault system whose structure can be represented. By concentrating their efforts on the specific case of diagnosing faulty digital circuits, the authors have been able to simplify the representation of the structure of the system. This representation, in the form of an AND/OR fault tree, efficiently abstracts the structure of a faulty digital system. More importantly, a method for partitioning the digital system is introduced which is shown to reduce greatly the complexity of the diagnosis.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"21 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134288897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Z. Segall, D. Vrsalovic, D. Siewiorek, David A. Yaskin, J. Kownacki, J. Barton, R. Dancey, A. Robinson, T. Lin
An automated real-time distributed accelerated fault injection environment (FIAT) is presented as an attempt to provide suitable tools for the validation process. The authors present the concepts and design, as well as the implementation and evaluation, of the FIAT environment. Because the system has been built and evaluated and is currently in use, an example involving fault-tolerant systems that use checkpointing and duplicate-and-match is given to show its usefulness.
{"title":"FIAT-fault injection based automated testing environment","authors":"Z. Segall, D. Vrsalovic, D. Siewiorek, David A. Yaskin, J. Kownacki, J. Barton, R. Dancey, A. Robinson, T. Lin","doi":"10.1109/FTCS.1988.5306","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5306","url":null,"abstract":"An automated real-time distributed accelerated fault injection environment (FIAT) is presented as an attempt to provide suitable tools for the validation process. The authors present the concepts and design, as well as the implementation and evaluation of the FIAT environment. As this system has been built, evaluated and is currently in use, an example of fault tolerant systems such as checkpointing and duplicate and match is used to show its usefulness.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"2021 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125400041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Concurrent C is a superset of C that provides parallel programming facilities. The authors' local area network (LAN) multiprocessor implementation has led them to explore the design and implementation of a fault-tolerant version of Concurrent C called FT Concurrent C. FT Concurrent C allows the programmer to replicate critical processes. A program continues to operate with full functionality as long as at least one of the copies of a replicated process is operational and accessible. As far as the user is concerned, interacting with a replicated process is the same as interacting with an ordinary process. FT Concurrent C also provides facilities for notification upon process termination, detecting processor failure during process interaction, and automatically terminating orphan processes. The authors discuss the different approaches to fault tolerance, describe the considerations in the design of FT Concurrent C, and present a programming example.
{"title":"Fault tolerant concurrent C: a tool for writing fault tolerant distributed programs","authors":"Robert F. Cmelik, N. Gehani, W. D. Roome","doi":"10.1109/FTCS.1988.5297","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5297","url":null,"abstract":"Concurrent C is a superset of C that provides parallel programming facilities. The authors' local area network (LAN) multiprocessor implementation has led them to explore the design and implementation of a fault-tolerant version of Concurrent C called FT Concurrent C. FT Concurrent C allows the programmer to replicate critical processes. A program continues to operate with full functionality as long as at least one of the copies of a replicated process is operational and accessible. As far as the user is concerned, interacting with a replicated process is the same as interactive with an ordinary process. FT Concurrent C also provides facilities for notification upon process termination, detecting processor failure during process interaction and automatically terminating orphan processes. The authors discuss the different approaches to fault tolerance, describe the considerations in the design of FT Concurrent C, and present a programming example.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116579817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The authors propose a masking method for asymmetric line faults in LSIs using semidistance codes, which are a class of nonlinear codes. Faults caused by open or short circuit defects in line areas of LSIs can be made asymmetric by controlling the bus driver and the bus terminal gates. The conditions required for codes to mask these faults are clarified, and codes satisfying these conditions for random faults and for adjacent faults caused by line bridging defects are constructed using a novel concept of semidistance. This masking technique has the advantage that no additional circuits, such as error decoders, are needed. The codes have been applied to the bus lines in the address decoders of 4-Mb ROMs to improve the fabrication yield of the LSIs.
{"title":"Masking asymmetric line faults using semi-distance codes","authors":"K. Matsuzawa, E. Fujiwara","doi":"10.1109/FTCS.1988.5343","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5343","url":null,"abstract":"The authors propose a masking method for asymmetric line faults in LSIs using semidistance codes, which are a class of nonlinear codes. Faults caused by open or short circuit defects in line areas of LSIs can be made asymmetric by controlling the bus driver and the bus terminal gates. The conditions required for codes to mask these faults are clarified, and the codes satisfying these conditions for random faults and adjacent faults caused by line bridging defects are constructed by using a novel concept of semidistance. This masking technique has the advantage that no additional circuits, such as error decoders, are needed. The codes have been applied to the bus lines in the address decoders of the 4-Mb ROMs to improve fabrication yield of the LSIs.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126130927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The author describes his system model and failure assumptions by precisely specifying the processor group membership problem. He then gives two protocols for solving this problem. The protocols provide all correct processors with consistent views of the processor group membership. They also guarantee bounded processor failure detection and join processing delays despite any number of performance failures that do not cause network partitioning. The first protocol provides very fast processor failure detection but can require significant message traffic overhead, even when no failures occur. To reduce this overhead, the author derives the second protocol, which has a provably minimal message overhead in the absence of failures but has a longer failure detection delay and is more complex. He concludes by comparing his approach with other known approaches.
{"title":"Agreeing on who is present and who is absent in a synchronous distributed system","authors":"F. Cristian","doi":"10.1109/FTCS.1988.5321","DOIUrl":"https://doi.org/10.1109/FTCS.1988.5321","url":null,"abstract":"The author describes his system model and failure assumptions by precisely specifying the processor group membership problem. He then gives two protocols for solving this problem. The protocols provide all correct processors with constituent views of the processor group membership. They also guarantee bounded processor failure detection and join processing delays despite any number of performance failures that do not cause network partitioning. The first protocol provides very fast processor failure detection but can require a significant message traffic overhead, even when no failures occur. To reduce this overhead, the author derives the second protocol, which has a (provable) minimal message overhead in the absence of failures but provides a longer failure detection delay and is more complex. He concludes by comparing his approach with other known approaches.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127068680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}