H. Kopetz, H. Kantz, G. Grünsteidl, P. Puschner, J. Reisinger
The concepts of transient fault handling in the MARS architecture are discussed. After an overview of the MARS architecture, the mechanisms for the detection of transient faults are discussed in detail. In addition to extensive checks in the hardware and in the operating system, time-redundant execution of application tasks is proposed for the detection of transient faults. The time difference between the effective and the maximum execution time of an application task is used for this purpose. Whenever a transient fault has been detected, the affected component is turned off and reintegrated immediately by retrieving the uncorrupted state of the actively redundant partner component. To reduce the probability of spare exhaustion (in the case of permanent faults), 'shadow components' are introduced. The reliability improvement that these techniques can realize is calculated with detailed reliability models of the architecture, whose parameters are based on experimental results measured on the present MARS prototype implementation.
{"title":"Tolerating transient faults in MARS","authors":"H. Kopetz, H. Kantz, G. Grünsteidl, P. Puschner, J. Reisinger","doi":"10.1109/FTCS.1990.89384","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89384","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
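The time-redundant execution mentioned in the MARS abstract can be illustrated with a minimal Python sketch. It assumes a deterministic task, runs it a second time only when the slack between the measured and the maximum execution time allows it, and treats a mismatch as evidence of a transient fault. The names `run_time_redundant` and `TransientFaultDetected` are hypothetical, and this is not the MARS implementation.

```python
import time


class TransientFaultDetected(Exception):
    """Raised when two executions of the same task disagree."""


def run_time_redundant(task, args, max_exec_time_s):
    """Run `task` twice if the slack allows it and compare the results.

    Hypothetical helper: the second run is attempted only when the first
    run left at least as much slack as it consumed.
    Returns (result, checked), where `checked` says whether a second run
    confirmed the result.
    """
    start = time.perf_counter()
    first = task(*args)
    elapsed = time.perf_counter() - start

    slack = max_exec_time_s - elapsed
    if slack < elapsed:
        # Not enough slack for a second run; return the unchecked result.
        return first, False

    second = task(*args)
    if first != second:
        raise TransientFaultDetected("results of redundant runs differ")
    return first, True
```

In this sketch a detected mismatch only raises an exception; the turn-off and reintegration step described in the abstract is outside its scope.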
The authors describe a practical method for realizing fault-tolerant global control of resources in distributed computing systems. The method is particularly suitable for systems that are based on a centralized arbiter for making control decisions. Many applications in LAN-based computing, online transactions, and telecommunication systems fall into this category. The method exploits the inherent physical separation of distributed computing systems to achieve high reliability in the face of decentralized arbiter failures. A significant feature of the method is that the fault-tolerance mechanisms are embedded in the normal control signal flow so that the overhead is practically negligible in the absence of faults. The principles behind the method, its internal structure, and its operations are explained. Also, the experience gained through its application is discussed.
{"title":"A fault-tolerant strategy for hierarchical control in distributed computing systems","authors":"P. Goyer, Parham Momtahan, B. Selić","doi":"10.1109/FTCS.1990.89343","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89343","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
A method is presented for detecting stuck-open faults, as well as stuck-at faults, in CMOS combinational circuits by short test sequences of fixed length. The discussion is based on the assumption that the outputs of all the gates in a circuit are observable. This assumption will become reasonable when a new testability solution called CrossCheck, or new test equipment called an electron-beam tester, is used. The concept of k-UCP (uniform, having a (k+1)-Color solution and compatible polarity) circuits is introduced, and it is shown that 2(k+1) kinds of test sequences of length k(k+1)+1 are sufficient to detect stuck-open faults, as well as stuck-at faults, in a k-UCP circuit. Furthermore, it is shown that single stuck-open faults can be located by using a fault diagnosis table. A method that can speed up the generation of a fault diagnosis table is also proposed.
{"title":"Fault detection and diagnosis of k-UCP circuits under totally observable condition","authors":"X. Wen, K. Kinoshita","doi":"10.1109/FTCS.1990.89392","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89392","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
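The two expressions quoted in the k-UCP abstract, 2(k+1) kinds of test sequences and a length of k(k+1)+1 each, can be evaluated for small k with a short Python sketch. The function name `k_ucp_test_budget` is hypothetical, and the sketch assumes k is a positive integer; it only does the arithmetic and does not generate the sequences.

```python
def k_ucp_test_budget(k):
    """Return (kinds of test sequences, length of each) for a k-UCP circuit.

    Uses the expressions stated in the abstract: 2(k+1) kinds of sequences,
    each of length k(k+1)+1. Assumes k is a positive integer.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    kinds = 2 * (k + 1)
    length = k * (k + 1) + 1
    return kinds, length


for k in (1, 2, 3):
    kinds, length = k_ucp_test_budget(k)
    print(f"k={k}: {kinds} kinds of sequences, each of length {length}")
```

For k = 2, for example, this gives 6 kinds of sequences of length 7, independent of circuit size.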
Control flow checking techniques are discussed. Invariant properties of the control flow can be checked at two different levels: verification of the sequencing in the controller of the microprocessor or verification of the control flow in the application program. Control flow checking has been implemented, at the two levels, in different versions of a 32-bit microprocessor designed in a 1.5-µm CMOS technology. Integration of the monitors on silicon is detailed. The silicon overhead due to the different online test devices is discussed precisely. Different versions of this microprocessor have been designed and implemented in order to make real cost comparisons on components with identical functionality but different integrated monitors. Here only the hardware cost of concurrent checking is considered.
{"title":"Design of microprocessors with built-in on-line test","authors":"R. Leveugle, T. Michel, G. Saucier","doi":"10.1109/FTCS.1990.89381","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89381","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
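As a rough software illustration of application-level control flow checking, the sketch below verifies that each transition in an observed trace of basic blocks is a legal edge of a control-flow graph. This is an assumption-laden model of the invariant, not the authors' on-chip monitor; the function `check_control_flow` and the example graph `cfg` are hypothetical.

```python
def check_control_flow(trace, successors):
    """Return the index of the first illegal transition in `trace`, or None.

    `successors` maps each basic-block label to the set of labels that may
    legally follow it (a hypothetical control-flow graph).
    """
    for i in range(1, len(trace)):
        prev, cur = trace[i - 1], trace[i]
        if cur not in successors.get(prev, set()):
            return i
    return None


# Hypothetical program: entry -> loop -> (loop | exit)
cfg = {"entry": {"loop"}, "loop": {"loop", "exit"}, "exit": set()}

print(check_control_flow(["entry", "loop", "loop", "exit"], cfg))
print(check_control_flow(["entry", "exit"], cfg))
```

The first trace follows only legal edges; the second jumps from `entry` straight to `exit`, which the table does not allow, so the check reports position 1.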
The author presents a method for detecting anomalous events in communication networks and other similarly characterized environments in which performance anomalies are indicative of failure. The methodology, based on automatically learning the difference between normal and abnormal behavior, has been implemented as part of an automated diagnosis system from which performance results are drawn and presented. The dynamic nature of the model enables a diagnostic system to deal with continuously changing environments without explicit control, reacting to the way the world is now, as opposed to the way the world was planned to be. Results of successful deployment in a noisy, real-time monitoring environment are shown.
{"title":"Anomaly detection for diagnosis","authors":"R. Maxion","doi":"10.1109/FTCS.1990.89362","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89362","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
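The idea of learning normal behavior and flagging departures from it can be sketched with a generic adaptive baseline. The Python example below uses an exponentially weighted mean and variance as a stand-in technique; the abstract does not describe the author's actual model, so the class `AdaptiveAnomalyDetector` and its parameters `alpha`, `threshold`, and `warmup` are assumptions chosen for illustration.

```python
class AdaptiveAnomalyDetector:
    """Flags samples far from an exponentially weighted running baseline."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to each new sample
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # samples to observe before flagging
        self.mean = None
        self.var = 0.0
        self.count = 0

    def observe(self, x):
        """Return True if x is anomalous relative to the learned baseline."""
        if self.mean is None:
            self.mean = float(x)
            self.count = 1
            return False
        deviation = x - self.mean
        std = self.var ** 0.5
        anomalous = (
            self.count >= self.warmup
            and abs(deviation) > self.threshold * max(std, 1e-9)
        )
        # Always update, so the baseline follows a changing environment.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.count += 1
        return anomalous
```

Because the baseline is updated on every sample, a sustained shift is flagged at first and then absorbed as the new normal, which is one simple way to track the environment as it currently is.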
J. Nijhuis, B. Höfflinger, A. V. Schaik, L. Spaanenburg
Input data and hardware fault tolerance of neural networks are discussed. It is shown that fault-tolerant behavior is not self-evident but must be activated by an appropriate learning scheme. Practical limitations are demonstrated by an example of neural character recognition. The results show that the effects of learning and synapse weight decay on fault tolerance largely influence the practicality of large-scale silicon implementations. It is anticipated that, owing to implementation issues, such as the use of volatile memories, some neural VLSI architectures will not be sufficiently fault tolerant.
{"title":"Limits to the fault-tolerance of a feedforward neural network with learning","authors":"J. Nijhuis, B. Höfflinger, A. V. Schaik, L. Spaanenburg","doi":"10.1109/FTCS.1990.89370","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89370","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
A synthesis procedure for self-testable finite state machines is presented. Testability comes under consideration when the behavioral description of the circuit is being transformed into a structural description. To this end, a novel state encoding algorithm, as well as a modified self-test architecture, is developed. Experimental results show that this approach leads to a significant reduction of hardware overhead. Self-testing circuits generally employ linear feedback shift registers for pattern generation. The impact of choosing a particular feedback polynomial on the state encoding is discussed.
{"title":"Optimized synthesis of self-testable finite state machines","authors":"B. Eschermann, H. Wunderlich","doi":"10.1109/FTCS.1990.89393","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89393","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
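To show why the choice of feedback matters for a linear feedback shift register used as a pattern generator, the following Python sketch simulates a small right-shifting Fibonacci LFSR and measures how many distinct states it cycles through. The functions `lfsr_states` and `lfsr_period`, the 4-bit width, and the two tap sets are illustrative assumptions, not taken from the paper.

```python
def lfsr_states(taps, nbits, seed=1):
    """Yield successive states of a right-shifting Fibonacci LFSR.

    `taps` lists the bit positions XORed together to form the feedback bit,
    which is shifted in at the most significant position.
    """
    state = seed
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))


def lfsr_period(taps, nbits, seed=1):
    """Count the steps until the register returns to `seed`."""
    if 0 not in taps:
        # Without bit 0 in the feedback the update is not invertible,
        # so the seed is not guaranteed to recur.
        raise ValueError("taps must include bit 0")
    gen = lfsr_states(taps, nbits, seed)
    next(gen)  # skip the seed itself
    for steps, state in enumerate(gen, start=1):
        if state == seed:
            return steps


print(lfsr_period((0, 1), 4))  # 15: visits every nonzero 4-bit state
print(lfsr_period((0, 2), 4))  # 6: a much shorter cycle
```

With taps (0, 1) the register visits all 15 nonzero states, while taps (0, 2) return to the seed after 6 steps, so the second choice would generate far fewer distinct patterns.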
Assuming the existence of tamper-resistant devices with computational power and storage capacity similar to those of PCs and secure cryptosystems, the authors present loss tolerance schemes that leave the security, autonomy, and untraceability of the basic payment system that uses electronic wallets almost unchanged. These schemes are the distributed account list protocol and the marked standard value (MSV) protocol. The two schemes are compared. It is noted that more important than the problem of loss tolerance is that of constructing really secure tamper-resistant devices.
{"title":"Loss-tolerance for electronic wallets","authors":"M. Waidner, B. Pfitzmann","doi":"10.1109/FTCS.1990.89349","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89349","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
Codes capable of correcting burst asymmetric and unidirectional errors are described. The proposed codes need approximately $b + \log_2 k$ check bits to correct a burst of $b$ asymmetric/unidirectional errors, where $k$ is the number of information bits. In most cases, the proposed codes require fewer check bits than the equivalent burst symmetric error-correcting codes. The optimality of the codes is also considered. In addition, efficient codes capable of detecting double burst unidirectional errors are given.
{"title":"Burst asymmetric/unidirectional error correcting/detecting codes","authors":"Seungjin Park, B. Bose","doi":"10.1109/FTCS.1990.89375","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89375","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
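The approximate check-bit count stated in the burst-code abstract, b plus the base-2 logarithm of k, can be evaluated for concrete sizes with a few lines of Python. The function name `approx_check_bits` is hypothetical, and rounding the logarithm up to a whole bit is an assumption; the abstract gives only the approximate expression.

```python
import math


def approx_check_bits(b, k):
    """Approximate check bits to correct a burst of b asymmetric or
    unidirectional errors over k information bits, using b + log2(k).

    Rounding log2(k) up to an integer is an assumption of this sketch.
    """
    if b < 1 or k < 1:
        raise ValueError("b and k must be positive integers")
    return b + math.ceil(math.log2(k))


print(approx_check_bits(4, 64))    # 4 + 6 = 10
print(approx_check_bits(8, 1000))  # 8 + 10 = 18
```

The logarithmic term means the overhead grows slowly with the number of information bits and is dominated by the burst length b for short bursts.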
The Advanced Automation System (AAS), a distributed real-time system intended to replace the present en-route and terminal approach US air traffic control computer systems over the next decade, is discussed. High availability of air traffic control services is an essential requirement of the system. The authors discuss the general approach to fault tolerance adopted in the AAS by reviewing some of the questions asked during the system design, various alternative solutions considered, and the reasons for the design choices made.
{"title":"Fault-tolerance in the Advanced Automation System","authors":"F. Cristian, Bob Dancey, Jonathan Dehn","doi":"10.1145/504136.504156","DOIUrl":"https://doi.org/10.1145/504136.504156","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}