A formalism is presented for the specification and analysis of real-time constraints of systems at run time. Real-time logic (RTL) is employed to illustrate how timing properties can be specified elegantly in the form of annotations added to a program (or to a design specification). Algorithms for detecting, at run time, a violation of a timing property expressed in RTL are presented.
{"title":"A formalism for monitoring real-time constraints at run-time","authors":"F. Jahanian, A. Goyal","doi":"10.1109/FTCS.1990.89350","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89350","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
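As a rough illustration of the kind of check described in the abstract by Jahanian and Goyal, the sketch below monitors one timing constraint written in the style of RTL's occurrence function, @(e, i), the time of the i-th occurrence of event e. The constraint checked is @(end, i) - @(start, i) <= bound for every i. The class name `DeadlineMonitor`, its methods, and the event names are hypothetical and are not taken from the paper; this is a minimal sketch, not the authors' detection algorithm.

```python
from collections import defaultdict


class DeadlineMonitor:
    """Hypothetical monitor for the constraint
    @(end, i) - @(start, i) <= bound, for every occurrence index i."""

    def __init__(self, start, end, bound):
        self.start, self.end, self.bound = start, end, bound
        self.times = defaultdict(list)  # event name -> occurrence times, in order

    def record(self, event, t):
        """Record an occurrence of `event` at time `t`.
        Returns a violation message, or None if no violation is detected."""
        self.times[event].append(t)
        if event != self.end:
            return None
        i = len(self.times[self.end])  # 1-based index of this `end` occurrence
        starts = self.times[self.start]
        if i > len(starts):
            return f"occurrence {i} of {self.end} has no matching {self.start}"
        delay = t - starts[i - 1]
        if delay > self.bound:
            return f"occurrence {i}: delay {delay} exceeds bound {self.bound}"
        return None


monitor = DeadlineMonitor("req", "resp", bound=10)
monitor.record("req", 0)
monitor.record("resp", 7)            # within the bound, returns None
monitor.record("req", 20)
late = monitor.record("resp", 35)    # 15 time units after the matching "req"
```

This sketch only notices a violation when the late `end` event finally arrives. A monitor that must report a violation as early as possible would also need a timer that fires once `bound` has elapsed after each `start`.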
Graph reduction, a computational model that supports the parallel execution of functional languages, is discussed. An MIMD (multiple instruction/multiple data) machine, Flagship, which supports the graph reduction model, has been built. The authors investigate the formal specification and proof of an algorithm that can ensure the successful execution of a functional program in the presence of the failure of a processing element (PE) of the Flagship machine. The specifications of the algorithm, the graph reduction model, and the augmented graph reduction model, which can tolerate the failure of a PE, are described using CSP (communicating sequential processes) notation. The algebraic transformation rules of CSP are used to prove that, in the presence of PE failure, the fault-tolerant graph reduction model behaves correctly.
{"title":"Specification and proof of a distributed recovery algorithm","authors":"Xinfeng Ye, B. Warboys, J. Keane","doi":"10.1109/FTCS.1990.89377","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89377","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
A theoretical framework for investigating the problem of design for path-delay-fault testability is provided. Necessary and sufficient conditions for the existence of general robust tests in a multioutput, multilevel circuit are given. The conditions for the existence of a more restricted class of robust tests are derived from those for general robust tests. A design procedure is given for the synthesis of multioutput, multilevel combinational logic circuits in which all path delay faults are robustly detectable. A powerful factorization method, extended factorization, is exploited for this purpose.
{"title":"On the design of path delay fault testable combinational circuits","authors":"A. Pramanick, S. Reddy","doi":"10.1109/FTCS.1990.89391","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89391","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
C. V. Driel, R. J. B. Follon, A. Kohler, R. V. Osch, J. M. Spanjers
The authors present an analysis of modularly redundant computer systems and various theoretical, as well as implementation, aspects of a highly reliable computer system with a relatively small amount of redundant hardware. From the evaluation of modularly redundant systems it is concluded that the (4, 2)-concept computer system compares favorably with a triple modular redundant and doubled system, with respect to cost, as well as to reliability. In order to cope with the Byzantine Generals problem, a hardware implementation of the algorithm is presented. The (4, 2)-concept is used in the Philips business communication switch SOPHO S-2500 and in a broadband switch (the Philips H1-switch). The prototype computer system presented can be used as a controller (e.g. in a telephone switching system), as a computer system for online transaction processing, or as a general-purpose computer.
{"title":"The error-resistant interactively consistent architecture (ERICA)","authors":"C. V. Driel, R. J. B. Follon, A. Kohler, R. V. Osch, J. M. Spanjers","doi":"10.1109/FTCS.1990.89385","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89385","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
A compression technique called periodic quotient compression, which eliminates the problem of aliasing, is presented. Compression in signature analysis is based on polynomial division, where the remainder is the signature and the quotient is discarded. The proposed technique looks at both the remainder and the quotient, and assumes that the good circuit response is known a priori during the design of the linear feedback shift register (LFSR). The concept of periodic polynomials is used to completely characterize the quotient, thus eliminating aliasing. The maximum number of bits required to compress an N-bit response to achieve zero aliasing is determined. The authors provide an algorithm for constructing an LFSR that achieves this bound for any given circuit under test.
{"title":"Zero aliasing compression","authors":"S. Gupta, D. Pradhan, S. Reddy","doi":"10.1109/FTCS.1990.89373","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89373","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
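The aliasing problem that the "Zero aliasing compression" abstract refers to can be seen in a small polynomial division over GF(2). The sketch below is not the periodic quotient compression technique itself; it only shows that two different responses can leave the same remainder (signature) while their quotients differ, which is why the quotient carries the information needed to rule out aliasing. The divisor polynomial, the response bits, and the helper names `gf2_divmod` and `gf2_mul` are arbitrary choices made for illustration.

```python
def gf2_divmod(dividend: int, divisor: int) -> tuple[int, int]:
    """Polynomial division over GF(2).
    Bit k of each integer is the coefficient of x^k.
    Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        shift = dividend.bit_length() - dlen
        quotient |= 1 << shift
        dividend ^= divisor << shift  # subtract (XOR) the shifted divisor
    return quotient, dividend


def gf2_mul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


poly = 0b10011                  # x^4 + x + 1, an illustrative LFSR polynomial
good = 0b110101101011           # assumed fault-free response stream
error = gf2_mul(poly, 0b101)    # an error pattern that is a multiple of poly
bad = good ^ error              # faulty response

q_good, r_good = gf2_divmod(good, poly)
q_bad, r_bad = gf2_divmod(bad, poly)

print("same signature:", r_good == r_bad)    # remainders agree: aliasing
print("same quotient: ", q_good == q_bad)    # quotients differ
```

Any error pattern that is a nonzero multiple of the divisor leaves the remainder unchanged, so a signature built from the remainder alone cannot detect it.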
The design and analysis of test schemes for algorithm-based fault tolerance (ABFT) are examined. The problem is studied under the assumption that no bound is imposed on the size of a test. Upper and lower bounds are established on the number of tests needed to detect a given number of errors. These bounds are sharply different from those previously established under the bounded test size model. The test schemes presented are easy to implement. It is also shown that the design problem for fault detection is NP-hard even when only one fault needs to be detected. It is shown that the analysis problem is, in general, co-NP-complete and hence unlikely to be efficiently solvable. Several restricted versions of the problem that can be solved efficiently are identified. In addition, a new branch-and-bound algorithm for determining the error detectability of a system is presented.
{"title":"Design and analysis of test schemes for algorithm-based fault tolerance","authors":"Dechang Gu, D. Rosenkrantz, S. Ravi","doi":"10.1109/FTCS.1990.89341","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89341","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
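For readers unfamiliar with ABFT, the sketch below shows one widely used instance, checksum-protected matrix multiplication, in which each column-checksum comparison acts as a single test over a whole column of data elements. It is offered only as background for the abstract by Gu, Rosenkrantz, and Ravi; it does not reproduce the test schemes or bounds of the paper, and the function names are hypothetical. Integer entries are assumed so that checksums can be compared exactly; floating-point data would need a tolerance.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of `a`."""
    return a + [[sum(col) for col in zip(*a)]]


def failed_column_tests(c_full):
    """Each column is one test: the data entries must sum to the checksum
    entry in the last row. Returns the indices of the failing columns."""
    *data, check = c_full
    return [j for j, col in enumerate(zip(*data)) if sum(col) != check[j]]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Multiplying the checksum-extended `a` by `b` yields a*b plus its column checksums.
c_full = matmul(with_column_checksum(a), b)

# Inject a single error into one data element of a copy of the result.
corrupted = [row[:] for row in c_full]
corrupted[0][1] += 1

print(failed_column_tests(c_full))
print(failed_column_tests(corrupted))
```

Because each test here covers an entire column, the size of a test grows with the matrix dimension, which corresponds to the setting in which no bound is placed on test size.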
Data on the design and validation of a piece of safety-critical software, the ESIN application software, are presented. The ESIN application software is integrated within an instrumentation system designed for experimental nuclear reactors. Its main function is to trigger the emergency shutdown of the reactor. The development of this software has been based on a fault-avoidance approach: use of a strict life cycle, existence of an independent verification and validation team, and application of rules of design and programming. The data presented here concern the location of faults in the life cycle and in subsystems; a classification of faults in each step is provided. These data are also correlated with the effort spent on verification/qualification.
{"title":"An experience of a critical software development","authors":"C. Sayet, E. Pilaud","doi":"10.1109/FTCS.1990.89364","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89364","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
A technique for fitting distributions to empirical recovery time data that focuses on the components that dominate system reliability is proposed. The technique uses Goldfarb's conjugate gradient descent search to minimize the $L^2$ norm of the error projected in the Laplace transform domain. A new parametric family of distributions is also suggested and is seen to provide uniformly better predictions of system reliability than the standard distributions used for this purpose, i.e. gamma, Weibull, and log normal. Applications to several sets of real recovery time data are provided.
{"title":"Modeling recovery time distributions in ultrareliable fault-tolerant systems","authors":"R. Geist, M. Smotherman, Ronald Talley","doi":"10.1109/FTCS.1990.89400","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89400","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
P. Barrett, Andrew M. Hilborne, P. Bond, Douglas T. Seaton, P. Veríssimo, Luís E. T. Rodrigues, N. Speirs
The design of an extra performance architecture for Delta-4, which explicitly supports the requirements of real-time systems with respect to throughput and response, is presented. The Delta-4 approach to fault tolerance is based on the replication of software components on distinct host computers using a range of different replication strategies. The problems of replicate divergence are discussed, and a solution based on message selection and preemption synchronization messages is proposed. A description of the ongoing implementation of such a system within the overall Delta-4 framework is included.
{"title":"The Delta-4 extra performance architecture (XPA)","authors":"P. Barrett, Andrew M. Hilborne, P. Bond, Douglas T. Seaton, P. Veríssimo, Luís E. T. Rodrigues, N. Speirs","doi":"10.1109/FTCS.1990.89386","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89386","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}
Some aspects of a fault-tolerant tightly coupled multiprocessor architecture are presented. The originality of this architecture resides in the use of a stable transactional memory shared by all processors. To ensure fault tolerance, each update of a memory block is included in an atomic transaction managed by the stable transactional memory. All the blocks that are part of a transaction are written back atomically into stable transactional memory. This work focuses on a protocol that ensures the atomic update of blocks in stable transactional memory when they have been modified by several caches. The results of various simulations that were conducted to evaluate the potential performance of the proposed architecture are also presented.
{"title":"Cache management in a tightly coupled fault tolerant multiprocessor","authors":"M. Banâtre, Philippe Joubert","doi":"10.1109/FTCS.1990.89339","DOIUrl":"https://doi.org/10.1109/FTCS.1990.89339","journal":{"name":"[1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium"},"publicationDate":"1990-06-26"}