"A reconvergent fanout analysis for efficient exact fault simulation of combinational circuits," F. Maamari and J. Rajski. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5309
An exact fault simulation can be achieved by simulating only the faults on reconvergent fanout stems, while determining the detectability of faults on other lines by critical path tracing within fanout-free regions. The authors have delimited, for every reconvergent fanout stem, a region of the circuit outside of which the stem fault does not have to be simulated. Lines on the boundary of such a stem region, called exit lines, have the following property: if the stem fault is detected at an exit line, and that line is critical with respect to a primary output, then the stem fault is detected at that primary output. Any fault-simulation technique can be used to simulate the stem fault within its stem region. The fault-simulation complexity of a circuit is shown to be directly related to the number and size of the stem regions in the circuit. Results obtained for the well-known benchmark circuits are presented.
"Saturation: reduced idleness for improved fault-tolerance," Jean-Claude Fabre, Y. Deswarte, J. Laprie, and David Powell. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5320
The authors present a technique for maximizing the redundancy level of tasks and tolerating hardware faults by majority voting in the context of a network of workstations. The idea is to compute dynamically the number of copies allocated to each task, according to the number of sites and the tasks' criticality parameters. This technique leads to maximum utilization of the available resources in the distributed system, i.e. it reduces the idleness of resources and increases the redundancy of tasks, thereby reducing fault dormancy and error latency. This technique, called the saturation technique, is compared with similar approaches. A detailed description and simulation results showing the advantages and the cost of implementing the saturation technique are given. The authors outline the structure of a suitable distributed operating system, including the execution model and task designation, to support the execution of multiple copies of tasks. The fault assumptions are discussed, and the different phases of a distributed scheduler are detailed.
"Reliable design of large crosspoint switching networks," A. Varma, Joydeep Ghosh, and C. J. Georgiou. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5338
A major source of transient errors and unreliable operation in large crosspoint switching networks is the simultaneous-switching (ΔI) noise caused by the switching of a large number of off-chip drivers on a chip. An architectural solution to this problem is presented for networks constructed from one-sided crosspoint switching chips. The method seeks to achieve a uniform distribution of active drivers among the chips by rearranging a subset of the existing connections whenever a new connection is made. The problem is studied in the context of a one-sided crosspoint network with N = rn ports constructed from individual switching chips of size n × m/2. The authors show that the lower bound of m/r active drivers per chip can always be maintained in practice when m/r is an even number. The maximum number of rearrangements needed is min(m/2 - 1, 2r - 1). In addition, the rearrangements are confined to two chip columns of the matrix.
"On simulating faults in parallel," V. Iyengar and D. Tang. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5307
Hardware engines (e.g. YSE and EVE) have been built to perform functional simulation of large designs over many patterns. The authors present a method of simulating faults in parallel that is applicable to these hardware simulation engines (and to software simulators with similar characteristics). A notion of independence between faults is used to determine the faults that can be simulated in parallel. An efficient algorithm is developed to determine the independent subsets of faults. Results of applying the algorithm to large examples are presented and shown to be very good by comparing them with theoretical lower bounds. This technique makes it feasible to fault-simulate large networks using these hardware simulation engines.
"A large scale second generation experiment in multi-version software: description and early results," John P. J. Kelly, D. Eckhardt, M. Vouk, D. McAllister, and A. Caglayan. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5290
The second-generation experiment is a large-scale empirical study of the development and operation of multiversion software systems that has engaged researchers at five universities and three research institutes. The authors present the history and current status of this experiment. Its primary objective is an examination of multiple-version reliability improvement. Experimentation has focused on the development of multiversion software (MVS) systems, primarily design and testing issues, and on the modeling and analysis of these systems. A preliminary analysis of the multiple software versions has been performed and is reported.
"On minimizing testing rounds for fault identification," E. Schmeichel, S. Hakimi, M. Otsuka, and Geoff Sullivan. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5330
A bound is obtained on the number of rounds of testing sufficient to identify the faulty units of a system, where within a single round each unit may participate in at most one test. The authors give an adaptive algorithm that works in O(log_{n/t} t) rounds and uses O(n) tests (here n is the number of units and t a bound on the number of faulty units). The multiplicative constants in the new bounds are small (four in both cases). This is a major improvement over previous nonadaptive and adaptive algorithms, which required O(t + log n) rounds of testing and O(n + t) tests. If t > n^(1-ε), the algorithm runs within a constant number of rounds.
"PODS revisited - a study of software failure behaviour," P. Bishop and F. D. Pullen. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5289
A description is given of an empirical study of the failure characteristics of software defects detected in the programs developed in the Project on Diverse Software (PODS). The results are interpreted in the context of a state-machine model of software failure. The results of the empirical study cast doubt on the general validity of the assumption of constant software failure probability and of the assumption that all defects have similar failure rates. In addition, an analysis of failure dependency lends support to the use of diversity as a means of minimizing the impact of design-level faults. Here, nonidentical faults exhibited coincident failure characteristics approximately in accord with the independence assumption, and some of the observed positive and negative correlation effects could be explained by failure-masking effects, which can be removed by suitable design.
"An on-line error-detectable array divider with a redundant binary representation and a residue code," N. Takagi and S. Yajima. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5316
An on-line error-detectable high-speed array divider is proposed. The divider is based on a formerly proposed algorithm using a redundant binary representation with the digit set {0, 1, -1}. The computation time of the n-bit divider is proportional to n, in contrast to that of an array divider based on a conventional subtract-and-shift algorithm, which is proportional to n^2. By residue checks of only the dividend, divisor, quotient, and remainder, plus a few additional checks, any error caused by a single-cell fault can be detected during normal computation. The amount of additional hardware needed to achieve the on-line error detectability is proportional to n and very small compared with the total hardware of the divider.
"Minimum fault coverage in reconfigurable arrays," N. Hasan and C. Liu. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5342
The authors discuss the case in which the redundant elements are arranged in the form of spare rows and spare columns for a rectangular array; redundant RAMs are an example of such a case. A covering is a set of rows and columns that are to be replaced by spare rows and spare columns so that all defective elements are replaced. The authors introduce the notion of a critical set, which is a maximum set of rows and columns that must be included in any minimum covering. They show that for a given pattern of defective elements the corresponding critical set is unique. They also present a polynomial-time algorithm for finding the critical set and demonstrate how the concept of critical sets can be used to solve a number of fault-coverage problems.
"Multiple stuck-at fault testability of self-testing checkers," T. Nanya, S. Mourad, and E. McCluskey. The Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, 1988. doi:10.1109/FTCS.1988.5347
As a feasibility study on offline testing of VLSI systems with concurrent error-checking capability, the multiple-fault testability of self-testing checkers is evaluated. New offline testing schemes, called codeword testing and noncodeword testing, are introduced, in which all codewords and a small number of noncodewords are used as test inputs and the checker outputs are observed to decide whether the circuit under test is faulty. It is proved that all multiple stuck-at faults in tree-structured two-rail code checkers are detected by codeword testing followed by noncodeword testing. Simulation experiments show that codeword testing can detect more than 99% of all possible double and triple faults in existing self-testing checkers for two-rail codes, Berger codes, and k-out-of-2k codes. The experiments also show that all of the double and triple faults that elude codeword testing are detected by noncodeword testing, for which only a small number of noncodewords are needed.