Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513215
F. Mowrer, M. Pecht
Electronic equipment is expected to operate reliably under normal conditions as well as under foreseeable abnormal conditions, particularly in life-critical and environmentally sensitive applications. One foreseeable abnormal condition to which electronic equipment may be subjected at least once during its life-cycle is a fire environment. Such an environment may include the thermal and corrosive effects in the immediate vicinity of the fire and the nonthermal effects associated with smoke contamination, humidity and corrosion in remote locations. Direct thermal effects are generally so severe that reasonable remedial actions may not be feasible. Fortunately, such effects are frequently restricted to a fairly small zone, often through the use of automatic fire detection and suppression systems. On the other hand, the thermal decomposition products of smoke and fire suppression agents resulting from even a small fire may permeate a building and cause nonthermal damage to electronic equipment in locations remote from the actual fire. With ever-increasing reliance being placed on electronic equipment in all types of applications and the consequent increase in value concentrations, nonthermal damage from fires and fire suppression agents is a topic of growing interest. The purpose of this exploratory research is to characterize nonthermal damage mechanisms, consequences, and potential preventive and remedial actions using a physics-of-failure approach.
Title: Exploratory research on nonthermal damage to electronics from fires and fire-suppression agents. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513264
K. P. LaSala, M. L. Roush, Z. Matic
The Reliable Human-Machine System Developer (REHMS-D) is a major advance in system and reliability engineering that has broad application. By means of a new human reliability model and a six-stage system engineering process, the REHMS-D decision support approach guides the designer through the synthesis of system or process human functions in a manner that results in a system or process that meets the assigned reliability requirements. REHMS-D complements the traditional hardware design process for reliability by providing a means for designing human functions for reliability. Experience with manufacturing process application has led to the conclusion that REHMS-D is valuable in addressing a major competitiveness concern of US industry: improving manufacturing processes. It is also concluded that REHMS-D has value in system and maintenance design.
Title: A decision-support approach for the design of human-machine systems and processes. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513258
S. A. Doyle, J. L. Mackey
This paper presents a quantitative analysis of two configurations of one architectural approach to the integration of hardware and software fault tolerance. The aim of this work is to determine whether there is a clear-cut advantage to using one configuration of N-version programming (NVP) over the other. A previous preliminary sensitivity analysis on the individual parameter values showed that downloading a faulty software version had the most significant effect on the reliability and safety of the system. The other parameters that we varied had little or no effect on the systems' performances, or on the relationship between the two systems. This fact demonstrates that our results are relatively robust for the particular parameter values that were chosen. Of course, a significantly different set of parameter values may yield different results. Closed-form solutions proved difficult to manage. We investigate the well-known anomaly for hardware fault-tolerant TMR systems to see if the anomaly still holds when software faults are considered. The anomaly considered is that, for a TMR hardware fault-tolerant system, discarding an operational component upon the first failure (and continuing in simplex mode) actually improves reliability. When software faults are considered in a more comprehensive analysis, the anomaly no longer holds.
Title: Comparative analysis of two architectural alternatives for the N-version programming (NVP) system. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
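The TMR anomaly mentioned in this abstract can be illustrated with the standard textbook hardware-only formulas. The sketch below is not the paper's model: it assumes identical components with a constant failure rate (called `lam` here), a perfect voter and a perfect switch to simplex mode, and no software faults. Under those assumptions, TMR reliability is 3R^2 - 2R^3 with R = exp(-lam*t), while TMR/simplex reliability is 1.5*exp(-lam*t) - 0.5*exp(-3*lam*t). All function names are illustrative.

```python
import math


def r_simplex(lam: float, t: float) -> float:
    """Reliability of a single component with constant failure rate lam."""
    return math.exp(-lam * t)


def r_tmr(lam: float, t: float) -> float:
    """Plain TMR: the system works while at least 2 of 3 components work."""
    r = r_simplex(lam, t)
    return 3 * r**2 - 2 * r**3


def r_tmr_simplex(lam: float, t: float) -> float:
    """TMR/simplex: after the first failure, one good component is discarded
    and the remaining one runs alone. Time to failure is Exp(3*lam) + Exp(lam)
    (assumes a perfect voter and perfect switching)."""
    return 1.5 * math.exp(-lam * t) - 0.5 * math.exp(-3 * lam * t)


if __name__ == "__main__":
    lam = 1e-3  # hypothetical failure rate per hour
    print(f"{'t':>6} {'simplex':>9} {'TMR':>9} {'TMR/simplex':>12}")
    for t in (100, 500, 1000, 2000):
        print(f"{t:>6} {r_simplex(lam, t):>9.4f} {r_tmr(lam, t):>9.4f} "
              f"{r_tmr_simplex(lam, t):>12.4f}")
```

With x = exp(-lam*t), the difference between the two expressions is 1.5*x*(1 - x)^2, which is never negative. That is why, in the hardware-only case, discarding a working component does not hurt reliability; the abstract reports that this advantage disappears once software faults are modelled.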
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513253
T. P. Rothman, A. Dasgupta, J.M. Hu
This paper illustrates a methodology for using physics-of-failure models to extract acceleration transform information from limited test data under accelerated stresses. Test time compression is achieved by appropriately accelerating the stress levels in order to obtain accurate information on reliability. The critical variables are identified and their influence on the stress magnitude is quantified using physics-of-failure models. The total amount of testing time is minimized by tailoring the critical variables in each sample such that multiple stress levels can be achieved in the samples under a single loading. This type of parametric-accelerated test eliminates the need for repeating the test at multiple load levels. Such techniques are essential for cost-effective and timely qualification testing of highly reliable modules under accelerated stresses. All sources of error due to experimental variables and assumptions or simplifications in the analytical model are closely examined and discussed. Future work will employ a more detailed physics-of-failure model to quantify the experimental results. These test results also validate physics-of-failure models for acceleration transforms which relate test data to field reliability. Analytical predictive models for acceleration transforms will obviously result in significant savings of cost and time during qualification.
Title: Test-time compression for qualification testing of electronic packages: a case study. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
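The abstract does not say which physics-of-failure model the authors use. As a rough illustration of what an acceleration transform looks like, the sketch below assumes an inverse power law, life = A * stress^(-n), which is a common form for fatigue-type failures but is only an assumption here. It fits `A` and `n` from several stress levels, as might be obtained from samples tailored to see different stresses under one loading, and then computes the acceleration factor between a test stress and a field stress. The function names and the synthetic data are hypothetical.

```python
import math


def fit_inverse_power_law(stresses, lives):
    """Least-squares fit of log(life) = log(A) - n * log(stress).

    Returns (A, n). Assumes an inverse power law, life = A * stress**(-n).
    """
    xs = [math.log(s) for s in stresses]
    ys = [math.log(life) for life in lives]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def acceleration_factor(s_test, s_field, n):
    """Ratio of field life to test life under the inverse power law."""
    return (s_test / s_field) ** n


if __name__ == "__main__":
    # Synthetic data generated from a known law (A = 1e6, n = 2.5);
    # these are not measurements from the paper.
    stress_levels = [1.0, 1.5, 2.0, 3.0]
    cycles_to_failure = [1e6 * s ** -2.5 for s in stress_levels]

    a_hat, n_hat = fit_inverse_power_law(stress_levels, cycles_to_failure)
    af = acceleration_factor(s_test=3.0, s_field=1.0, n=n_hat)
    print(f"A = {a_hat:.3g}, n = {n_hat:.3f}, acceleration factor = {af:.2f}")
```

Having several stress levels in one test run is what makes the exponent identifiable from a single loading; with only one stress level the slope of the log-log line could not be estimated.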
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513285
G. G. Maxwell
Pacemaker reliability can be thought of as a measure of the change in quality during lifetime or mean time to replace (MTTR). Quality has been defined as the totality of all the features designed and built into a pacemaker, or any product. If these features are to function as intended for the designed MTTR, a formal quality system which covers design, qualification, manufacturing, distribution and market monitoring must be in place. Some important elements of the quality system are shown. Managerial discipline is one of the most critical features of the quality system. No system will be effective if it is compromised or abandoned in a crisis. These different elements can be thought of as tools which are used by management, engineers, scientists and mathematicians. When used in accordance with procedures, these tools shape the product into a usable form which will meet customer expectations.
Title: Pacemaker reliability: design to explant. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513260
D.R. Beniquez, D.C. Witteried
Rapid improvement in built-in-test (BIT) system capability is probably one of the most important activities in an air-vehicle system development test and evaluation program. An effective BIT adequacy evaluation methodology can accomplish this. The methodology must provide the development tester BIT parameters that serve as indicators of air-vehicle BIT system performance. Deficiencies that are pinpointed can then be corrected to improve the overall BIT system's performance. A BIT adequacy evaluation methodology developed and implemented during a recently conducted Air Force development and test program did just that. It provided real-time technical information at a particular period that indicated the status of the BIT system in relation to specified requirements. It also served as a springboard to evaluate the BIT system, and to document the results, which provided the manufacturer with valuable data on the BIT system's performance. Finally, it gave the manufacturer the opportunity to implement corrective actions within the allocated schedule, and develop the air-vehicle BIT system concurrently.
Title: Built-in-test adequacy-evaluation methodology for an air-vehicle system. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513261
K. Le
To meet the demand for high availability in telecommunications networks, it is necessary to ensure sufficient availability of the elements that compose the network, e.g. switching systems. Two attributes key to a network element's availability are reliability and maintainability. Reliability is the element's ability to detect faults and perform automatic actions to minimize the occurrence and impact of service outages. Maintainability is the element's ability to help maintenance personnel perform repair actions quickly and with a minimal risk of outage-inducing error. This paper presents an extension of the existing approach to: (1) systematically consider double-event scenarios; and (2) include events other than hardware faults. In the proposed approach, an event could be a hardware fault, a procedural error, or a system-initiated or manually initiated maintenance activity. An example of a double-event scenario is a hardware fault occurring during an automatic reload which was caused by a procedural error.
Title: A double-event testing approach to improve network-element availability. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
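Systematic consideration of double-event scenarios can be pictured as enumerating ordered pairs of event types, where the second event occurs while the first is still being handled. The sketch below is only an illustration of that idea: the four event types come from the abstract, but the pairing rule, the `allow_same_type` switch, and the function name are assumptions, not the paper's procedure.

```python
from itertools import product

# Event types named in the abstract.
EVENT_TYPES = (
    "hardware fault",
    "procedural error",
    "system-initiated maintenance activity",
    "manually initiated maintenance activity",
)


def double_event_scenarios(event_types=EVENT_TYPES, allow_same_type=True):
    """Return ordered (first_event, second_event) pairs.

    The second event is assumed to occur while the first is in progress.
    Whether same-type pairs belong in a test plan is an assumption
    controlled by allow_same_type.
    """
    return [
        (first, second)
        for first, second in product(event_types, repeat=2)
        if allow_same_type or first != second
    ]


if __name__ == "__main__":
    for first, second in double_event_scenarios():
        print(f"{second} during {first}")
```

Order matters in this sketch because a hardware fault during a maintenance activity exercises different recovery logic than a maintenance activity started during a hardware fault.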
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513248
J. A. Nachlas, C. Cassady, K.F. Rooney
Two candidate models of the relationship between the initial defect concentration in a population of integrated circuit (IC) devices and the associated life distribution are presented. One of the models is based specifically upon wafer geometry while the other portrays stochastic deterioration as the accumulation of degradation reaction product. The two models are subjected to computer simulation in order to determine if either displays behavior that is consistent with empirical experience. Unfortunately, while both models display some reasonable behavior, neither may be said to provide an adequate general description of IC failure dependence upon initial defects.
Title: Hazard-function implications of stochastic-deterioration and distributed-defect concentrations. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513294
N. J. Prescott
Almost without exception, military equipments remain in service beyond their original planned lives. This paper shows how the ability to extend life is greatly influenced by early project decisions. It describes how the identification and management of life-related risks provides a realistic framework on which to plan an equipment's future. It demonstrates how different procurement strategies deliver equipments with different life qualities. It asserts that a robust product will result when life, itself, and related issues, such as reliability and maintainability (R&M), are afforded sufficient priority in the specification and contract, linking payment milestones to achievement. Justification is given for the need to specify modular construction methods and to address growth and mid-life improvement (MLI) at the outset. Focus on maintenance, aided by integrated logistic support (ILS) principles, during design and in service is shown to assist in prolonging life. Ultimately, the paper affirms that we can afford to extend the life of military equipment, provided that the armed forces can continue to perform effectively in the modern battlefield.
Title: Equipment life: can we afford to extend it? Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
Pub Date: 1995-01-16 DOI: 10.1109/RAMS.1995.513278
D. T. Smith, Barry W. Johnson, J. Profeta, D. Bozzolo
The expanding size and complexity of dependable computing systems have significantly increased their cost while complicating the estimation of system dependability attributes such as fault coverage and latency. The increasing requirements of safety and reliability for a dependable system, however, have made the evaluation of dependability attributes a crucial task. One approach to performing such an evaluation is through fault injection. The development of a method which enumerates the equivalent faults associated with a given fault injection experiment would significantly reduce the amount of effort required to measure dependability parameters to the desired degree of accuracy and confidence. This research has developed a new method for determining the set of equivalent faults for either a permanent or transient fault injection experiment. The primary objective of the research effort was to expand the data obtained from a single fault injection experiment into a set of data associated with the equivalent fault set. The end result is an automated method for determining equivalent faults from a set of fault injection experiments. The expanded equivalent data sets are then evaluated to determine dependability parameter estimates for fault coverage and error latency.
Title: A method to determine equivalent fault classes for permanent and transient faults. Published in: Annual Reliability and Maintainability Symposium 1995 Proceedings.
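The idea of expanding one fault injection result across its equivalent fault set can be sketched as a weighted estimate. The abstract does not give the authors' estimator, so the code below makes a simplifying assumption: every fault in an equivalence class shares the outcome and error latency of the one fault actually injected, and each experiment is weighted by its class size. The `Experiment` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Experiment:
    """One fault injection experiment (hypothetical record layout)."""
    detected: bool             # did the system detect the injected fault?
    latency: Optional[float]   # error latency if detected, else None
    class_size: int            # number of faults equivalent to the injected one


def coverage_estimate(experiments: List[Experiment]) -> float:
    """Fraction of represented faults that were detected, weighting each
    experiment by the size of its equivalence class."""
    total = sum(e.class_size for e in experiments)
    covered = sum(e.class_size for e in experiments if e.detected)
    return covered / total


def mean_latency(experiments: List[Experiment]) -> Optional[float]:
    """Class-size-weighted mean latency over detected faults."""
    detected = [e for e in experiments if e.detected and e.latency is not None]
    weight = sum(e.class_size for e in detected)
    if weight == 0:
        return None
    return sum(e.latency * e.class_size for e in detected) / weight


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    runs = [
        Experiment(detected=True, latency=2.0, class_size=3),
        Experiment(detected=False, latency=None, class_size=1),
        Experiment(detected=True, latency=4.0, class_size=1),
    ]
    print("coverage:", coverage_estimate(runs))
    print("mean latency:", mean_latency(runs))
```

In this toy data set three injections stand in for five faults, which is the effort reduction the abstract describes: fewer experiments are needed to reach a given sample size.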