{"title":"OpenErrorPro: A New Tool for Stochastic Model-Based Reliability and Resilience Analysis","authors":"A. Morozov, K. Ding, Mikael Steurer, K. Janschek","doi":"10.1109/ISSRE.2019.00038","DOIUrl":null,"url":null,"abstract":"Increasing complexity and heterogeneity of modern safety-critical systems require advanced tools for quantitative reliability analysis. Most of the available analytical software exploits classical methods such as event trees, static and dynamic fault trees, reliability block diagrams, simple Bayesian networks, and Markov chains. First, these methods fail to adequately model complex interaction of software, hardware, physical components, dynamic feedback loops, propagation of data errors, nontrivial failure scenarios, sophisticated fault tolerance, and resilience mechanisms. Second, these methods are limited to the evaluation of the fixed set of traditional reliability metrics such as the probability of generic system failure, failure rate, MTTF, MTBF, and MTTR. More flexible models, such as the Dual-graph Error Propagation Model (DEPM) can overcome these limitations but have no available tools. This paper introduces the first open-source DEPM-based analytical software tool OpenErrorPro. The DEPM is a formal stochastic model that captures control and data flow structures and reliability-related properties of executable system components. The numerical analysis in OpenErrorPro is based on the automatic generation of Markov chain models and the utilization of modern Probabilistic Model Checking (PMC) techniques. The PMC enables the analysis of highly-customizable resilience metrics, e.g. \"the probability of system recovery after a specified system failure during the defined time interval\", in addition to the traditional reliability metrics. DEPMs can be automatically generated from Simulink/Stateflow, UML/SysML, and AADL models, as well as source code of software components using LLVM. This allows not only the automated model-based evaluation but also the analysis of systems developed using the combination of several modeling paradigms. The key purpose of the tool is to close the gap between the conventional system design models and advanced analytical methods in order to give system reliability engineers easy and automated access to the full potential of PMC techniques. Finally, OpenErrorPro enables the application of several effective optimizations against the state space explosion of underlying Markov models already in the DEPM level where the system semantics such as control and data flow structures are accessible.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSRE.2019.00038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Increasing complexity and heterogeneity of modern safety-critical systems require advanced tools for quantitative reliability analysis. Most of the available analytical software exploits classical methods such as event trees, static and dynamic fault trees, reliability block diagrams, simple Bayesian networks, and Markov chains. First, these methods fail to adequately model complex interaction of software, hardware, physical components, dynamic feedback loops, propagation of data errors, nontrivial failure scenarios, sophisticated fault tolerance, and resilience mechanisms. Second, these methods are limited to the evaluation of the fixed set of traditional reliability metrics such as the probability of generic system failure, failure rate, MTTF, MTBF, and MTTR. More flexible models, such as the Dual-graph Error Propagation Model (DEPM) can overcome these limitations but have no available tools. This paper introduces the first open-source DEPM-based analytical software tool OpenErrorPro. The DEPM is a formal stochastic model that captures control and data flow structures and reliability-related properties of executable system components. The numerical analysis in OpenErrorPro is based on the automatic generation of Markov chain models and the utilization of modern Probabilistic Model Checking (PMC) techniques. The PMC enables the analysis of highly-customizable resilience metrics, e.g. "the probability of system recovery after a specified system failure during the defined time interval", in addition to the traditional reliability metrics. DEPMs can be automatically generated from Simulink/Stateflow, UML/SysML, and AADL models, as well as source code of software components using LLVM. This allows not only the automated model-based evaluation but also the analysis of systems developed using the combination of several modeling paradigms. The key purpose of the tool is to close the gap between the conventional system design models and advanced analytical methods in order to give system reliability engineers easy and automated access to the full potential of PMC techniques. Finally, OpenErrorPro enables the application of several effective optimizations against the state space explosion of underlying Markov models already in the DEPM level where the system semantics such as control and data flow structures are accessible.