
2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE): Latest Publications

OpenErrorPro: A New Tool for Stochastic Model-Based Reliability and Resilience Analysis
A. Morozov, K. Ding, Mikael Steurer, K. Janschek
Increasing complexity and heterogeneity of modern safety-critical systems require advanced tools for quantitative reliability analysis. Most of the available analytical software exploits classical methods such as event trees, static and dynamic fault trees, reliability block diagrams, simple Bayesian networks, and Markov chains. First, these methods fail to adequately model complex interaction of software, hardware, physical components, dynamic feedback loops, propagation of data errors, nontrivial failure scenarios, sophisticated fault tolerance, and resilience mechanisms. Second, these methods are limited to the evaluation of the fixed set of traditional reliability metrics such as the probability of generic system failure, failure rate, MTTF, MTBF, and MTTR. More flexible models, such as the Dual-graph Error Propagation Model (DEPM) can overcome these limitations but have no available tools. This paper introduces the first open-source DEPM-based analytical software tool OpenErrorPro. The DEPM is a formal stochastic model that captures control and data flow structures and reliability-related properties of executable system components. The numerical analysis in OpenErrorPro is based on the automatic generation of Markov chain models and the utilization of modern Probabilistic Model Checking (PMC) techniques. The PMC enables the analysis of highly-customizable resilience metrics, e.g. "the probability of system recovery after a specified system failure during the defined time interval", in addition to the traditional reliability metrics. DEPMs can be automatically generated from Simulink/Stateflow, UML/SysML, and AADL models, as well as source code of software components using LLVM. This allows not only the automated model-based evaluation but also the analysis of systems developed using the combination of several modeling paradigms. The key purpose of the tool is to close the gap between the conventional system design models and advanced analytical methods in order to give system reliability engineers easy and automated access to the full potential of PMC techniques. Finally, OpenErrorPro enables the application of several effective optimizations against the state space explosion of underlying Markov models already in the DEPM level where the system semantics such as control and data flow structures are accessible.
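To make the PMC-style analysis described above concrete, the following minimal sketch (with an invented three-state model, not anything generated by OpenErrorPro) computes a resilience-style metric of the kind quoted in the abstract: the probability of reaching a "recovered" state within k steps of a discrete-time Markov chain.

```python
# Minimal sketch (not OpenErrorPro's API): transient analysis of a tiny
# discrete-time Markov chain, illustrating a resilience-style query such as
# "probability of reaching 'recovered' within k steps after a failure".
import numpy as np

states = ["ok", "failed", "recovered"]          # hypothetical DEPM-derived states
P = np.array([
    [0.95, 0.05, 0.00],                         # ok -> ok / failed
    [0.00, 0.60, 0.40],                         # failed -> failed / recovered
    [0.00, 0.00, 1.00],                         # recovered is absorbing
])

def prob_reach_within(P, start, target, k):
    """P(reach `target` from `start` within k steps), for an absorbing target state."""
    dist = np.zeros(len(P))
    dist[start] = 1.0
    for _ in range(k):
        dist = dist @ P                          # evolve the state distribution one step
    return dist[target]

# e.g. probability of recovery within 10 steps after a failure has occurred
print(prob_reach_within(P, states.index("failed"), states.index("recovered"), 10))
```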
Citations: 6
Amplifying Integration Tests with CAMP
Franck Chauvel, Brice Morin, Enrique Garcia-Ceja
Modern software systems interact with multiple 3rd party dependencies such as the OS file system, libraries, databases or remote services. To verify these interactions, developers write so-called "integration tests" that exercise the software within a specific environment. These tests are not only difficult to write as their environment is complicated, but they are also brittle because changes outside the code (i.e., in the environment) might make them fail unexpectedly. Integration tests are thus underused whereas they could help find many more issues. We hence propose CAMP, a tool that amplifies an existing integration test by exploring variations of the given environment. The tests that CAMP generates alter the services orchestration, the software stacks, the individual components' configuration or any combination thereof. We used CAMP to amplify tests from the Sphinx and Atom open-source projects, and in both cases, we were able to spot undocumented issues related to incompatible environments.
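The sketch below illustrates the general idea of amplifying one integration test across environment variants; the variation points (Python versions, database images) and the runner are hypothetical placeholders, not CAMP's actual interface.

```python
# Illustrative sketch of test amplification over environments (not CAMP itself):
# enumerate variations of a software stack and run the same integration test in
# each variant, e.g. inside containers.
import itertools
import subprocess

python_versions = ["3.8", "3.9", "3.10"]        # hypothetical variation points
db_images = ["postgres:12", "postgres:15"]

def run_test_in_env(py, db):
    # Placeholder: in practice this would build and launch the environment
    # (e.g. a compose file with the chosen versions) and run the test suite.
    cmd = ["echo", f"run integration tests with python={py}, db={db}"]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

for py, db in itertools.product(python_versions, db_images):
    print(run_test_in_env(py, db).strip())
```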
Citations: 3
HiRec: API Recommendation using Hierarchical Context
Rensong Xie, Xianglong Kong, Lulu Wang, Ying Zhou, Bixin Li
Context-aware API recommendation techniques aim to generate a ranked list of candidate APIs at an editing position during development. The basic context used in traditional API recommendation mainly focuses on APIs from third-party libraries, limiting or even ignoring the usage of project-specific code. The limited usage of project-specific code may result in a lack of context information and degrade the effectiveness of API recommendation. To address this problem, we introduce a novel type of context, i.e., hierarchical context, which can leverage the hidden information of project-specific code by analyzing the call graph. In hierarchical context, a project-specific API is presented as a sequence of low-level APIs from third-party libraries. We propose an approach, HiRec, which builds on hierarchical context. HiRec is evaluated on 108 projects, and the results show that HiRec obtains much more accurate results than all the other selected approaches in terms of top-5 and top-10 accuracy, owing to the strong ability of its context representation. HiRec also performs comparably to the best existing tools in terms of top-1 accuracy. The average recommendation time is less than 1 second in most cases, which is acceptable for interaction in an IDE. Unlike current approaches, the effectiveness of HiRec is not much affected by editing positions. Moreover, HiRec yields more accurate results with larger sizes of training data and hierarchical context.
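The following toy sketch illustrates what a hierarchical context expansion could look like: a hypothetical call graph is used to flatten a project-specific API into the sequence of third-party APIs it ultimately calls. It is an illustration of the idea only, not HiRec's implementation.

```python
# Sketch of the "hierarchical context" idea (simplified, not HiRec's code):
# a project-specific API is expanded, via the call graph, into the sequence
# of third-party APIs it ultimately invokes.
call_graph = {                                   # hypothetical call graph
    "Project.saveUser": ["Project.validate", "java.sql.PreparedStatement.execute"],
    "Project.validate": ["java.util.Objects.requireNonNull", "java.util.regex.Pattern.matcher"],
}

def expand(api, seen=None):
    """Recursively replace project-specific calls by the third-party APIs they use."""
    seen = seen or set()
    if api in seen:                              # guard against cycles in the call graph
        return []
    seen.add(api)
    if api not in call_graph:                    # third-party API: keep as-is
        return [api]
    seq = []
    for callee in call_graph[api]:
        seq.extend(expand(callee, seen))
    return seq

print(expand("Project.saveUser"))
# ['java.util.Objects.requireNonNull', 'java.util.regex.Pattern.matcher',
#  'java.sql.PreparedStatement.execute']
```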
Citations: 15
Identifying Crashing Fault Residence Based on Cross Project Model
Zhou Xu, Tao Zhang, Yifeng Zhang, Yutian Tang, Jin Liu, Xiapu Luo, J. Keung, Xiaohui Cui
Analyzing the crash reports recorded upon software crashes is a critical activity for software quality assurance. Predicting whether or not the fault causing the crash (crashing fault for short) resides in the stack traces of crash reports can speed-up the program debugging process and determine the priority of the debugging efforts. Previous work mostly collected label information from bug-fixing logs, and extracted crash features from stack traces and source code to train classification models for the Identification of Crashing Fault Residence (ICFR) of newly-submitted crashes. However, labeled data are not always fully available in real applications. Hence the classifier training is not always feasible. In this work, we make the first attempt to develop a cross project ICFR model to address the data scarcity problem. This is achieved by transferring the knowledge from external projects to the current project via utilizing a state-of-the-art Balanced Distribution Adaptation (BDA) based transfer learning method. BDA not only combines both marginal distribution and conditional distribution across projects but also assigns adaptive weights to the two distributions for better adjusting specific cross project pair. The experiments on 7 software projects show that BDA is superior to 9 baseline methods in terms of 6 indicators overall.
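A simplified sketch of the weighting idea behind BDA is shown below: a balance factor trades off the marginal distribution distance against the average per-class conditional distance. The distance measure (a plain mean-embedding distance) and the random toy data are assumptions for illustration, not the authors' kernel or features.

```python
# Simplified sketch of the weighting idea in Balanced Distribution Adaptation
# (not the paper's implementation): a balance factor mu trades off the
# marginal distribution distance against the per-class conditional distances.
import numpy as np

def mean_embedding_distance(Xs, Xt):
    """Squared distance between feature means (a linear-kernel MMD estimate)."""
    return float(np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2))

def bda_distance(Xs, ys, Xt, yt_pseudo, mu=0.5):
    """(1 - mu) * marginal distance + mu * average conditional distance."""
    marginal = mean_embedding_distance(Xs, Xt)
    classes = np.intersect1d(np.unique(ys), np.unique(yt_pseudo))
    conditional = np.mean([
        mean_embedding_distance(Xs[ys == c], Xt[yt_pseudo == c]) for c in classes
    ])
    return (1 - mu) * marginal + mu * conditional

# Toy usage with random source/target crash features and (pseudo-)labels.
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(100, 5)), rng.integers(0, 2, 100)
Xt, yt = rng.normal(0.3, 1.0, size=(80, 5)), rng.integers(0, 2, 80)
print(bda_distance(Xs, ys, Xt, yt, mu=0.5))
```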
Citations: 8
Engineering a Better Fuzzer with Synergically Integrated Optimizations
Jie Liang, Yuanliang Chen, Mingzhe Wang, Yu Jiang, Z. Yang, Chengnian Sun, Xun Jiao, Jiaguang Sun
State-of-the-art fuzzers implement various optimizations to enhance their performance. As the optimizations reside in different stages such as input seed selection and mutation, it is tempting to combine the optimizations in different stages. However, our initial attempts demonstrate that naive combination actually worsens the performance, which explains that most optimizations are still isolated by stages and metrics. In this paper, we present InteFuzz, the first framework that synergically integrates multiple fuzzing optimizations. We analyze the root cause for performance degradation in naive combination, and discover optimizations conflict in coverage criteria and optimization granularity. To resolve the conflicts, we propose a novel priority-based scheduling mechanism. The dynamic integration considers both branch-based and block-based coverage feedbacks that are used by most fuzzing optimizations. In our evaluation, we extract four optimizations from popular fuzzers such as AFLFast and FairFuzz and compare InteFuzz against naive combinations. The evaluation results show that InteFuzz outperforms the naive combination by 29% and 26% in path-and branch-coverage. Additionally, InteFuzz triggers 222 more unique crashes, and discovers 33 zero-day vulnerabilities in real-world projects with 12 registered as CVEs.
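The sketch below shows one plausible shape of priority-based seed scheduling over two coverage signals (new branches preferred over new blocks); the scoring rule and the data are illustrative assumptions, not InteFuzz's actual policy.

```python
# Illustrative sketch (not InteFuzz itself) of priority-based seed scheduling
# that combines two coverage feedbacks: seeds that discovered new branches are
# preferred over seeds that only discovered new blocks.
import heapq

class Seed:
    def __init__(self, data, new_branches, new_blocks):
        self.data = data
        # Lower tuple sorts first: prioritize new branch coverage, then blocks.
        self.priority = (-new_branches, -new_blocks)

    def __lt__(self, other):
        return self.priority < other.priority

queue = []
heapq.heappush(queue, Seed(b"AAAA", new_branches=3, new_blocks=10))
heapq.heappush(queue, Seed(b"BBBB", new_branches=0, new_blocks=25))
heapq.heappush(queue, Seed(b"CCCC", new_branches=3, new_blocks=2))

while queue:
    seed = heapq.heappop(queue)
    print(seed.data, seed.priority)   # scheduled order: AAAA, CCCC, then BBBB
```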
Citations: 8
How Do Developers Act on Static Analysis Alerts? An Empirical Study of Coverity Usage
Nasif Imtiaz, Brendan Murphy, L. Williams
Static analysis tools (SATs) often fall short of developer satisfaction despite their many benefits. An understanding of how developers in the real world act on the alerts detected by SATs can help improve the utility of these tools and determine future research directions. The goal of this paper is to aid researchers and tool makers in improving the utility of static analysis tools through an empirical study of developer action on the alerts detected by Coverity, a state-of-the-art static analysis tool. In this paper, we analyze five open source projects as case studies (Linux, Firefox, Samba, Kodi, and Ovirt-engine) that have been actively using Coverity over a period of at least five years. We investigate the alert occurrences and developer triage of the alerts from the Coverity database; identify the alerts that were fixed through code changes (i.e. actionable) by mining the commit history of the projects; analyze the time an alert remains in the code base (i.e. lifespan) and the complexity of code changes (i.e. fix complexity) in fixing the alert. We find that 27.4% to 49.5% (median: 36.7%) of the alerts are actionable across projects, a rate higher than previously reported. We also find that the fixes of Coverity alerts are generally low in complexity (2 to 7 lines of code changes in the affected file, median: 4). However, developers still take from 36 to 245 days (median: 96) to fix these alerts. Finally, our data suggest that severity and fix complexity may correlate with an alert's lifespan in some of the projects.
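As a small illustration of the per-alert measurements described above, the following sketch computes an actionability rate and a median lifespan from hypothetical alert records; it is not the authors' analysis pipeline.

```python
# Toy sketch of the kind of per-alert analysis described above, on made-up
# records: actionability rate and median lifespan of fixed alerts.
from datetime import date
from statistics import median

alerts = [  # (detected, fixed_by_code_change, fix_date) -- hypothetical records
    (date(2018, 1, 10), True,  date(2018, 4, 1)),
    (date(2018, 2, 5),  False, None),
    (date(2018, 3, 1),  True,  date(2018, 3, 20)),
]

actionable = [a for a in alerts if a[1]]
print("actionability rate:", len(actionable) / len(alerts))
print("median lifespan (days):",
      median((fixed - detected).days for detected, _, fixed in actionable))
```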
Citations: 25
MPro: Combining Static and Symbolic Analysis for Scalable Testing of Smart Contract
William Zhang, Sebastian Banescu, Leodardo Pasos, Steven T. Stewart, Vijay Ganesh
Smart contracts are executable programs that enable the building of a programmable trust mechanism between multiple entities without the need of a trusted third-party. At the time of this writing, there were over 10 million smart contracts deployed on the Ethereum networks and this number continues to grow at a rapid pace. Smart contracts are often written in a Turing-complete programming language called Solidity, which is not easy to audit for subtle errors. Further, since smart contracts are immutable, errors have led to attacks resulting in losses of cryptocurrency worth 100s of millions of USD and reputational damage. Unfortunately, manual security analyses do not scale with the size and number of smart contracts. Automated and scalable mechanisms are essential if smart contracts are to gain mainstream acceptance. Researchers have developed several security scanners in the past couple of years. However, many of these analyzers either do not scale well, or if they do, produce many false positives. This issue is exacerbated when bugs are triggered only after a series of interactions with the functions of the contract-under-test. A depth-n vulnerability refers to a vulnerability that requires invoking a specific sequence of n functions to trigger. Depth-n vulnerabilities are time-consuming to detect by existing automated analyzers because of the combinatorial explosion of sequences of functions that could be executed on smart contracts. In this paper, we present a technique to analyze depth-n vulnerabilities in an efficient and scalable way by combining symbolic execution and data dependency analysis. A significant advantage of combining symbolic with static analysis is that it scales much better than symbolic analysis alone and does not have the false-positive problem that static analysis tools typically have. We have implemented our technique in a tool called MPro, a scalable and automated smart contract analyzer based on the existing symbolic analysis tool Mythril-Classic and the static analysis tool Slither. We analyzed 100 randomly chosen smart contracts on MPro and our evaluation shows that MPro is about n-times faster than Mythril-Classic for detecting depth-n vulnerabilities, while preserving all the detection capabilities of Mythril-Classic.
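The sketch below illustrates, on invented read/write sets, how data dependency information can prune the combinatorial space of depth-n function sequences before symbolic exploration; it is a simplification of the idea, not MPro's implementation.

```python
# Simplified sketch of the pruning idea (not MPro's code): only function
# sequences in which each call reads state written by the previous call are
# kept for symbolic exploration of depth-n behaviours.
from itertools import permutations

# Hypothetical per-function read/write sets over contract state variables.
writes = {"deposit": {"balance"}, "setOwner": {"owner"}, "withdraw": set()}
reads  = {"deposit": set(), "setOwner": set(), "withdraw": {"balance", "owner"}}

def data_dependent(seq):
    """Keep a sequence only if every call reads something the previous call wrote."""
    return all(reads[b] & writes[a] for a, b in zip(seq, seq[1:]))

depth = 2
candidates = permutations(writes, depth)
kept = [seq for seq in candidates if data_dependent(seq)]
print(kept)   # [('deposit', 'withdraw'), ('setOwner', 'withdraw')]
```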
Citations: 22
ISSRE 2019 Organizing Committee
{"title":"ISSRE 2019 Organizing Committee","authors":"","doi":"10.1109/issre.2019.00007","DOIUrl":"https://doi.org/10.1109/issre.2019.00007","url":null,"abstract":"","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130793122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Impact of Data Preparation on the Fairness of Software Systems
Inês Valentim, Nuno Lourenço, Nuno Antunes
Machine learning models are widely adopted in scenarios that directly affect people. The development of software systems based on these models raises societal and legal concerns, as their decisions may lead to the unfair treatment of individuals based on attributes like race or gender. Data preparation is key in any machine learning pipeline, but its effect on fairness is yet to be studied in detail. In this paper, we evaluate how the fairness and effectiveness of the learned models are affected by the removal of the sensitive attribute, the encoding of the categorical attributes, and instance selection methods (including cross-validators and random undersampling). We used the Adult Income and the German Credit Data datasets, which are widely studied and known to have fairness concerns. We applied each data preparation technique individually to analyse the difference in predictive performance and fairness, using statistical parity difference, disparate impact, and the normalised prejudice index. The results show that fairness is affected by transformations made to the training data, particularly in imbalanced datasets. Removing the sensitive attribute is insufficient to eliminate all the unfairness in the predictions, as expected, but it is key to achieve fairer models. Additionally, the standard random undersampling with respect to the true labels is sometimes more prejudicial than performing no random undersampling.
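For reference, the two group-fairness metrics named above can be computed as in the following minimal sketch, using made-up predictions for a privileged and an unprivileged group.

```python
# Minimal sketch of two fairness metrics named above, computed from model
# predictions for a privileged and an unprivileged group (hypothetical data).
def positive_rate(y_pred):
    return sum(y_pred) / len(y_pred)

def statistical_parity_difference(pred_unpriv, pred_priv):
    # SPD = P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged)
    return positive_rate(pred_unpriv) - positive_rate(pred_priv)

def disparate_impact(pred_unpriv, pred_priv):
    # DI = P(y_hat = 1 | unprivileged) / P(y_hat = 1 | privileged)
    return positive_rate(pred_unpriv) / positive_rate(pred_priv)

pred_priv   = [1, 1, 0, 1, 1, 0, 1, 1]   # predictions for the privileged group
pred_unpriv = [1, 0, 0, 1, 0, 0, 1, 0]   # predictions for the unprivileged group

print("SPD:", statistical_parity_difference(pred_unpriv, pred_priv))  # -0.375
print("DI: ", disparate_impact(pred_unpriv, pred_priv))               # 0.5
```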
Citations: 16
Fuzzing Error Handling Code in Device Drivers Based on Software Fault Injection
Zu-Ming Jiang, Jia-Ju Bai, J. Lawall, Shimin Hu
Device drivers remain a main source of runtime failures in operating systems. To detect bugs in device drivers, fuzzing has been commonly used in practice. However, a main limitation of existing fuzzing approaches is that they cannot effectively test error handling code. Indeed, these fuzzing approaches require effective inputs to cover target code, but much error handling code in drivers is triggered by occasional errors (such as insufficient memory and hardware malfunctions) that are not related to inputs. In this paper, based on software fault injection, we propose a new fuzzing approach named FIZZER, to test error handling code in device drivers. At compile time, FIZZER uses static analysis to recommend possible error sites that can trigger error handling code. During driver execution, by analyzing runtime information, it automatically fuzzes error-site sequences for fault injection to improve code coverage. We evaluate FIZZER on 18 device drivers in Linux 4.19, and in total find 22 real bugs. The code coverage is increased by over 15% compared to normal execution without fuzzing.
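The following user-space sketch conveys the idea of fuzzing error-site sequences by fault injection: each run forces a chosen subset of error sites to fail and exercises the corresponding error handling path. It is a conceptual illustration only; FIZZER itself instruments compiled driver code.

```python
# Conceptual sketch (plain Python, not the FIZZER tool): fuzzing an
# "error-site sequence" means choosing, per run, which of the possible error
# sites actually fail, then exercising the error handling path.
import random

ERROR_SITES = ["alloc_buffer", "register_irq", "read_config"]

def run_driver_init(failing_sites):
    """Simulated driver init; each step may be forced to fail by injection."""
    acquired = []
    for site in ERROR_SITES:
        if site in failing_sites:
            # Error handling path: release whatever was acquired so far.
            for res in reversed(acquired):
                print("  release", res)
            return False
        acquired.append(site)
    return True

random.seed(0)
for _ in range(3):
    # Randomly pick a subset of error sites to fail in this run.
    injected = {s for s in ERROR_SITES if random.random() < 0.5}
    ok = run_driver_init(injected)
    print("inject:", sorted(injected), "->", "ok" if ok else "handled error")
```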
Citations: 9