2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE): Latest Papers

Automated Partitioning of Android Applications for Trusted Execution Environments

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884817

K. Rubinov, Lucia Rosculete, T. Mitra, Abhik Roychoudhury

The co-existence of critical and non-critical applications on computing devices, such as mobile phones, is becoming commonplace. The sensitive segments of a critical application should be executed in isolation on Trusted Execution Environments (TEE) so that the associated code and data can be protected from malicious applications. TEE is supported by different technologies and platforms, such as ARM TrustZone, that allow logical separation of "secure" and "normal" worlds. We develop an approach for automated partitioning of critical Android applications into "client" code to be run in the "normal" world and "TEE commands" encapsulating the handling of confidential data to be run in the "secure" world. We also reduce the overhead due to transitions between the two worlds by choosing appropriate granularity for the TEE commands. The advantage of our proposed solution is evidenced by efficient partitioning of real-world applications.

Citations: 59

Finding Security Bugs in Web Applications Using a Catalog of Access Control Patterns

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884836

Joseph P. Near, D. Jackson

We propose a specification-free technique for finding missing security checks in web applications using a catalog of access control patterns in which each pattern models a common access control use case. Our implementation, SPACE, checks that every data exposure allowed by an application's code matches an allowed exposure from a security pattern in our catalog. The only user-provided input is a mapping from application types to the types of the catalog; the rest of the process is entirely automatic. In an evaluation on the 50 most watched Ruby on Rails applications on GitHub, SPACE reported 33 possible bugs: 23 previously unknown security bugs and 10 false positives.

Citations: 34

Are "Non-functional" Requirements really Non-functional? An Investigation of Non-functional Requirements in Practice

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884788

J. Eckhardt, Andreas Vogelsang, Daniel Méndez Fernández

Non-functional requirements (NFRs) are commonly distinguished from functional requirements by differentiating how the system shall do something in contrast to what the system shall do. This distinction is not only prevalent in research, but also influences how requirements are handled in practice. NFRs are usually documented separately from functional requirements, without quantitative measures, and with relatively vague descriptions. As a result, they remain difficult to analyze and test. Several authors argue, however, that many so-called NFRs actually describe behavioral properties and may be treated the same way as functional requirements. In this paper, we empirically investigate this point of view and aim to increase our understanding of the nature of NFRs addressing system properties. We report on the classification of 530 NFRs extracted from 11 industrial requirements specifications and analyze to what extent these NFRs describe system behavior. Our results suggest that most "non-functional" requirements are not non-functional as they describe behavior of a system. Consequently, we argue that many so-called NFRs can be handled similarly to functional requirements.

Pages: 832-842

Citations: 92

Risk-Driven Revision of Requirements Models

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884838

Dalal Alrajeh, A. V. Lamsweerde, J. Kramer, A. Russo, Sebastián Uchitel

Requirements incompleteness is often the result of unanticipated adverse conditions which prevent the software and its environment from behaving as expected. These conditions represent risks that can cause severe software failures. The identification and resolution of such risks is therefore a crucial step towards requirements completeness. Obstacle analysis is a goal-driven form of risk analysis that aims at detecting missing conditions that can obstruct goals from being satisfied in a given domain, and resolving them. This paper proposes an approach for automatically revising goals that may be under-specified or (partially) wrong to resolve obstructions in a given domain. The approach deploys a learning-based revision methodology in which obstructed goals in a goal model are iteratively revised from traces exemplifying obstruction and non-obstruction occurrences. Our revision methodology computes domain-consistent, obstruction-free revisions that are automatically propagated to other goals in the model in order to preserve the correctness of goal models whilst guaranteeing minimal change to the original model. We present the formal foundations of our learning-based approach, and show that it preserves the properties of our formal framework. We validate it against the benchmarking case study of the London Ambulance Service.

Pages: 855-865

Citations: 7

Quality Experience: A Grounded Theory of Successful Agile Projects without Dedicated Testers

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884789

L. Prechelt, H. Schmeisky, Franz Zieris

Context: While successful conventional software development regularly employs separate testing staff, there are successful agile teams with as well as without separate testers. Question: How does successful agile development work without separate testers? What are advantages and disadvantages? Method: A case study, based on Grounded Theory evaluation of interviews and direct observation of three agile teams; one having separate testers, two without. All teams perform long-term development of parts of e-business web portals. Results: Teams without testers use a quality experience work mode centered around a tight field-use feedback loop, driven by a feeling of responsibility, supported by test automation, resulting in frequent deployments. Conclusion: In the given domain, hand-overs to separate testers appear to hamper the feedback loop more than they contribute to quality, so working without testers is preferred. However, Quality Experience is achievable only with modular architectures and in suitable domains.

Citations: 19

PRADA: Prioritizing Android Devices for Apps by Mining Large-Scale Usage Data

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884828

Xuan Lu, Xuanzhe Liu, Huoran Li, Tao Xie, Q. Mei, Dan Hao, Gang Huang, Feng Feng

Selecting and prioritizing major device models is critical for mobile app developers when choosing testbeds and allocating resources such as marketing and quality-assurance effort. The heavily fragmented distribution of Android devices makes it challenging to select a few major device models out of thousands of models available on the market. Currently, app developers usually rely on some reported or estimated general market share of device models. However, these estimates can be quite inaccurate and, more problematically, can be irrelevant to the particular app under consideration. To address this issue, we propose PRADA, the first approach to prioritizing Android device models for individual apps, based on mining large-scale usage data. PRADA adapts the concept of operational profiling (popularly used in software reliability engineering) for mobile apps – the usage of an app on a specific device model reflects the importance of that device model for the app. PRADA includes a collaborative filtering technique to predict the usage of an app on different device models, even if the app is entirely new (without its actual usage in the market yet), based on the usage data of a large collection of apps. We empirically demonstrate the effectiveness of PRADA over two popular app categories, i.e., Game and Media, covering over 3.86 million users and 14,000 device models collected through a leading Android management app in China.
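
The operational-profiling idea in this abstract (usage of an app on a device model reflects that model's importance for the app) can be illustrated with a minimal Python sketch. It is not PRADA's implementation: the function name `rank_device_models`, the record layout, and the choice of total usage as the score are assumptions made for illustration, and the collaborative filtering step for new apps is not covered.

```python
from collections import defaultdict


def rank_device_models(usage_records):
    """Rank device models for one app by their share of observed usage.

    usage_records: iterable of (device_model, usage_amount) pairs.
    The record layout is hypothetical; usage_amount could be seconds
    of foreground time, sessions, or any other additive usage measure.
    """
    totals = defaultdict(float)
    for model, amount in usage_records:
        totals[model] += amount

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    # Highest usage first; ties broken by model name for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(model, amount / grand_total) for model, amount in ranked]


if __name__ == "__main__":
    # Illustrative records only, not data from the paper.
    records = [("A", 30), ("B", 50), ("A", 40), ("C", 80)]
    for model, share in rank_device_models(records):
        print(model, round(share, 2))
```

A developer with a budget for only k test devices would take the first k entries of the returned list.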

Pages: 3-13

Citations: 51

Automatically Learning Semantic Features for Defect Prediction

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884804

Song Wang, Taiyue Liu, Lin Tan

Software defect prediction, which predicts defective code regions, can help developers find bugs and prioritize their testing efforts. To build accurate prediction models, previous studies focus on manually designing features that encode the characteristics of programs and exploring different machine learning algorithms. Existing traditional features often fail to capture the semantic differences of programs, and such a capability is needed for building accurate prediction models. To bridge the gap between programs' semantics and defect prediction features, this paper proposes to leverage a powerful representation-learning algorithm, deep learning, to learn semantic representation of programs automatically from source code. Specifically, we leverage a Deep Belief Network (DBN) to automatically learn semantic features from token vectors extracted from programs' Abstract Syntax Trees (ASTs). Our evaluation on ten open source projects shows that our automatically learned semantic features significantly improve both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP) compared to traditional features. Our semantic features improve WPDP on average by 14.7% in precision, 11.5% in recall, and 14.2% in F1. For CPDP, our semantic-feature-based approach outperforms the state-of-the-art technique TCA+ with traditional features by 8.9% in F1.
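
The precision, recall, and F1 figures quoted in this abstract are standard classification metrics. The Python sketch below shows how they are conventionally computed for a binary defect predictor; the function name and the label encoding (truthy means "defective") are assumptions for illustration, and the sketch says nothing about how the paper's models were trained.

```python
def precision_recall_f1(actual, predicted):
    """Compute precision, recall and F1 for binary defect labels.

    actual, predicted: equal-length sequences where a truthy value
    marks a file (or other code region) as defective.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # defective, flagged
    fp = sum(1 for a, p in pairs if not a and p)    # clean, flagged
    fn = sum(1 for a, p in pairs if a and not p)    # defective, missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

Because F1 is a harmonic mean, it rises only when precision and recall improve together, which is why the abstract reports all three numbers.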

Pages: 297-308

Citations: 537

Comparing White-Box and Black-Box Test Prioritization

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884791

Christopher Henard, Mike Papadakis, M. Harman, Yue Jia, Yves Le Traon

Although white-box regression test prioritization has been well-studied, the more recently introduced black-box prioritization approaches have neither been compared against each other nor against more well-established white-box techniques. We present a comprehensive experimental comparison of several test prioritization techniques, including well-established white-box strategies and more recently introduced black-box approaches. We found that Combinatorial Interaction Testing and diversity-based techniques (Input Model Diversity and Input Test Set Diameter) perform best among the black-box approaches. Perhaps surprisingly, we found little difference between black-box and white-box performance (at most 4% fault detection rate difference). We also found the overlap between black- and white-box faults to be high: the first 10% of the prioritized test suites already agree on at least 60% of the faults found. These are positive findings for practicing regression testers who may not have source code available, thereby making white-box techniques inapplicable. We also found evidence that both black-box and white-box prioritization remain robust over multiple system releases.
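
The abstract compares prioritization techniques by fault detection rate but does not name the exact measure. A widely used one for test orderings is APFD (Average Percentage of Faults Detected), and the Python sketch below computes it under that assumption. The function name, argument layout, and the requirement that every fault is detected by some test are illustrative choices, not details taken from the paper.

```python
def apfd(ordering, detects, num_faults):
    """Average Percentage of Faults Detected for one test ordering.

    ordering: list of test ids in the order they would be executed.
    detects: dict mapping test id -> set of fault ids that test reveals.
    num_faults: total number of faults under consideration.
    """
    first_position = {}
    for position, test in enumerate(ordering, start=1):
        for fault in detects.get(test, ()):
            # Record only the earliest test that reveals each fault.
            first_position.setdefault(fault, position)

    if len(first_position) != num_faults:
        raise ValueError("this APFD variant needs every fault to be detected")

    n = len(ordering)
    return 1 - sum(first_position.values()) / (n * num_faults) + 1 / (2 * n)
```

An ordering that reveals faults earlier gets a higher score, so two prioritization techniques can be compared by the APFD of the orderings they produce for the same suite.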

Pages: 523-534

Citations: 167

Guiding Dynamic Symbolic Execution toward Unverified Program Executions

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884843

M. Christakis, Peter Müller, Valentin Wüstholz

Most techniques to detect program errors, such as testing, code reviews, and static program analysis, do not fully verify all possible executions of a program. They leave executions unverified when they do not check certain properties, fail to verify properties, or check properties under certain unsound assumptions such as the absence of arithmetic overflow. In this paper, we present a technique to complement partial verification results by automatic test case generation. In contrast to existing work, our technique supports the common case that the verification results are based on unsound assumptions. We annotate programs to reflect which executions have been verified, and under which assumptions. These annotations are then used to guide dynamic symbolic execution toward unverified program executions. Our main technical contribution is a code instrumentation that causes dynamic symbolic execution to abort tests that lead to verified executions, to prune parts of the search space, and to prioritize tests that cover more properties that are not fully verified. We have implemented our technique for the .NET static analyzer Clousot and the dynamic symbolic execution tool Pex. It produces smaller test suites (by up to 19.2%), covers more unverified executions (by up to 7.1%), and reduces testing time (by up to 52.4%) compared to combining Clousot and Pex without our technique.

Pages: 144-155

Citations: 85

Revisit of Automatic Debugging via Human Focus-Tracking Analysis

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884834

Xiaoyuan Xie, Zicong Liu, Shuo Song, Zhenyu Chen, J. Xuan, Baowen Xu

In many fields of software engineering, studies on human behavior have attracted a lot of attention; however, few such studies exist in automated debugging. Parnin and Orso conducted a pioneering study comparing the performance of programmers in debugging with and without a ranking-based fault localization technique, namely Spectrum-Based Fault Localization (SBFL). In this paper, we revisit the actual helpfulness of SBFL, by addressing some major problems that were not resolved in Parnin and Orso’s study. Our investigation involved 207 participants and 17 debugging tasks. A user-friendly SBFL tool was adopted. It was found that SBFL tended not to be helpful in improving the efficiency of debugging. By tracking and analyzing programmers’ focus of attention, we characterized their source code navigation patterns and provided in-depth explanations for the observations. Results indicated that (1) a short “first scan” of the source code tended to result in inefficient debugging; and (2) inspections of the pinpointed statements during the “follow-up browsing” were normally just quick skimming. Moreover, we found that the SBFL assistance may even slightly weaken programmers’ abilities in fault detection. Our observations imply interference between the mechanism of automated fault localization and the actual assistance needed by programmers in debugging. To resolve this interference, we provide several insights and suggestions.
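
For readers unfamiliar with SBFL, the Python sketch below shows how a ranking-based technique of this kind typically scores statements from test coverage. The abstract does not say which suspiciousness formula the study's tool used; Ochiai is chosen here purely as a well-known example, and the function name and input layout are assumptions.

```python
import math


def ochiai_ranking(coverage, outcomes):
    """Rank statements by Ochiai suspiciousness, most suspicious first.

    coverage: list of sets, one per test, with the statement ids it executed.
    outcomes: list of bools, True where the corresponding test failed.
    """
    total_failed = sum(1 for failed in outcomes if failed)
    statements = set().union(*coverage)

    scores = {}
    for stmt in statements:
        failed_cover = sum(
            1 for cov, failed in zip(coverage, outcomes) if failed and stmt in cov
        )
        passed_cover = sum(
            1 for cov, failed in zip(coverage, outcomes) if not failed and stmt in cov
        )
        # Ochiai: ef / sqrt(total_failed * (ef + ep))
        denom = math.sqrt(total_failed * (failed_cover + passed_cover))
        scores[stmt] = failed_cover / denom if denom else 0.0

    # Ties are broken by statement id so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The output is the ranked statement list that a programmer would be shown; the study's question is whether inspecting such a list actually speeds up debugging.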

Pages: 808-819

Citations: 47
