2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE): Latest Papers, Page 10

Test-Driven Code Review: An Empirical Study

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00110

D. Spadini, Fabio Palomba, T. Baum, Stefan Hanenberg, M. Bruntink, Alberto Bacchelli

Test-Driven Code Review (TDR) is a code review practice in which a reviewer inspects a patch by examining the changed test code before the changed production code. Although practitioners have mentioned this practice positively in informal literature and interviews, there is no systematic knowledge of its effects, prevalence, problems, and advantages. In this paper, we aim to understand empirically whether this practice affects code review effectiveness and how developers perceive TDR. We conduct (i) a controlled experiment with 93 developers who perform more than 150 reviews, and (ii) 9 semi-structured interviews and a survey with 103 respondents to gather information on how TDR is perceived. Key results from the experiment show that developers adopting TDR find the same proportion of defects in production code, but more in test code, at the expense of finding fewer maintainability issues in production code. Furthermore, we found that most developers prefer to review production code because they deem it more critical and think tests should follow from it. Moreover, generally poor test code quality and a lack of tool support hinder the adoption of TDR. Public preprint: https://doi.org/10.5281/zenodo.2551217, data and materials: https://doi.org/10.5281/zenodo.2553139


Pages: 1061-1072

Citations: 23

IconIntent: Automatic Identification of Sensitive UI Widgets Based on Icon Classification for Android Apps

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00041

Xusheng Xiao, Xiaoyin Wang, Zhihao Cao, Hanlin Wang, Peng Gao

Many mobile applications (i.e., apps) include UI widgets to use or collect users' sensitive data. Thus, to identify suspicious sensitive data usage such as UI-permission mismatch, it is crucial to understand the intentions of UI widgets. However, many UI widgets use icons of specific shapes (object icons) and icons embedded with text (text icons) to express their intentions, posing challenges for existing detection techniques that analyze only textual data to identify sensitive UI widgets. In this work, we propose a novel app analysis framework, ICONINTENT, that synergistically combines program analysis and icon classification to identify sensitive UI widgets in Android apps. ICONINTENT automatically associates UI widgets and icons via static analysis of an app's UI layout files and code, and then adapts computer vision techniques to classify the associated icons into eight categories of sensitive data. Our evaluations of ICONINTENT on 150 apps from Google Play show that ICONINTENT can detect 248 sensitive UI widgets in 97 apps, achieving a precision of 82.4%. When combined with SUPOR, the state-of-the-art sensitive UI widget identification technique based on text analysis, SUPOR+ICONINTENT can detect 487 sensitive UI widgets (101.2% improvement over SUPOR only), and reduces suspicious permissions to be inspected by 50.7% (129.4% improvement over SUPOR only).


Pages: 257-268

Citations: 47

A Framework for Checking Regression Test Selection Tools

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00056

Chenguang Zhu, Owolabi Legunsen, A. Shi, Miloš Gligorić

Regression test selection (RTS) reduces regression testing costs by re-running only tests that can change behavior due to code changes. Researchers and large software organizations recently developed and adopted several RTS tools to deal with the rapidly growing costs of regression testing. As RTS tools gain adoption, it becomes critical to check that they are correct and efficient. Unfortunately, checking RTS tools currently relies solely on limited tests that RTS tool developers manually write. We present RTSCheck, the first framework for checking RTS tools. RTSCheck feeds evolving programs (i.e., sequences of program revisions) to an RTS tool and checks the output against rules inspired by existing RTS test suites. Violations of these rules are likely due to deviations from expected RTS tool behavior, and indicative of bugs in the tool. RTSCheck uses three components to obtain evolving programs: (1) AutoEP automatically generates evolving programs and corresponding tests, (2) DefectsEP uses buggy and fixed program revisions from bug databases, and (3) EvoEP uses sequences of program revisions from actual open-source projects' histories. We used RTSCheck to check three recently developed RTS tools for Java: Clover, Ekstazi, and STARTS. RTSCheck discovered 27 bugs in these three tools.
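As a rough illustration of the kind of rule such a framework could check, the sketch below implements a toy dependency-based RTS and one safety rule: every test that depends on a changed file must be selected. The names `select_tests` and `missed_tests`, the file names, and the dependency map are hypothetical; none of this is taken from RTSCheck, Clover, Ekstazi, or STARTS.

```python
from typing import Dict, Set


def select_tests(deps: Dict[str, Set[str]], changed: Set[str]) -> Set[str]:
    """Toy RTS: select each test whose recorded dependencies overlap the changed files."""
    return {test for test, files in deps.items() if files & changed}


def missed_tests(deps: Dict[str, Set[str]], changed: Set[str], selected: Set[str]) -> Set[str]:
    """Safety rule: return affected tests that were NOT selected (empty set means the rule holds)."""
    return select_tests(deps, changed) - selected


# Hypothetical dependency map: test name -> files the test depends on.
deps = {
    "TestA": {"A.java", "Util.java"},
    "TestB": {"B.java"},
    "TestC": {"C.java", "Util.java"},
}
changed = {"Util.java"}

selected = select_tests(deps, changed)
# A deliberately buggy selection that forgets TestC, to show a rule violation.
buggy_selection = {"TestA"}
missed = missed_tests(deps, changed, buggy_selection)
print(sorted(selected), sorted(missed))
```

A non-empty result from `missed_tests` plays the role of a rule violation here: it flags a selection that may skip a test whose behavior could have changed.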


Pages: 430-441

Citations: 28

Could I Have a Stack Trace to Examine the Dependency Conflict Issue?

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00068

Ying Wang, Ming Wen, Rongxin Wu, Zhenwei Liu, Shin Hwei Tan, Zhiliang Zhu, Hai Yu, S. Cheung

Intensive use of libraries in Java projects brings a potential risk of dependency conflicts, which occur when a project directly or indirectly depends on multiple versions of the same library or class. When this happens, the JVM loads one version and shadows the others. Runtime exceptions can occur when methods in the shadowed versions are referenced. Although project management tools such as Maven are able to give warnings of potential dependency conflicts when a project is built, developers often ask for crashing stack traces before examining these warnings. This motivates us to develop Riddle, an automated approach that generates tests and collects crashing stack traces for projects subject to the risk of dependency conflicts. Riddle, built on top of Asm and Evosuite, combines condition mutation, search strategies, and condition restoration. We applied Riddle to 19 real-world Java projects with duplicate libraries or classes. We reported 20 identified dependency conflicts, including their induced crashing stack traces and the details of generated tests. Among them, 15 conflicts were confirmed by developers as real issues, and 10 were readily fixed. The evaluation results demonstrate the effectiveness and usefulness of Riddle.
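To make the notion of a dependency conflict concrete, here is a minimal sketch that flags libraries appearing with more than one version in a resolved dependency list. It only mirrors the static warning described above; it does not generate tests or stack traces as Riddle does. The function `find_conflicts` and the `com.example` coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_conflicts(resolved: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (library, version) pairs and keep libraries seen with more than one version."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for library, version in resolved:
        versions[library].add(version)
    return {lib: vs for lib, vs in versions.items() if len(vs) > 1}


# Hypothetical direct and transitive dependencies of one project.
resolved_deps = [
    ("com.example:json", "1.0"),  # pulled in directly
    ("com.example:json", "2.1"),  # pulled in through another library
    ("com.example:log", "3.0"),
]

conflicts = find_conflicts(resolved_deps)
print(conflicts)
```

Only one of the two `com.example:json` versions would be loaded at run time, so any method that exists solely in the shadowed version is a candidate for a runtime exception.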


Pages: 572-583

Citations: 24

FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00109

P. Nguyen, Juri Di Rocco, D. D. Ruscio, Lina Ochoa, Thomas Degueule, M. D. Penta

Software developers interact with APIs on a daily basis and, therefore, often face the need to learn how to use new APIs suitable for their purposes. Previous work has shown that recommending usage patterns to developers facilitates the learning process. Current approaches to usage pattern recommendation, however, still suffer from high redundancy and poor run-time performance. In this paper, we reformulate the problem of usage pattern recommendation in terms of a collaborative-filtering recommender system. We present a new tool, FOCUS, which mines open-source project repositories to recommend API method invocations and usage patterns by analyzing how APIs are used in projects similar to the current project. We evaluate FOCUS on a large number of Java projects extracted from GitHub and Maven Central and find that it outperforms the state-of-the-art approach PAM with regard to success rate, accuracy, and execution time. Results indicate the suitability of context-aware collaborative-filtering recommender systems for providing API usage patterns.
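The general idea of collaborative filtering over API usage can be sketched as follows: projects that already call many of the same methods as the current project "vote" for the methods it has not used yet. This is a generic illustration using Jaccard similarity, not the FOCUS algorithm itself; `jaccard`, `recommend`, the project names, and the method names are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two sets of API calls: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(current: Set[str], others: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Rank calls missing from `current` by the summed similarity of the projects that use them."""
    scores: Dict[str, float] = {}
    for calls in others.values():
        similarity = jaccard(current, calls)
        for call in calls - current:
            scores[call] = scores.get(call, 0.0) + similarity
    # Drop calls supported only by dissimilar projects, then sort by score (ties by name).
    ranked = sorted(
        ((call, score) for call, score in scores.items() if score > 0),
        key=lambda item: (-item[1], item[0]),
    )
    return [call for call, _ in ranked[:top_n]]


# Hypothetical mined projects and the API methods each one invokes.
mined = {
    "p1": {"File.open", "File.read", "File.close"},
    "p2": {"File.open", "File.write", "File.close"},
    "p3": {"Socket.connect", "Socket.send"},
}
current_project = {"File.open", "File.read"}

suggestions = recommend(current_project, mined)
print(suggestions)
```

In this toy data, `File.close` ranks first because both similar projects use it, while the socket calls are dropped because the only project using them shares nothing with the current one.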


Citations: 77

Supporting the Statistical Analysis of Variability Models

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00091

R. Heradio, David Fernández-Amorós, Christoph Mayr-Dorn, Alexander Egyed

Variability models are broadly used to specify the configurable features of highly customizable software. In practice, they can be large, defining thousands of features with their dependencies and conflicts. In such cases, visualization techniques and automated analysis support are crucial for understanding the models. This paper contributes to this line of research by presenting a novel, probabilistic foundation for statistical reasoning about variability models. Our approach not only provides a new way to visualize, describe and interpret variability models, but it also supports the improvement of additional state-of-the-art methods for software product lines; for instance, providing exact computations where only approximations were available before, and increasing the sensitivity of existing analysis operations for variability models. We demonstrate the benefits of our approach using real case studies with up to 17,365 features, and written in two different languages (KConfig and feature models).
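One statistic that a probabilistic view of a variability model makes natural is the probability of a feature, taken here as the fraction of valid configurations that include it. The sketch below computes it exactly by brute-force enumeration of a four-feature toy model. This definition, the toy model, and the names `is_valid` and `feature_probabilities` are assumptions made for illustration; enumeration cannot scale to models with thousands of features, so this is not the authors' technique.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List

FEATURES: List[str] = ["base", "gui", "cli", "logging"]


def is_valid(config: Dict[str, bool]) -> bool:
    """Toy constraints: base is mandatory, at least one UI is chosen, logging requires gui."""
    return (
        config["base"]
        and (config["gui"] or config["cli"])
        and (not config["logging"] or config["gui"])
    )


def feature_probabilities() -> Dict[str, Fraction]:
    """Enumerate all 2^n assignments and count how often each feature occurs in valid ones."""
    valid = [
        dict(zip(FEATURES, values))
        for values in product([False, True], repeat=len(FEATURES))
        if is_valid(dict(zip(FEATURES, values)))
    ]
    total = len(valid)
    return {f: Fraction(sum(c[f] for c in valid), total) for f in FEATURES}


probabilities = feature_probabilities()
for feature, p in probabilities.items():
    print(feature, p)
```

Using `Fraction` keeps the result exact rather than approximate, which loosely echoes the abstract's point about exact computations, although only for a model small enough to enumerate.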


Citations: 16

Leveraging Artifact Trees to Evolve and Reuse Safety Cases

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00124

Ankit Agrawal, S. Khoshmanesh, Michael Vierhauser, Mona Rahimi, J. Cleland-Huang, R. Lutz

Safety Assurance Cases (SACs) are increasingly used to guide and evaluate the safety of software-intensive systems. They are used to construct a hierarchically organized set of claims, arguments, and evidence in order to provide a structured argument that a system is safe for use. However, as the system evolves and grows in size, a SAC can be difficult to maintain. In this paper we utilize design science to develop a novel solution for identifying areas of a SAC that are affected by changes to the system. Moreover, we generate actionable recommendations for updating the SAC, including its underlying artifacts and trace links, in order to evolve an existing safety case for use in a new version of the system. Our approach, Safety Artifact Forest Analysis (SAFA), leverages traceability to automatically compare software artifacts from a previously approved or certified version with a new version of the system. We identify, visualize, and explain changes in a Delta Tree. We evaluate our approach using the Dronology system for monitoring and coordinating the actions of cooperating, small Unmanned Aerial Vehicles. Results from a user study show that SAFA helped users to identify changes that potentially impacted system safety and provided information that could be used to help maintain and evolve a SAC.


Pages: 1222-1233

Citations: 14

Intention-Based Integration of Software Variants

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00090

Max Lillack, Stefan Stanciulescu, Wilhelm Hedman, T. Berger, A. Wąsowski

Cloning is a simple way to create new variants of a system. While cheap at first, it increases maintenance cost in the long term. Eventually, the cloned variants need to be integrated into a configurable platform. Such an integration is challenging: it involves merging the usual code improvements between the variants, and also integrating the variable code (features) into the platform. Thus, variant integration differs from traditional software merging, which does not produce or organize configurable code, but creates a single system that cannot be configured into variants. In practice, variant integration requires fine-grained code edits, performed in an exploratory manner, in multiple iterations. Unfortunately, little tool support exists for integrating cloned variants. In this work, we show that the fine-grained code edits needed for integration can be alleviated by a small set of integration intentions: domain-specific actions, declared over code snippets, that control the integration. Developers can interactively explore the integration space by declaring (or revoking) intentions on code elements. We contribute the intentions (e.g., 'keep functionality' or 'keep as a configurable feature') and the IDE tool INCLINE, which implements the intentions and five editable views that visualize the integration process and allow declaring intentions, producing a configurable integrated platform. In a series of experiments, we evaluated the completeness of the proposed intentions, the correctness and performance of INCLINE, and the benefits of using intentions for variant integration. The experiments show that INCLINE can handle complex integration tasks, that views help to navigate the code, and that it consistently reduces mistakes made by developers during variant integration.


Pages: 831-842

Citations: 22

Global Optimization of Numerical Programs Via Prioritized Stochastic Algebraic Transformations

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00116

Xie Wang, Huaijin Wang, Z. Su, Enyi Tang, Xin Chen, Weijun Shen, Zhenyu Chen, Linzhang Wang, Xianpei Zhang, Xuandong Li

Numerical code is often applied in safety-critical but resource-limited areas. Hence, it is crucial for it to be correct and efficient, both of which are difficult to ensure. On the one hand, accumulated rounding errors in numerical programs can cause system failures. On the other hand, arbitrary/infinite-precision arithmetic, although accurate, is infeasible in practice, especially in resource-limited scenarios, because it performs thousands of times slower than floating-point arithmetic. Thus, it has been a significant challenge to obtain high-precision, easy-to-maintain, and efficient numerical code. This paper introduces a novel global optimization framework to tackle this challenge. Using our framework, a developer simply writes the infinite-precision numerical program directly following the problem's mathematical requirement specification. The resulting code is correct and easy to maintain, but inefficient. Our framework then optimizes the program in a global fashion (i.e., considering the whole program, rather than individual expressions or statements as in prior work), which is the key technical difficulty this work solves. To this end, it analyzes the program's numerical value flows across different statements through a symbolic trace extraction algorithm, and generates optimized traces via stochastic algebraic transformations guided by effective rule selection. We first evaluate our technique on numerical benchmarks from the literature; results show that our global optimization achieves significantly higher worst-case accuracy than the state-of-the-art numerical optimization tool. Second, we show that our framework is also effective on benchmarks having complicated program structures, which are challenging for numerical optimization. Finally, we apply our framework to real-world code to successfully detect numerical bugs that have been confirmed by developers.
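A small example may help show why an algebraic transformation can improve floating-point accuracy. The sketch below compares sqrt(x + 1) - sqrt(x) with the mathematically equivalent 1 / (sqrt(x + 1) + sqrt(x)), using high-precision `decimal` arithmetic as the reference. It is a single hand-picked rewrite of one expression, not the paper's stochastic, whole-program search; the function names and the choice of x are assumptions made for illustration.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60  # high-precision reference arithmetic


def naive(x: float) -> float:
    """Direct translation of the formula; subtracting nearly equal values loses digits."""
    return math.sqrt(x + 1.0) - math.sqrt(x)


def rewritten(x: float) -> float:
    """Algebraically equivalent form that avoids the cancelling subtraction."""
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))


def reference(x: int) -> Decimal:
    """Same quantity computed with 60 significant digits."""
    return Decimal(x + 1).sqrt() - Decimal(x).sqrt()


x = 10**12
ref = reference(x)
err_naive = abs(Decimal(naive(float(x))) - ref)
err_rewritten = abs(Decimal(rewritten(float(x))) - ref)
print(f"naive error:     {err_naive:.3E}")
print(f"rewritten error: {err_rewritten:.3E}")
```

Both functions use ordinary double-precision operations and cost about the same, yet the rewritten form is far closer to the reference, which is the kind of gain an accuracy-oriented transformation aims for without resorting to slow arbitrary-precision arithmetic at run time.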


Global Optimization of Numerical Programs Via Prioritized Stochastic Algebraic Transformations, pp. 1131-1141. DOI: 10.1109/ICSE.2019.00116

Citations: 7

BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-03-15 DOI: 10.1109/ICSE.2019.00048

Naji Dmeiri, David A. Tomassi, Yichen Wang, Antara Bhowmick, Yen-Chuan Liu, Premkumar T. Devanbu, Bogdan Vasilescu, Cindy Rubio-González

Fault detection, localization, and repair methods are vital to software quality, but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like TRAVIS-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BUGSWARM, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BUGSWARM toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually.
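The idea of identifying "failing and subsequent passing runs" can be sketched as a scan over a CI build history. The abstract does not describe BUGSWARM's actual mining procedure or API, so everything below is a hypothetical illustration: the `Build` record, its field names, the status labels, and the rule that a pair consists of consecutive builds on the same branch are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Build:
    build_id: int  # hypothetical identifier, assumed to increase over time
    branch: str    # branch the build ran on
    status: str    # "passed" or "failed" (illustrative labels)


def fail_pass_pairs(builds: Iterable[Build]) -> List[Tuple[Build, Build]]:
    """Return (failed, passed) pairs of consecutive builds on the same branch."""
    last_on_branch: Dict[str, Build] = {}
    pairs: List[Tuple[Build, Build]] = []
    for build in sorted(builds, key=lambda b: b.build_id):
        previous = last_on_branch.get(build.branch)
        # A pair is a failed build immediately followed by a passed one.
        if (
            previous is not None
            and previous.status == "failed"
            and build.status == "passed"
        ):
            pairs.append((previous, build))
        last_on_branch[build.branch] = build
    return pairs


history = [
    Build(1, "main", "passed"),
    Build(2, "main", "failed"),
    Build(3, "feature", "failed"),
    Build(4, "main", "passed"),
    Build(5, "feature", "failed"),
    Build(6, "feature", "passed"),
]
pairs = fail_pass_pairs(history)
print([(failed.build_id, passed.build_id) for failed, passed in pairs])
```

Tracking only the most recent build per branch keeps the scan linear in the number of builds, so it can be rerun periodically as new builds arrive. Packaging each pair in a reproducible container, the step the abstract emphasizes, is outside the scope of this sketch.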


BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes, pp. 339-349. DOI: 10.1109/ICSE.2019.00048

Citations: 59