2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE): Latest Papers, Page 10

Interface Compliance of Inline Assembly: Automatically Check, Patch and Refine

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-02-15 DOI: 10.1109/ICSE43902.2021.00113

Frédéric Recoules, Sébastien Bardin, Richard Bonichon, Matthieu Lemerre, L. Mounier, Marie-Laure Potet

Inline assembly is still a common practice in low-level C programming, typically for efficiency reasons or for accessing specific hardware resources. Such embedded assembly codes in the GNU syntax (supported by major compilers such as GCC, Clang and ICC) have an interface specifying how the assembly codes interact with the C environment. For simplicity reasons, the compiler treats GNU inline assembly codes as blackboxes and relies only on their interface to correctly glue them into the compiled C code. Therefore, the adequacy between the assembly chunk and its interface (named compliance) is of primary importance, as such compliance issues can lead to subtle and hard-to-find bugs. We propose RUSTInA, the first automated technique for formally checking inline assembly compliance, with the extra ability to propose (proven) patches and (optimization) refinements in certain cases. RUSTInA is based on an original formalization of the inline assembly compliance problem together with novel dedicated algorithms. Our prototype has been evaluated on 202 Debian packages with inline assembly (2656 chunks), finding 2183 issues in 85 packages – 986 significant issues in 54 packages (including major projects such as ffmpeg or ALSA), and proposing patches for 92% of them. Currently, 38 patches have already been accepted (solving 156 significant issues), with positive feedback from development teams.
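
As a rough illustration of what "compliance" between an assembly chunk and its interface means, the sketch below models a chunk by the locations it reads and writes and compares them with the declared outputs, inputs and clobbers. This is a toy model and not RUSTInA's formalization or algorithm: the `Interface` class, the `compliance_issues` function and the register names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """Declared interface of an inline assembly chunk (hypothetical toy model)."""
    outputs: frozenset = frozenset()
    inputs: frozenset = frozenset()
    clobbers: frozenset = frozenset()


def compliance_issues(reads, writes, iface):
    """List mismatches between what a chunk does and what its interface declares.

    reads:  locations the chunk reads before writing them itself.
    writes: locations the chunk modifies.
    """
    issues = []
    # Every modified location must be declared as an output or a clobber,
    # otherwise the compiler may assume its old value is still intact.
    for loc in sorted(writes - iface.outputs - iface.clobbers):
        issues.append(f"undeclared write to {loc}")
    # Every location read from the C environment must be a declared input.
    for loc in sorted(reads - iface.inputs):
        issues.append(f"undeclared read of {loc}")
    return issues
```

For example, a chunk that writes `eax`, `ebx` and the condition codes while declaring only `eax` as output and `cc` as clobber would be reported with a single issue, the undeclared write to `ebx`.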

Citations: 5

An Evolutionary Study of Configuration Design and Implementation in Cloud Systems

Pub Date : 2021-02-14 DOI: 10.1109/ICSE43902.2021.00029

Yuanliang Zhang, Haochen He, Owolabi Legunsen, Shanshan Li, Wei Dong, Tianyin Xu

Many techniques have been proposed for detecting software misconfigurations in cloud systems and for diagnosing unintended behavior caused by such misconfigurations. Detection and diagnosis are steps in the right direction: misconfigurations cause many costly failures and severe performance issues. However, we argue that continued focus on detection and diagnosis is symptomatic of a more serious problem: configuration design and implementation are not yet first-class software engineering endeavors in cloud systems. Little is known about how and why developers evolve configuration design and implementation, and the challenges that they face in doing so. This paper presents a source-code level study of the evolution of configuration design and implementation in cloud systems. Our goal is to understand the rationale and developer practices for revising initial configuration design/implementation decisions, especially in response to consequences of misconfigurations. To this end, we studied 1178 configuration-related commits from a 2.5-year version-control history of four large-scale, actively maintained open-source cloud systems (HDFS, HBase, Spark, and Cassandra). We derive new insights into the software configuration engineering process. Our results motivate new techniques for proactively reducing misconfigurations by improving the configuration design and implementation process in cloud systems. We highlight a number of future research directions.

Citations: 21

Automatically Matching Bug Reports With Related App Reviews

Pub Date : 2021-02-14 DOI: 10.1109/ICSE43902.2021.00092

Marlo Häring, Christoph Stanik, W. Maalej

App stores allow users to give valuable feedback on apps, and developers to find this feedback and use it for software evolution. However, finding user feedback that matches existing bug reports in issue trackers is challenging, as users and developers often use different language. In this work, we introduce DeepMatcher, an automatic approach using state-of-the-art deep learning methods to match problem reports in app reviews to bug reports in issue trackers. We evaluated DeepMatcher with four open-source apps quantitatively and qualitatively. On average, DeepMatcher achieved a hit ratio of 0.71 and a Mean Average Precision of 0.55. For 91 problem reports, DeepMatcher did not find any matching bug report. When manually analyzing these 91 problem reports and the issue trackers of the studied apps, we found that in 47 cases, users actually described a problem before developers discovered and documented it in the issue tracker. We discuss our findings and different use cases for DeepMatcher.
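
The two reported metrics can be sketched as follows, assuming their common information-retrieval definitions (the paper's exact definitions and cut-off may differ): the hit ratio is the share of problem reports with at least one relevant bug report among the top `k` suggestions, and Mean Average Precision averages the per-query average precision. The function names, the default `k=3` and the report identifiers are hypothetical.

```python
def hit_ratio(rankings, relevant, k=3):
    """Share of queries with at least one relevant item in the top k.

    rankings: dict mapping a query id to its ranked list of candidate ids.
    relevant: dict mapping a query id to the set of truly matching ids.
    """
    hits = sum(
        1
        for query, ranked in rankings.items()
        if any(doc in relevant[query] for doc in ranked[:k])
    )
    return hits / len(rankings)


def average_precision(ranked, rel):
    """Average of precision@i over the ranks i that hold a relevant item."""
    if not rel:
        return 0.0
    found, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in rel:
            found += 1
            total += found / i
    # Normalize by the number of relevant items (one common convention).
    return total / len(rel)


def mean_average_precision(rankings, relevant):
    """Mean of average_precision over all queries."""
    return sum(
        average_precision(ranked, relevant[query])
        for query, ranked in rankings.items()
    ) / len(rankings)
```

With two problem reports, one whose only matching bug report is ranked second and one with no match among the suggestions, this yields a hit ratio of 0.5 and a Mean Average Precision of 0.25.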

Citations: 27

Why Security Defects Go Unnoticed During Code Reviews? A Case-Control Study of the Chromium OS Project

Pub Date : 2021-02-13 DOI: 10.1109/ICSE43902.2021.00124

Rajshakhar Paul, Asif Kamal Turzo, Amiangshu Bosu

Peer code review has been found to be effective in identifying security vulnerabilities. However, despite practicing mandatory code reviews, many Open Source Software (OSS) projects still encounter a large number of post-release security vulnerabilities, as some security defects escape those reviews. Therefore, a project manager may wonder if there was any weakness or inconsistency during a code review that missed a security vulnerability. Answers to this question may help a manager pinpoint areas of concern and take measures to improve the effectiveness of his/her project's code reviews in identifying security defects. Therefore, this study aims to identify the factors that differentiate code reviews that successfully identified security defects from those that missed such defects. With this goal, we conduct a case-control study of the Chromium OS project. Using multi-stage semi-automated approaches, we build a dataset of 516 code reviews that successfully identified security defects and 374 code reviews where security defects escaped. The results of our empirical study suggest that there are significant differences between the categories of security defects that are identified and those that are missed during code reviews. A logistic regression model fitted on our dataset achieved an AUC score of 0.91 and has identified nine code review attributes that influence identification of security defects. While the time to complete a review, the number of mutual reviews between two developers, and whether the review is for a bug fix have positive impacts on vulnerability identification, opposite effects are observed from the number of directories under review, the number of total reviews by a developer, and the total number of prior commits for the file under review.
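
To make the reported AUC score concrete, the sketch below computes AUC as the probability that a randomly chosen positive example receives a higher model score than a randomly chosen negative one, with ties counting one half. The encoding is an assumption for illustration (label 1 for a review that identified the defect, 0 for one that missed it), and the `auc` function is hypothetical, not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: model outputs, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.91 means that about nine in ten such pairs are ordered correctly.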

Citations: 20

GenTree: Using Decision Trees to Learn Interactions for Configurable Software

Pub Date : 2021-02-13 DOI: 10.1109/ICSE43902.2021.00142

KimHao Nguyen, Thanhvu Nguyen

Modern software systems are increasingly designed to be highly configurable, which increases flexibility but can make programs harder to develop, test, and analyze. For example, how must configuration options be set to reach certain locations, and what characterizes the configuration space of an interesting or buggy program behavior? We introduce GenTree, a new dynamic analysis that automatically learns a program's interactions: logical formulae that describe how configuration option settings map to code coverage. GenTree uses an iterative refinement approach that runs the program under a small sample of configurations to obtain coverage data; uses a custom classifying algorithm on these data to build decision trees representing interaction candidates; and then analyzes the trees to generate new configurations to further refine the trees and interactions in the next iteration. Our experiments on 17 configurable systems spanning 4 languages show that GenTree efficiently finds precise interactions using a tiny fraction of the configuration space.

Citations: 2

Understanding Bounding Functions in Safety-Critical UAV Software

Pub Date : 2021-02-13 DOI: 10.1109/ICSE43902.2021.00119

Xiaozhou Liang, John Henry Burns, Joseph Sanchez, Karthik Dantu, Lukasz Ziarek, Yu David Liu

Unmanned Aerial Vehicles (UAVs) are an emerging computation platform known for their safety-critical needs. In this paper, we conduct an empirical study on a widely used open-source UAV software framework, Paparazzi, with the goal of understanding the safety-critical concerns of UAV software from a bottom-up developer-in-the-field perspective. We focus on the use of Bounding Functions (BFs), the runtime checks injected by Paparazzi developers on the range of variables. Through an in-depth analysis of BFs in the Paparazzi autopilot software, we found that a large number of them (109 instances) are used to bound safety-critical variables essential to the cyber-physical nature of the UAV, such as its thrust, its speed, and its sensor values. The novel contributions of this study are twofold. First, we take a static approach to classify all BF instances, presenting a novel datatype-based 5-category taxonomy with fine-grained insight on the role of BFs in ensuring the safety of UAV systems. Second, we dynamically evaluate the impact of the BF uses through a differential approach, establishing the UAV behavioral difference with and without BFs. The two-pronged static and dynamic approach together illuminates a rarely studied design space of safety-critical UAV software systems.
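
A bounding function in the sense described above can be sketched as a range clamp, and the differential evaluation as a comparison of the same values with and without the clamp. This is a minimal illustration and not Paparazzi code: the names `bound` and `max_deviation`, and the thrust range used in the example, are hypothetical.

```python
def bound(value, lower, upper):
    """Clamp value into the closed range [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return max(lower, min(value, upper))


def max_deviation(commands, lower, upper):
    """Largest gap between unbounded and bounded values in a sequence.

    A crude stand-in for a differential evaluation: it shows how far the
    system would have strayed from the allowed range without the bound.
    """
    return max(abs(c - bound(c, lower, upper)) for c in commands)
```

For a thrust command limited to the range 0 to 100, `bound(120, 0, 100)` returns 100, and the command sequence 10, 120, -5 has a maximum deviation of 20.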

Citations: 1

Same File, Different Changes: The Potential of Meta-Maintenance on GitHub

Pub Date : 2021-02-12 DOI: 10.1109/ICSE43902.2021.00076

Hideaki Hata, R. Kula, T. Ishio, Christoph Treude

Online collaboration platforms such as GitHub have provided software developers with the ability to easily reuse and share code between repositories. With clone-and-own and forking becoming prevalent, maintaining these shared files is important, especially for keeping the most up-to-date version of reused code. In contrast to related work, we propose the concept of meta-maintenance, i.e., tracking how the same files evolve in different repositories with the aim of providing useful maintenance opportunities for those files. We conduct an exploratory study by analyzing repositories from seven different programming languages to explore the potential of meta-maintenance. Our results indicate that a majority of active repositories on GitHub contain at least one file which is also present in another repository, and that a significant minority of these files are maintained differently in the different repositories which contain them. We manually analyzed a representative sample of shared files and their variants to understand which changes might be useful for meta-maintenance. Our findings support the potential of meta-maintenance and open up avenues for future work to capitalize on this potential.
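
The idea of spotting shared files that have drifted apart can be sketched with content hashes. The sketch makes strong simplifying assumptions that are not taken from the paper: files are matched by identical path, only the current snapshot of each repository is considered, and repositories are given as in-memory dictionaries. The function name `divergent_files` is hypothetical.

```python
import hashlib
from collections import defaultdict


def divergent_files(repos):
    """Find paths present in several repositories with differing content.

    repos: dict mapping a repository name to a dict of {path: bytes}.
    Returns a dict mapping each divergent path to the sorted list of
    repositories that contain it.
    """
    versions = defaultdict(dict)  # path -> {repo name: content digest}
    for repo, files in repos.items():
        for path, content in files.items():
            versions[path][repo] = hashlib.sha256(content).hexdigest()
    return {
        path: sorted(by_repo)
        for path, by_repo in versions.items()
        # Shared by at least two repositories, with at least two variants.
        if len(by_repo) > 1 and len(set(by_repo.values())) > 1
    }
```

Files reported this way are candidates for closer inspection, for instance to check whether a fix applied in one repository is missing from another.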

Citations: 9

White-Box Performance-Influence Models: A Profiling and Learning Approach

Pub Date : 2021-02-12 DOI: 10.1109/ICSE43902.2021.00099

Max Weber, S. Apel, Norbert Siegmund

Many modern software systems are highly configurable, allowing the user to tune them for performance and more. Current performance modeling approaches aim at finding performance-optimal configurations by building performance models in a black-box manner. While these models provide accurate estimates, they cannot attribute observed performance behavior to specific code regions. This not only hinders system understanding but also complicates tracing the influence of configuration options to individual methods. We propose a white-box approach that models configuration-dependent performance behavior at the method level. This allows us to predict the influence of configuration decisions on individual methods, supporting system understanding and performance debugging. The approach consists of two steps: First, we use a coarse-grained profiler and learn performance-influence models for all methods, potentially identifying some methods that are highly configuration- and performance-sensitive, causing inaccurate predictions. Second, we re-measure these methods with a fine-grained profiler and learn more accurate models, though at higher cost. By means of 9 real-world Java software systems, we demonstrate that our approach can efficiently identify configuration-relevant methods and learn accurate performance-influence models.
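
The notion of a configuration option's influence on a single method can be illustrated with a deliberately simple estimator: the difference between the method's mean execution time with the option enabled and with it disabled. This is a swapped-in simplification for illustration, not the learning technique used in the paper, and the function name, option names and timings are hypothetical.

```python
from statistics import mean


def option_influence(samples, option):
    """Estimate how a boolean option shifts one method's execution time.

    samples: list of (configuration, seconds) pairs measured for a single
             method, where configuration maps option names to booleans.
    Returns mean time with the option on minus mean time with it off.
    """
    on = [t for cfg, t in samples if cfg[option]]
    off = [t for cfg, t in samples if not cfg[option]]
    if not on or not off:
        raise ValueError("need measurements with the option both on and off")
    return mean(on) - mean(off)
```

A strongly negative value suggests the option speeds the method up, a value near zero suggests the method is insensitive to it; a real performance-influence model would also capture interactions between options.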

Citations: 18

What Helped, and what did not? An Evaluation of the Strategies to Improve Continuous Integration

Pub Date : 2021-02-12 DOI: 10.1109/ICSE43902.2021.00031

Xianhao Jin, Francisco Servant

Continuous integration (CI) is a widely used practice in modern software engineering. Unfortunately, it is also an expensive practice: Google and Mozilla estimate the cost of their CI systems in the millions of dollars. A number of techniques and tools are designed to, or have the potential to, reduce the cost of CI or expand its benefit by reducing time to feedback. However, their benefits in some dimensions may also result in drawbacks in others. They may also be beneficial in scenarios that they were not designed for. In this paper, we perform the first exhaustive comparison of techniques to improve CI, evaluating 14 variants of 10 techniques using selection and prioritization strategies at build and test granularity. We evaluate their strengths and weaknesses with 10 different cost and time-to-feedback saving metrics on 100 real-world projects. We analyze the results of all techniques to understand the design decisions that helped different dimensions of benefit. We also synthesize those results to lay out a series of recommendations for the development of future research techniques to advance this area.

Citations: 15

Leaving My Fingerprints: Motivations and Challenges of Contributing to OSS for Social Good

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-02-12 DOI: 10.1109/ICSE43902.2021.00096

Yu Huang, Denae Ford, Thomas Zimmermann

When inspiring software developers to contribute to open source software, the act is often referenced as an opportunity to build tools to support the developer community. However, that is not the only charge that propels contributions—growing interest in open source has also been attributed to software developers deciding to use their technical skills to benefit a common societal good. To understand how developers identify these projects, their motivations for contributing, and challenges they face, we conducted 21 semi-structured interviews with OSS for Social Good (OSS4SG) contributors. From our interview analysis, we identified themes of contribution styles that we wanted to understand at scale by deploying a survey to over 5765 OSS and Open Source Software for Social Good contributors. From our quantitative analysis of 517 responses, we find that the majority of contributors demonstrate a distinction between OSS4SG and OSS. Likewise, contributors described definitions based on what societal issue the project was to mitigate and who the outcomes of the project were going to benefit. In addition, we find that OSS4SG contributors focus less on benefiting themselves by padding their resume with new technology skills and are more interested in leaving their mark on society at statistically significant levels. We also find that OSS4SG contributors evaluate the owners of the project significantly more than OSS contributors. These findings inform implications to help contributors identify high societal impact projects, help project maintainers reduce barriers to entry, and help organizations understand why contributors are drawn to these projects to sustain active participation.

Citations: 16