An Empirical Comparison of Compiler Testing Techniques
Junjie Chen, Wenxiang Hu, Dan Hao, Yingfei Xiong, Hongyu Zhang, Lu Zhang, Bing Xie
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 180-190. DOI: 10.1145/2884781.2884878
Compilers, as some of the most important infrastructure of today's digital world, are expected to be trustworthy. Various testing techniques have been developed to test compilers automatically. However, it has so far been unknown how these techniques compare to one another in terms of testing effectiveness: how many bugs a technique can find within a time limit. In this paper, we conduct a systematic and comprehensive empirical comparison of three compiler testing techniques, namely Randomized Differential Testing (RDT), a variant of RDT called Different Optimization Levels (DOL), and Equivalence Modulo Inputs (EMI). Our results show that DOL is more effective at detecting bugs related to optimization, whereas RDT is more effective at detecting other types of bugs, and that the three techniques can complement each other to a certain degree. Furthermore, to understand why their effectiveness differs, we investigate three factors that influence the effectiveness of compiler testing: efficiency, strength of test oracles, and effectiveness of generated test programs. The results indicate that all three factors are statistically significant and that efficiency has the most significant impact.
How Does the Degree of Variability Affect Bug Finding?
Jean Melo, Claus Brabrand, A. Wąsowski
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 679-690. DOI: 10.1145/2884781.2884831
Software projects embrace variability to increase adaptability and to lower cost; however, variability is also blamed for increasing complexity and making reasoning about programs more difficult. We carry out a controlled experiment to quantify the impact of variability on debugging of preprocessor-based programs. We measure speed and precision for bug-finding tasks defined at three different degrees of variability on several subject programs derived from real systems. The results show that the speed of bug finding decreases linearly with the degree of variability, while the effectiveness of finding bugs is relatively independent of the degree of variability. Still, identifying the set of configurations in which a bug manifests itself is difficult even at a low degree of variability. Surprisingly, identifying the exact set of affected configurations appears to be harder than finding the bug in the first place. The difficulty of reasoning about several configurations is a likely reason why variability bugs are introduced into configurable programs. We hope that the detailed findings presented here will inspire the creation of programmer support tools addressing the challenges faced by developers when reasoning about configurations, contributing to more effective debugging and, ultimately, fewer bugs in highly configurable systems.
StubDroid: Automatic Inference of Precise Data-Flow Summaries for the Android Framework
Steven Arzt, E. Bodden
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 725-735. DOI: 10.1145/2884781.2884816
Smartphone users suffer from insufficient information on how commercial as well as malicious apps handle sensitive data stored on their phones. Automated taint analyses address this problem by allowing users to detect and investigate how applications access and handle this data. A current problem with virtually all of these analysis approaches, though, is that they rely on explicit models of the Android runtime library. In most cases, the existence of those models is taken for granted, despite the fact that the models are hard to come by: given the size and evolution speed of a modern smartphone operating system, it is prohibitively expensive to derive models manually from code or documentation. In this work, we therefore present StubDroid, the first fully automated approach for inferring precise and efficient library models for taint-analysis problems. StubDroid automatically constructs these summaries from a binary distribution of the library. In our experiments, we use StubDroid-inferred models to prevent the static taint analysis FlowDroid from having to re-analyze the Android runtime library over and over again for each analyzed app. As the results show, the models make it possible to analyze apps in seconds, whereas most complete re-analyses would time out after 30 minutes. Yet, StubDroid yields comparable precision. In comparison to manually crafted summaries, StubDroid's summaries cause the analysis to be more precise and to use less time and memory.
Cross-Supervised Synthesis of Web-Crawlers
Adi Omari, Sharon Shoham, Eran Yahav
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 368-379. DOI: 10.1145/2884781.2884842
A web-crawler is a program that automatically and systematically tracks the links of a website and extracts information from its pages. Due to the different formats of websites, the crawling scheme for different sites can differ dramatically. Manually customizing a crawler for each specific site is time-consuming and error-prone. Furthermore, because sites periodically change their format and presentation, crawling schemes have to be manually updated and adjusted. In this paper, we present a technique for automatic synthesis of web-crawlers from examples. The main idea is to use hand-crafted (possibly partial) crawlers for some websites as the basis for crawling other sites that contain the same kind of information. Technically, we use the data on one site to identify data on another site. We then use the identified data to learn the website structure and synthesize an appropriate extraction scheme. We iterate this process, as synthesized extraction schemes result in additional data to be used for re-learning the website structure. We implemented our approach and automatically synthesized 30 crawlers for websites from nine different categories: books, TVs, conferences, universities, cameras, phones, movies, songs, and hotels.
On the Techniques We Create, the Tools We Build, and Their Misalignments: A Study of KLEE
Eric F. Rizzi, Sebastian G. Elbaum, Matthew B. Dwyer
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 132-143. DOI: 10.1145/2884781.2884835
Our community constantly pushes the state of the art by introducing “new” techniques. These techniques often build on top of, and are compared against, existing systems that realize previously published techniques. The underlying assumption is that existing systems correctly represent the techniques they implement. This paper examines that assumption through a study of KLEE, a popular and well-cited tool in our community. We briefly describe six improvements we made to KLEE, none of which can be considered “new” techniques, that provide order-of-magnitude performance gains. Given these improvements, we then investigate how the results and conclusions of a sample of papers that cite KLEE are affected. Our findings indicate that the strong emphasis on introducing “new” techniques may lead to wasted effort, missed opportunities for progress, an accretion of artifact complexity, and questionable research conclusions (in our study, 27% of the papers that depend on KLEE can be questioned). We conclude by revisiting initiatives that may help to realign the incentives to better support the foundations on which we build.
Automated Test Suite Generation for Time-Continuous Simulink Models
Reza Matinnejad, S. Nejati, L. Briand, T. Bruckmann
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 595-606. DOI: 10.1145/2884781.2884797
All engineering disciplines are founded on and rely on models, although they may differ in the purposes and usages of modeling. Interdisciplinary domains such as Cyber-Physical Systems (CPSs) seek approaches that incorporate different modeling needs and usages. Specifically, the Simulink modeling platform greatly appeals to CPS engineers due to its seamless support for simulation and code generation. In this paper, we propose a test generation approach that is applicable to Simulink models built for both simulation and code generation. We define test inputs and outputs as signals that capture the evolution of values over time. Our test generation approach is implemented as a meta-heuristic search algorithm and is guided to produce test outputs with diverse shapes according to our proposed notion of diversity. Our evaluation, performed on industrial and public-domain models, demonstrates that: (1) in contrast to existing tools for testing Simulink models, which are only applicable to a subset of code-generation models, our approach is applicable to both code-generation and simulation Simulink models; (2) our new notion of diversity for output signals outperforms random baseline testing and an existing notion of signal diversity in revealing faults in Simulink models; and (3) the fault-revealing ability of our test generation approach outperforms that of the Simulink Design Verifier, the only testing toolbox for Simulink.
Revisiting Code Ownership and Its Relationship with Software Quality in the Scope of Modern Code Review
Patanamon Thongtanunam, Shane McIntosh, A. Hassan, Hajimu Iida
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 1039-1050. DOI: 10.1145/2884781.2884852
Code ownership establishes a chain of responsibility for modules in large software systems. Although prior work uncovers a link between code ownership heuristics and software quality, these heuristics rely solely on the authorship of code changes. In addition to authoring code changes, developers also make important contributions to a module by reviewing code changes. Indeed, recent work shows that reviewers are highly active in modern code review processes, often suggesting alternative solutions or providing updates to the code changes. In this paper, we complement traditional code ownership heuristics using code review activity. Through a case study of six releases of the large Qt and OpenStack systems, we find that: (1) 67%-86% of developers did not author any code changes for a module, but still actively contributed by reviewing 21%-39% of the code changes; (2) code ownership heuristics that are aware of reviewing activity share a relationship with software quality; and (3) the proportion of reviewers without expertise shares a strong, increasing relationship with the likelihood of having post-release defects. Our results suggest that reviewing activity captures an important aspect of code ownership, and should be included in approximations of it in future studies.
Quantifying and Mitigating Turnover-Induced Knowledge Loss: Case Studies of Chrome and a Project at Avaya
Peter C. Rigby, Y. Zhu, Samuel M. Donadelli, A. Mockus
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 1006-1016. DOI: 10.1145/2884781.2884851
The utility of source code, as of other knowledge artifacts, is predicated on the existence of individuals skilled enough to derive value by using or improving it. Developers leaving a software project deprive the project of the knowledge of the decisions they have made. Previous research shows that the survivors and newcomers maintaining abandoned code have reduced productivity and are more likely to make mistakes. We focus on quantifying the extent of abandoned source files and adapt methods from financial risk analysis to assess the susceptibility of a project to developer turnover. In particular, we measure the historical loss distribution and find (1) that projects are susceptible to losses that are more than three times larger than the expected loss. Using historical simulations, we find (2) that projects are susceptible to large losses that are over five times larger than the expected loss. We use Monte Carlo simulations of disaster loss scenarios and find (3) that simplistic estimates of the "truck factor" exaggerate the potential for loss. To mitigate loss from developer turnover, we modify Cataldo et al.'s coordination requirements matrices. We find (4) that we can recommend the correct successor 34% to 48% of the time. We also find that having successors reduces the expected loss by as much as 15%. Our approach helps large projects assess the risk of turnover, thereby making risk more transparent and manageable.
Understanding and Fixing Multiple Language Interoperability Issues: The C/Fortran Case
Nawrin Sultana, J. Middleton, J. Overbey, M. Hafiz
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 772-783. DOI: 10.1145/2884781.2884858
We performed an empirical study to understand interoperability issues in C and Fortran programs. C/Fortran interoperability is very common and is representative of general language interoperability issues, such as how interfaces between languages are defined and how data types are shared. Fortran presents an additional challenge, since several ad hoc approaches to C/Fortran interoperability were in use long before a standard mechanism was defined. We explored 20 applications, automatically analyzing over 12 million lines of code. We found that only 3% of interoperability instances follow the ISO standard to describe interfaces; the rest follow a combination of compiler-dependent ad hoc approaches. Several parameters in cross-language functions did not have standards-compliant interoperable types, and about one-fourth of the parameters that were passed by reference could be passed by value. We propose that automated refactoring tools may provide a viable way to migrate programs to use the new interoperability features. We present two refactorings to transform code for this purpose and one refactoring to evolve code thereafter; all of these are instances of multiple language refactorings.
Cross-Project Defect Prediction Using a Connectivity-Based Unsupervised Classifier
Feng Zhang, Q. Zheng, Ying Zou, A. Hassan
2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 309-320. DOI: 10.1145/2884781.2884839
Defect prediction on projects with limited historical data has attracted great interest from both researchers and practitioners. Cross-project defect prediction has been the main area of progress, by reusing classifiers from other projects. However, existing approaches require some degree of homogeneity (e.g., a similar distribution of metric values) between the training projects and the target project. Satisfying the homogeneity requirement often requires significant effort and is currently a very active area of research. An unsupervised classifier does not require any training data, so the heterogeneity challenge is no longer an issue. In this paper, we examine two types of unsupervised classifiers: (a) distance-based classifiers (e.g., k-means) and (b) connectivity-based classifiers. While distance-based unsupervised classifiers have previously been used in the defect prediction literature with disappointing performance, connectivity-based classifiers have never been explored before in our community. We compare the performance of unsupervised classifiers against supervised classifiers using data from 26 projects from three publicly available datasets (AEEEM, NASA, and PROMISE). In the cross-project setting, our proposed connectivity-based classifier (via spectral clustering) ranks as one of the top classifiers among five widely used supervised classifiers (random forest, naive Bayes, logistic regression, decision tree, and logistic model tree) and five unsupervised classifiers (k-means, partitioning around medoids, fuzzy C-means, neural-gas, and spectral clustering). In the within-project setting (i.e., models are built and applied on the same project), our spectral classifier ranks in the second tier, while only random forest ranks in the first tier. Hence, connectivity-based unsupervised classifiers offer a viable solution for cross-project and within-project defect prediction.