FLOSS Participants' Perceptions About Gender and Inclusiveness: A Survey
Amanda Lee, Jeffrey C. Carver. ICSE 2019, pp. 677-687. DOI: 10.1109/ICSE.2019.00077

Background: While FLOSS projects espouse openness and acceptance for all, in practice, female contributors often face discriminatory barriers to contribution. Aims: In this paper, we examine the extent to which these problems still exist. We also study male and female contributors' perceptions of other contributors. Method: We surveyed participants from 15 FLOSS projects, asking a series of open-ended, closed-ended, and behavioral scale questions to gather information about the issue of gender in FLOSS projects. Results: Though many of those we surveyed expressed a positive sentiment towards females who participate in FLOSS projects, some were still strongly against their inclusion. Often, the respondents who were against inclusiveness also believed their own sentiments were the prevailing belief in the community, contrary to our findings. Others did not see the purpose of attempting to be inclusive, expressing the sentiment that a discussion of gender has no place in FLOSS. Conclusions: FLOSS projects have started to move forward in terms of gender acceptance. However, there is still a need for more progress in the inclusion of gender-diverse contributors.
Automatically Generating Precise Oracles from Structured Natural Language Specifications
Manish Motwani, Yuriy Brun. ICSE 2019, pp. 188-199. DOI: 10.1109/ICSE.2019.00035

Software specifications often use natural language to describe the desired behavior, but such specifications are difficult to verify automatically. We present Swami, an automated technique that extracts test oracles and generates executable tests from structured natural language specifications. Swami focuses on exceptional behavior and boundary conditions that often cause field failures but that developers often fail to manually write tests for. Evaluated on the official JavaScript specification (ECMA-262), 98.4% of the tests Swami generated were precise to the specification. Using Swami to augment developer-written test suites improved coverage and identified 1 previously unknown defect and 15 missing JavaScript features in Rhino, 1 previously unknown defect in Node.js, and 18 semantic ambiguities in the ECMA-262 specification.
Platforms like Stack Overflow and GitHub's gist system promote the sharing of ideas and programming techniques via the distribution of code snippets designed to illustrate particular tasks. Python, a popular and fast-growing programming language, sees heavy use on both sites, with nearly one million questions asked on Stack Overflow and 400 thousand public gists on GitHub. Unfortunately, around 75% of the Python example code shared through these sites cannot be directly executed. When run in a clean environment, over 50% of public Python gists fail due to an import error for a missing library. We present DockerizeMe, a technique for inferring the dependencies needed to execute a Python code snippet without import error. DockerizeMe starts with offline knowledge acquisition of the resources and dependencies for popular Python packages from the Python Package Index (PyPI). It then builds Docker specifications using a graph-based inference procedure. Our inference procedure resolves import errors in 892 out of nearly 3,000 gists from the Gistable dataset for which Gistable's baseline approach could not find and install all dependencies.
{"title":"DockerizeMe: Automatic Inference of Environment Dependencies for Python Code Snippets","authors":"Eric Horton, Chris Parnin","doi":"10.1109/ICSE.2019.00047","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00047","url":null,"abstract":"Platforms like Stack Overflow and GitHub's gist system promote the sharing of ideas and programming techniques via the distribution of code snippets designed to illustrate particular tasks. Python, a popular and fast-growing programming language, sees heavy use on both sites, with nearly one million questions asked on Stack Overflow and 400 thousand public gists on GitHub. Unfortunately, around 75% of the Python example code shared through these sites cannot be directly executed. When run in a clean environment, over 50% of public Python gists fail due to an import error for a missing library. We present DockerizeMe, a technique for inferring the dependencies needed to execute a Python code snippet without import error. DockerizeMe starts with offline knowledge acquisition of the resources and dependencies for popular Python packages from the Python Package Index (PyPI). It then builds Docker specifications using a graph-based inference procedure. Our inference procedure resolves import errors in 892 out of nearly 3,000 gists from the Gistable dataset for which Gistable's baseline approach could not find and install all dependencies.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"38 1","pages":"328-338"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82635248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RESTler: Stateful REST API Fuzzing
Vaggelis Atlidakis, Patrice Godefroid, Marina Polishchuk. ICSE 2019, pp. 748-758. DOI: 10.1109/ICSE.2019.00083

This paper introduces RESTler, the first stateful REST API fuzzer. RESTler analyzes the API specification of a cloud service and generates sequences of requests that automatically test the service through its API. RESTler generates test sequences by (1) inferring producer-consumer dependencies among request types declared in the specification (e.g., inferring that "a request B should be executed after request A" because B takes as an input a resource-id x produced by A) and by (2) analyzing dynamic feedback from responses observed during prior test executions in order to generate new tests (e.g., learning that "a request C after a request sequence A;B is refused by the service" and therefore avoiding this combination in the future). We present experimental results showing that these two techniques are necessary to thoroughly exercise a service under test while pruning the large search space of possible request sequences. We used RESTler to test GitLab, an open-source Git service, as well as several Microsoft Azure and Office365 cloud services. RESTler found 28 bugs in GitLab and several bugs in each of the Azure and Office365 cloud services tested so far. These bugs have been confirmed and fixed by the service owners.
Interactive Production Performance Feedback in the IDE
Jürgen Cito, P. Leitner, M. Rinard, H. Gall. ICSE 2019, pp. 971-981. DOI: 10.1109/ICSE.2019.00102

Because of differences between development and production environments, many software performance problems are detected only after software enters production. We present PerformanceHat, a new system that uses profiling information from production executions to develop a global performance model suitable for integration into interactive development environments. PerformanceHat's ability to incrementally update this global model as the software is changed in the development environment enables it to deliver near real-time predictions of performance consequences reflecting the impact on the production environment. We implement PerformanceHat as an Eclipse plugin and evaluate it in a controlled experiment with 20 professional software developers implementing several software maintenance tasks using our approach and a representative baseline (Kibana). Our results indicate that developers using PerformanceHat were significantly faster in (1) detecting the performance problem, and (2) finding the root cause of the problem. These results provide encouraging evidence that our approach helps developers detect, prevent, and debug production performance problems during development before the problem manifests in production.
DeepPerf: Performance Prediction for Configurable Software with Deep Sparse Neural Network
Huong Ha, Hongyu Zhang. ICSE 2019, pp. 1095-1106. DOI: 10.1109/ICSE.2019.00113

Many software systems provide users with a set of configuration options, and different configurations may lead to different runtime performance. Because the number of possible configurations grows exponentially with the number of options, it is difficult to exhaustively deploy and measure system performance under all of them. Recently, several learning methods have been proposed to build a performance prediction model from data collected on a small sample of configurations, and then use the model to predict system performance under a new configuration. In this paper, we propose a novel approach to modeling highly configurable software systems using a deep feedforward neural network (FNN) combined with a sparsity regularization technique such as L1 regularization. We also design a practical search strategy for efficiently tuning the network hyperparameters automatically. Our method, called DeepPerf, predicts performance values of highly configurable software systems with binary and/or numeric configuration options at much higher accuracy, and with less training data, than state-of-the-art approaches. Experimental results on eleven public real-world datasets confirm the effectiveness of our approach.
Investigating the Effects of Gender Bias on GitHub
Nasif Imtiaz, Justin Middleton, Joymallya Chakraborty, N. Robson, Gina R. Bai, E. Murphy-Hill. ICSE 2019, pp. 700-711. DOI: 10.1109/ICSE.2019.00079

Diversity, including gender diversity, is valued by many software development organizations, yet the field remains dominated by men. One reason for this lack of diversity is gender bias. In this paper, we study the effects of that bias by using an existing framework derived from the gender studies literature. We adapt the four main effects proposed in the framework by posing hypotheses about how they might manifest on GitHub, then evaluate those hypotheses quantitatively. While our results show that the effects of gender bias are largely invisible on the GitHub platform itself, there are still signals of women concentrating their work in fewer places and being more restrained in communication than men.
Multifaceted Automated Analyses for Variability-Intensive Embedded Systems
Sami Lazreg, Maxime Cordy, P. Collet, P. Heymans, Sébastien Mosser. ICSE 2019, pp. 854-865. DOI: 10.1109/ICSE.2019.00092

Embedded systems, like those found in the automotive domain, must comply with stringent functional and non-functional requirements. To fulfil these requirements, engineers are confronted with a plethora of design alternatives at both the software and hardware level, from which they must select the optimal solution with respect to possibly antagonistic quality attributes (e.g. manufacturing cost vs. execution speed). We propose a model-driven framework to assist engineers in this choice. It captures high-level specifications of the system in the form of variable dataflows and configurable hardware platforms. A mapping algorithm then derives the design space, i.e. the set of compatible pairs of application and platform variants, and a variability-aware executable model, which encodes the functional and non-functional behaviour of all viable system variants. Novel verification algorithms then pinpoint the optimal system variants efficiently. The benefits of our approach are evaluated through a real-world case study from the automotive industry.
CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries
H. Pham, Thibaud Lutellier, Weizhen Qi, Lin Tan. ICSE 2019, pp. 1027-1038. DOI: 10.1109/ICSE.2019.00107

Deep learning (DL) systems are widely used in domains including aircraft collision avoidance systems, Alzheimer's disease diagnosis, and autonomous vehicles. Despite the requirement for high reliability, DL systems are difficult to test. Existing DL testing work focuses on testing the DL models, not the implementations (e.g., DL software libraries) of the models. One key challenge of testing DL libraries is the difficulty of knowing the expected output of a DL library for a given input instance. Fortunately, there are multiple implementations of the same DL algorithms in different DL libraries. Thus, we propose CRADLE, a new approach that focuses on finding and localizing bugs in DL software libraries. CRADLE (1) performs cross-implementation inconsistency checking to detect bugs in DL libraries, and (2) leverages anomaly propagation tracking and analysis to localize the faulty functions in DL libraries that cause the bugs. We evaluate CRADLE on three libraries (TensorFlow, CNTK, and Theano), 11 datasets (including ImageNet, MNIST, and the KGS Go game dataset), and 30 pre-trained models. CRADLE detects 12 bugs and 104 unique inconsistencies, and highlights functions relevant to the causes of all 104 unique inconsistencies.
Investigating the Impact of Multiple Dependency Structures on Software Defects
Di Cui, Ting Liu, Yuanfang Cai, Q. Zheng, Qiong Feng, Wuxia Jin, Jiaqi Guo, YunHuan Qu. ICSE 2019, pp. 584-595. DOI: 10.1109/ICSE.2019.00069

Over the past decades, numerous approaches have been proposed to help practitioners predict or locate defective files. These techniques often use syntactic dependency, history co-change relations, or semantic similarity. However, it remains unclear whether these different dependency relations offer similar accuracy for defect prediction and localization. In this paper, we present a systematic investigation of this question from the perspective of software architecture. Considering the files involved in each dependency type as an individual design space, we model each such space as a DRSpace. We derived three DRSpaces for each of 117 Apache open source projects, comprising 643,079 revision commits and 101,364 bug reports in total, and calculated their interactions with defective files. The experimental results are surprising: the three dependency types present significantly different architectural views, and their interactions with defective files are also drastically different. Intuitively, they play completely different roles when used for defect prediction/localization. The good news is that combining these structures has the potential to improve the accuracy of defect prediction/localization. In summary, our work provides a new perspective on which type(s) of relations should be used for defect prediction/localization. These quantitative and qualitative results also advance our knowledge of the relationship between software quality and architectural views formed using different dependency types.