Pub Date: 2022-08-28 | DOI: 10.1109/ICSME55016.2022.00014
Title: An Empirical Study on the Usage of Automated Machine Learning Tools
Forough Majidi, Moses Openja, Foutse Khomh, Heng Li
The popularity of automated machine learning (AutoML) tools in different domains has increased over the past few years. Machine learning (ML) practitioners use AutoML tools to automate and optimize steps such as feature engineering, model training, and hyperparameter optimization. Recent work performed qualitative studies on practitioners’ experiences of using AutoML tools and compared different AutoML tools based on their performance and the features they provide, but none of the existing work studied the practices of using AutoML tools in real-world projects at a large scale. Therefore, we conducted an empirical study to understand how ML practitioners use AutoML tools in their projects. To this end, we examined the top 10 most used AutoML tools and their respective usages in a large number of open-source project repositories hosted on GitHub. The results of our study show 1) which AutoML tools are most commonly used by ML practitioners and 2) the characteristics of the repositories that use these AutoML tools. We also identified the purposes of using AutoML tools (e.g., model parameter sampling, search space management, model evaluation/error analysis, data/feature transformation, and data labeling) and the stages of the ML pipeline (e.g., feature engineering) where AutoML tools are used. Finally, we report how often AutoML tools are used together in the same source code files. We hope our results can help ML practitioners learn about different AutoML tools and their usages, so that they can pick the right tool for their purposes. AutoML tool developers can also benefit from our findings to gain insight into how their tools are used and improve them to better fit users’ needs.
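To make the mined usage categories concrete, the following is a minimal, hypothetical sketch of hyperparameter optimization with an AutoML-style tool; Optuna and the chosen search space are illustrative assumptions on our part, since the abstract does not list the specific tools in the top 10.

```python
# Hypothetical AutoML usage sketch (tool choice and search space are our assumptions,
# not taken from the study): parameter sampling, search-space management, and evaluation.
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Search-space management: the trial samples model parameters from declared ranges.
    n_estimators = trial.suggest_int("n_estimators", 10, 200)
    max_depth = trial.suggest_int("max_depth", 2, 16)
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
    X, y = load_iris(return_X_y=True)
    # Model evaluation: mean cross-validation accuracy is the optimization target.
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```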
{"title":"An Empirical Study on the Usage of Automated Machine Learning Tools","authors":"Forough Majidi, Moses Openja, Foutse Khomh, Heng Li","doi":"10.1109/ICSME55016.2022.00014","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00014","url":null,"abstract":"The popularity of automated machine learning (AutoML) tools in different domains has increased over the past few years. Machine learning (ML) practitioners use AutoML tools to automate and optimize the process of feature engineering, model training, and hyperparameter optimization and so on. Recent work performed qualitative studies on practitioners’ experiences of using AutoML tools and compared different AutoML tools based on their performance and provided features, but none of the existing work studied the practices of using AutoML tools in real-world projects at a large scale. Therefore, we conducted an empirical study to understand how ML practitioners use AutoML tools in their projects. To this end, we examined the top 10 most used AutoML tools and their respective usages in a large number of open-source project repositories hosted on GitHub. The results of our study show 1) which AutoML tools are mostly used by ML practitioners and 2) the characteristics of the repositories that use these AutoML tools. Also, we identified the purpose of using AutoML tools (e.g. model parameter sampling, search space management, model evaluation/error-analysis, Data/ feature transformation, and data labeling) and the stages of the ML pipeline (e.g. feature engineering) where AutoML tools are used. Finally, we report how often AutoML tools are used together in the same source code files. We hope our results can help ML practitioners learn about different AutoML tools and their usages, so that they can pick the right tool for their purposes. Besides, AutoML tool developers can benefit from our findings to gain insight into the usages of their tools and improve their tools to better fit the users’ usages and needs.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134365813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-08-16 | DOI: 10.1109/ICSME55016.2022.00046
Title: Don’t Reinvent the Wheel: Towards Automatic Replacement of Custom Implementations with APIs
Rosalia Tufano, Emad Aghajani, G. Bavota
Reusing code is a common practice in software development: it helps developers speed up the implementation task while also reducing the chances of introducing bugs, under the assumption that the reused code has already been tested, possibly in production. Despite these benefits, opportunities for reuse are not always in plain sight and, thus, developers may miss them. We present our preliminary steps in building RETIWA, a recommender able to automatically identify custom implementations in a given project that are good candidates to be replaced by open source APIs. RETIWA relies on a "knowledge base" consisting of real examples of custom implementation-to-API replacements. In this work, we present the mining strategy we tailored to automatically and reliably extract replacements of custom implementations with APIs from open source projects. This is the first step towards building the envisioned recommender.
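As a concrete picture of what a custom implementation-to-API replacement looks like, here is a small hypothetical Python pair; the example is ours and is not taken from RETIWA's knowledge base.

```python
# Hypothetical before/after pair illustrating a custom implementation that could be
# replaced by a library API; the concrete example is ours, not one mined by the paper.
from collections import Counter

# Custom implementation: count word frequencies by hand.
def word_frequencies_custom(words):
    freqs = {}
    for word in words:
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

# API-based replacement candidate: the standard library already provides this behaviour.
def word_frequencies_api(words):
    return dict(Counter(words))

assert word_frequencies_custom(["a", "b", "a"]) == word_frequencies_api(["a", "b", "a"])
```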
{"title":"Don’t Reinvent the Wheel: Towards Automatic Replacement of Custom Implementations with APIs","authors":"Rosalia Tufano, Emad Aghajani, G. Bavota","doi":"10.1109/ICSME55016.2022.00046","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00046","url":null,"abstract":"Reusing code is a common practice in software development: It helps developers speedup the implementation task while also reducing the chances of introducing bugs, given the assumption that the reused code has been tested, possibly in production. Despite these benefits, opportunities for reuse are not always in plain sight and, thus, developers may miss them. We present our preliminary steps in building RETIWA, a recommender able to automatically identify custom implementations in a given project that are good candidates to be replaced by open source APIs. RETIWA relies on a \"knowledge base\" consisting of real examples of custom implementation-to-API replacements. In this work, we present the mining strategy we tailored to automatically and reliably extract replacements of custom implementations with APIs from open source projects. This is the first step towards building the envisioned recommender.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123293568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-08-03 | DOI: 10.1109/ICSME55016.2022.00050
Title: How to Configure Masked Event Anomaly Detection on Software Logs?
Jesse Nyyssölä, M. Mäntylä, M. Varela
Software log anomaly event detection with masked event prediction has various technical approaches with countless configurations and parameters. Our objective is to provide a baseline of settings for similar studies in the future. The models we use are the N-Gram model, a classic approach in the field of natural language processing (NLP), and two deep learning (DL) models: long short-term memory (LSTM) and convolutional neural network (CNN). We used four datasets: Profilence, BlueGene/L (BGL), Hadoop Distributed File System (HDFS), and Hadoop. The other settings studied are the size of the sliding window, which determines how many surrounding events are used to predict a given event; the mask position (the position within the window being predicted); whether only unique sequences are used; and the portion of data used for training. The results show clear indications of settings that can be generalized across datasets. The performance of the DL models does not deteriorate as the window size increases, while the N-Gram model shows worse performance with large window sizes on the BGL and Profilence datasets. Despite the popularity of Next Event Prediction, the results show that in this context it is better not to predict events at the edges of the subsequence, i.e., the first or last event; the best result comes from predicting the fourth event when the window size is five. Regarding the amount of data used for training, the results show differences across datasets and models. For example, the N-Gram model appears to be more sensitive to the lack of data than the DL models. Overall, for similar experimental setups we suggest the following general baseline: window size 10, mask position second to last, do not filter out non-unique sequences, and use half of the total data for training.
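To illustrate how the window size and mask position settings interact, the sketch below builds masked (context, target) pairs from a log event sequence; the event names and the helper function are our own illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): turn a log event sequence into
# masked-event prediction samples for a given window size and mask position.
def masked_samples(events, window_size=5, mask_position=3):
    """Return (context, target) pairs; mask_position is 0-based inside the window."""
    samples = []
    for start in range(len(events) - window_size + 1):
        window = events[start:start + window_size]
        target = window[mask_position]  # the event to predict
        context = window[:mask_position] + window[mask_position + 1:]  # surrounding events
        samples.append((context, target))
    return samples

log = ["login", "read", "read", "write", "error", "logout"]
# Window size five with the fourth event masked, the best-performing setting for size 5.
for context, target in masked_samples(log, window_size=5, mask_position=3):
    print(context, "->", target)
```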
{"title":"How to Configure Masked Event Anomaly Detection on Software Logs?","authors":"Jesse Nyyssölä, M. Mäntylä, M. Varela","doi":"10.1109/ICSME55016.2022.00050","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00050","url":null,"abstract":"Software Log anomaly event detection with masked event prediction has various technical approaches with countless configurations and parameters. Our objective is to provide a baseline of settings for similar studies in the future. The models we use are the N-Gram model, which is a classic approach in the field of natural language processing (NLP), and two deep learning (DL) models long short-term memory (LSTM) and convolutional neural network (CNN). For datasets we used four datasets Profilence, BlueGene/L (BGL), Hadoop Distributed File System (HDFS) and Hadoop. Other settings are the size of the sliding window which determines how many surrounding events we are using to predict a given event, mask position (the position within the window we are predicting), the usage of only unique sequences, and the portion of data that is used for training. The results show clear indications of settings that can be generalized across datasets. The performance of the DL models does not deteriorate as the window size increases while the N-Gram model shows worse performance with large window sizes on the BGL and Profilence datasets. Despite the popularity of Next Event Prediction, the results show that in this context it is better not to predict events at the edges of the subsequence, i.e., first or last event, with the best result coming from predicting the fourth event when the window size is five. Regarding the amount of data used for training, the results show differences across datasets and models. For example, the N-Gram model appears to be more sensitive toward the lack of data than the DL models. Overall, for similar experimental setups we suggest the following general baseline: Window size 10, mask position second to last, do not filter out non-unique sequences, and use a half of the total data for training.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121040449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-08-02 | DOI: 10.1109/ICSME55016.2022.00054
Title: Together or Apart? Investigating a mediator bot to aggregate bot’s comments on pull requests
Eric Ribeiro, Ronan Nascimento, Igor Steinmacher, Laerte Xavier, M. Gerosa, Hugo de Paula, M. Wessel
Software bots connect users and tools, streamlining the pull request review process in social coding platforms. However, bots can introduce information overload into developers’ communication. Information overload is especially problematic for newcomers, who are still exploring the project and may feel overwhelmed by the number of messages. Inspired by the literature from other domains, we designed and evaluated FunnelBot, a bot that acts as a mediator between developers and other bots in the repository. We conducted a within-subject study with 25 newcomers to capture their perceptions and preferences. Our results provide insights for bot developers who want to mitigate noise and create bots for supporting newcomers, laying a foundation for designing better bots.
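The mediation idea can be pictured with a short sketch that collects bot-authored comments on a pull request and groups them by bot before a single summary is posted; this is a hypothetical illustration using PyGithub with placeholder repository, token, and PR values, not FunnelBot's actual implementation.

```python
# Hypothetical sketch of aggregating bot comments on a pull request (not FunnelBot's code).
# "owner/project", the PR number, and the token are placeholders.
from collections import defaultdict
from github import Github  # PyGithub

gh = Github("YOUR_TOKEN")
pr = gh.get_repo("owner/project").get_pull(42)

by_bot = defaultdict(list)
for comment in pr.get_issue_comments():
    if comment.user.type == "Bot":  # keep only bot-authored comments
        by_bot[comment.user.login].append(comment.body)

# A mediator bot could post one aggregated comment instead of many individual ones.
summary = "\n\n".join(f"{bot}:\n" + "\n".join(bodies) for bot, bodies in by_bot.items())
print(summary)
```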
{"title":"Together or Apart? Investigating a mediator bot to aggregate bot’s comments on pull requests","authors":"Eric Ribeiro, Ronan Nascimento, Igor Steinmacher, Laerte Xavier, M. Gerosa, Hugo de Paula, M. Wessel","doi":"10.1109/ICSME55016.2022.00054","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00054","url":null,"abstract":"Software bots connect users and tools, streamlining the pull request review process in social coding platforms. However, bots can introduce information overload into developers’ communication. Information overload is especially problematic for newcomers, who are still exploring the project and may feel overwhelmed by the number of messages. Inspired by the literature of other domains, we designed and evaluated FunnelBot, a bot that acts as a mediator between developers and other bots in the repository. We conducted a within-subject study with 25 newcomers to capture their perceptions and preferences. Our results provide insights for bot developers who want to mitigate noise and create bots for supporting newcomers, laying a foundation for designing better bots.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123024913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-08-02 | DOI: 10.1109/ICSME55016.2022.00043
Title: An Exploratory Study of Documentation Strategies for Product Features in Popular GitHub Projects
Tim Puhlfurss, Lloyd Montgomery, W. Maalej
[Background] In large open-source software projects, development knowledge is often fragmented across multiple artefacts and contributors, such that individual stakeholders are generally unaware of the full breadth of the product features. However, users want to know what the software is capable of, while contributors need to know where to fix, update, and add features. [Objective] This work aims at understanding how feature knowledge is documented in GitHub projects and how it is linked (if at all) to the source code. [Method] We conducted an in-depth qualitative exploratory content analysis of 25 popular GitHub repositories that provided the documentation artefacts recommended by GitHub’s Community Standards indicator. We first extracted strategies used to document software features in textual artefacts and then strategies used to link the feature documentation with source code. [Results] We observed feature documentation in all studied projects in artefacts such as READMEs, wikis, and website resource files. However, the features were often described in an unstructured way. Additionally, tracing techniques to connect feature documentation and source code were rarely used. [Conclusions] Our results suggest that feature documentation in open-source projects is often lacking (or low-prioritised), that normalised structures are rarely used, and that explicit references to source code are rare. As a result, product feature traceability is likely to be very limited, and maintainability is likely to suffer over time.
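For readers unfamiliar with the Community Standards artefacts mentioned in the method, the sketch below checks which of them a locally cloned repository provides; the artefact list is our approximation of GitHub's checklist, and the repository path is a placeholder.

```python
# Illustrative sketch: report which community-standards documentation artefacts a cloned
# repository contains. The artefact list approximates GitHub's checklist (an assumption).
from pathlib import Path

ARTEFACTS = {
    "README": ["README.md", "README.rst", "README"],
    "Contributing guide": ["CONTRIBUTING.md", ".github/CONTRIBUTING.md"],
    "Code of conduct": ["CODE_OF_CONDUCT.md", ".github/CODE_OF_CONDUCT.md"],
    "License": ["LICENSE", "LICENSE.md"],
}

def documentation_report(repo_path):
    repo = Path(repo_path)
    return {name: any((repo / candidate).exists() for candidate in candidates)
            for name, candidates in ARTEFACTS.items()}

print(documentation_report("/path/to/cloned/repo"))  # placeholder path
```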
{"title":"An Exploratory Study of Documentation Strategies for Product Features in Popular GitHub Projects","authors":"Tim Puhlfurss, Lloyd Montgomery, W. Maalej","doi":"10.1109/ICSME55016.2022.00043","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00043","url":null,"abstract":"[Background] In large open-source software projects, development knowledge is often fragmented across multiple artefacts and contributors such that individual stakeholders are generally unaware of the full breadth of the product features. However, users want to know what the software is capable of, while contributors need to know where to fix, update, and add features. [Objective] This work aims at understanding how feature knowledge is documented in GitHub projects and how it is linked (if at all) to the source code. [Method] We conducted an in-depth qualitative exploratory content analysis of 25 popular GitHub repositories that provided the documentation artefacts recommended by GitHub’s Community Standards indicator. We first extracted strategies used to document software features in textual artefacts and then strategies used to link the feature documentation with source code. [Results] We observed feature documentation in all studied projects in artefacts such as READMEs, wikis, and website resource files. However, the features were often described in an unstructured way. Additionally, tracing techniques to connect feature documentation and source code were rarely used. [Conclusions] Our results suggest a lacking (or a low-prioritised) feature documentation in open-source projects, little use of normalised structures, and a rare explicit referencing to source code. As a result, product feature traceability is likely to be very limited, and maintainability to suffer over time.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114669175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-07-30 | DOI: 10.1109/ICSME55016.2022.00042
Title: Adding Context to Source Code Representations for Deep Learning
Fuwei Tian, Christoph Treude
Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code needs to be represented in a format that is suitable for input into the deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs), focus only on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model for two software engineering tasks. We outline our research agenda for adding further contextual information to source code representations for deep learning.
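One way to read "encoding context from the call hierarchy along with the code itself" is sketched below: a function's source is paired with the names of the functions it calls before being fed to a model. The extraction with Python's ast module and the concatenation format are our assumptions, not the paper's pipeline.

```python
# Illustrative sketch (our assumption, not the paper's pipeline): pair a function's source
# with the names of the functions it calls, so a model sees code plus call-hierarchy context.
import ast
import inspect

def function_with_call_context(func):
    source = inspect.getsource(func)
    tree = ast.parse(source)
    callees = sorted({node.func.id for node in ast.walk(tree)
                      if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)})
    # One possible input format: the raw code followed by a marker listing its callees.
    return source + "# CALLS: " + ", ".join(callees)

def helper(x):
    return x * 2

def compute(values):
    return [helper(v) for v in values]

print(function_with_call_context(compute))
```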
{"title":"Adding Context to Source Code Representations for Deep Learning","authors":"Fuwei Tian, Christoph Treude","doi":"10.1109/ICSME55016.2022.00042","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00042","url":null,"abstract":"Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code needs to be represented in a format that is suitable for input into the deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs) only focus on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model for two software engineering tasks. We outline our research agenda for adding further contextual information to source code representations for deep learning.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"237 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134074233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-07-30 | DOI: 10.1109/ICSME55016.2022.00045
Title: Developers Struggle with Authentication in Blazor WebAssembly
Pascal André, Quentin Stiévenart, Mohammad Ghafari
WebAssembly is a growing technology to build cross-platform applications. We aim to understand the security issues that developers encounter when adopting WebAssembly. We mined WebAssembly questions on Stack Overflow and identified 359 security-related posts. We classified these posts into 8 themes, reflecting developer intentions, and 19 topics, representing developer issues in this domain. We found that the most prevalent themes are related to bug fix support, requests for how to implement particular features, clarification questions, and setup or configuration issues. We noted that the topmost issues relate to authentication in Blazor WebAssembly. We discuss six of them and provide our suggestions to address these issues in practice.
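The mining step can be approximated with a query against the Stack Exchange API, as in the hedged sketch below; the exact tags, keywords, and filtering the authors used are not specified in the abstract, so this is only an illustration.

```python
# Illustrative sketch (not the authors' exact mining pipeline): fetch Stack Overflow
# questions tagged "webassembly" that mention security, via the Stack Exchange API.
import requests

response = requests.get(
    "https://api.stackexchange.com/2.3/search/advanced",
    params={
        "site": "stackoverflow",
        "tagged": "webassembly",
        "q": "security",  # free-text keyword; the study's keyword set may differ
        "pagesize": 20,
    },
    timeout=30,
)
for item in response.json().get("items", []):
    print(item["title"], item["link"])
```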
{"title":"Developers Struggle with Authentication in Blazor WebAssembly","authors":"Pascal André, Quentin Sti'evenart, Mohammad Ghafari","doi":"10.1109/ICSME55016.2022.00045","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00045","url":null,"abstract":"WebAssembly is a growing technology to build cross-platform applications. We aim to understand the security issues that developers encounter when adopting WebAssembly. We mined WebAssembly questions on Stack Overflow and identified 359 security-related posts. We classified these posts into 8 themes, reflecting developer intentions, and 19 topics, representing developer issues in this domain. We found that the most prevalent themes are related to bug fix support, requests for how to implement particular features, clarification questions, and setup or configuration issues. We noted that the topmost issues attribute to authentication in Blazor WebAssembly. We discuss six of them and provide our suggestions to clear these issues in practice.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124600285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-07-26 | DOI: 10.1109/ICSME55016.2022.00067
Title: Perun: Performance Version System
Tomáš Fiedor, Jiří Pavela, Adam Rogalewicz, Tomáš Vojnar
In this paper, we present PERUN: an open-source tool suite for profiling-based performance analysis. At its core, PERUN maintains links between project versions and the corresponding stored performance profiles, which are then leveraged for automated detection of performance changes in new project versions. The PERUN tool suite further includes multiple profilers (and is designed such that further profilers can be easily added), a performance fuzz-tester for workload generation, methods for deriving performance models, and numerous visualization methods. We demonstrate how PERUN can help developers analyze their program performance on two examples: detection and localization of a performance degradation, and generation of inputs that force performance issues to show up.
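The core idea of linking versions to stored profiles and detecting changes can be sketched without PERUN's actual profile format, which we do not reproduce here; the comparison below over hypothetical per-function runtimes is purely illustrative.

```python
# Illustrative sketch (not PERUN's actual profile format or API): compare per-function
# runtimes of two project versions and flag likely performance degradations.
def detect_degradations(baseline, candidate, threshold=1.25):
    """Return functions whose runtime grew by more than `threshold` times between versions."""
    degraded = {}
    for function, base_time in baseline.items():
        new_time = candidate.get(function)
        if new_time is not None and new_time > base_time * threshold:
            degraded[function] = (base_time, new_time)
    return degraded

# Hypothetical profiles keyed by function name, values in seconds.
v1_profile = {"parse_input": 0.40, "solve": 1.20, "render": 0.10}
v2_profile = {"parse_input": 0.41, "solve": 2.90, "render": 0.11}
print(detect_degradations(v1_profile, v2_profile))  # {'solve': (1.2, 2.9)}
```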
{"title":"Perun: Performance Version System","authors":"Tomás Fiedor, Jirí Pavela, Adam Rogalewicz, Tomáš Vojnar","doi":"10.1109/ICSME55016.2022.00067","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00067","url":null,"abstract":"In this paper, we present PERUN: an open-source tool suite for profiling-based performance analysis. At its core, PERUN maintains links between project versions and the corresponding stored performance profiles, which are then leveraged for automated detection of performance changes in new project versions. The PERUN tool suite further includes multiple profilers (and is designed such that further profilers can be easily added), a performance fuzz-tester for workload generation, methods for deriving performance models, and numerous visualization methods. We demonstrate how PERUN can help developers to analyze their program performance on two examples: detection and localization of a performance degradation and generation of inputs forcing performance issues to show up.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130812606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-07-20 | DOI: 10.1109/ICSME55016.2022.00039
Title: What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness
Sarra Habchi, Guillaume Haben, Jeongju Sohn, Adriano Franci, Mike Papadakis, Maxime Cordy, Yves Le Traon
Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers’ time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts have proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed, as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism to preserve code quality, i.e., avoid introducing technical debt linked with non-deterministic behaviour, and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks the flaky class among the top-1 and top-5 suggestions in 26% and 47% of the cases, respectively. Overall, we successfully reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective for major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components.
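For readers unfamiliar with SBFL, it scores program components by how strongly their coverage correlates with failing runs. The sketch below uses the Ochiai formula, a common SBFL metric, as an assumed example; the abstract does not state which formula the study employs, and the coverage data is hypothetical.

```python
# Illustrative SBFL sketch using the Ochiai formula (an assumption; the abstract does not
# name the formula). Classes are ranked by how strongly coverage correlates with failures.
from math import sqrt

def ochiai_ranking(runs):
    """runs: list of (covered_classes, passed) pairs for repeated executions of a flaky test."""
    total_failed = sum(1 for _, passed in runs if not passed)
    classes = {cls for covered, _ in runs for cls in covered}
    scores = {}
    for cls in classes:
        e_f = sum(1 for covered, passed in runs if not passed and cls in covered)
        e_p = sum(1 for covered, passed in runs if passed and cls in covered)
        denominator = sqrt(total_failed * (e_f + e_p))
        scores[cls] = e_f / denominator if denominator else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical coverage of three classes over four runs of the same flaky test.
runs = [({"Cache", "Scheduler"}, True),
        ({"Cache", "Scheduler", "AsyncPool"}, False),
        ({"Cache"}, True),
        ({"Scheduler", "AsyncPool"}, False)]
print(ochiai_ranking(runs))  # AsyncPool ranks first: it is covered only in failing runs
```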
{"title":"What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness","authors":"Sarra Habchi, Guillaume Haben, Jeongju Sohn, Adriano Franci, Mike Papadakis, Maxime Cordy, Yves Le Traon","doi":"10.1109/ICSME55016.2022.00039","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00039","url":null,"abstract":"Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers’ time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism to preserve code quality, i.e., avoid introducing technical debt linked with non-deterministic behaviour, and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks flaky classes in the top-1 and top-5 suggestions, in 26% and 47% of the cases. Overall, we successfully reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective in major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130975117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-07-03 | DOI: 10.1109/ICSME55016.2022.00011
Title: An Empirical Study of Flaky Tests in JavaScript
Negar Hashemi, Amjed Tahir, Shawn Rasheed
Flaky tests (tests with non-deterministic outcomes) can be problematic for testing efficiency and software reliability. Flaky tests in test suites can also significantly delay software releases. There have been several studies that attempt to quantify the impact of test flakiness in different programming languages (e.g., Java and Python) and application domains (e.g., mobile and GUI-based). In this paper, we conduct an empirical study of the state of flaky tests in JavaScript. We investigate two aspects of flaky tests in JavaScript projects: the main causes of flaky tests in these projects and common fixing strategies. By analysing 452 commits from large, top-scoring JavaScript projects on GitHub, we found that flakiness caused by concurrency-related issues (e.g., async wait, race conditions, or deadlocks) is the most dominant reason for test flakiness. The other top causes of flaky tests are operating system-specific issues (e.g., features that work only on specific operating systems or OS versions) and network stability (e.g., internet availability or bad socket management). In terms of how flaky tests are dealt with, the majority of flaky tests (>80%) are fixed to eliminate the flaky behaviour, while developers sometimes skip, quarantine, or remove them.
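A rough picture of how fix commits might be bucketed into the cause categories named above is sketched below; the keyword lists and the categorisation rule are illustrative assumptions, not the authors' coding scheme.

```python
# Illustrative sketch (our assumption, not the study's coding scheme): bucket commit
# messages that fix flaky tests into coarse cause categories via keyword matching.
CAUSE_KEYWORDS = {
    "concurrency": ["async", "await", "race", "deadlock", "timeout"],
    "operating system": ["windows", "macos", "linux", "os version"],
    "network": ["network", "socket", "connection", "offline"],
}

def categorise(commit_message):
    message = commit_message.lower()
    matched = [cause for cause, keywords in CAUSE_KEYWORDS.items()
               if any(keyword in message for keyword in keywords)]
    return matched or ["other"]

print(categorise("Fix flaky test: await the upload promise instead of a fixed timeout"))
# ['concurrency']
```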
{"title":"An Empirical Study of Flaky Tests in JavaScript","authors":"Negar Hashemi, Amjed Tahir, Shawn Rasheed","doi":"10.1109/ICSME55016.2022.00011","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00011","url":null,"abstract":"Flaky tests (tests with non-deterministic outcomes) can be problematic for testing efficiency and software reliability. Flaky tests in test suites can also significantly delay software releases. There have been several studies that attempt to quantify the impact of test flakiness in different programming languages (e.g., Java and Python) and application domains (e.g., mobile and GUI-based). In this paper, we conduct an empirical study of the state of flaky tests in JavaScript. We investigate two aspects of flaky tests in JavaScript projects: the main causes of flaky tests in these projects and common fixing strategies. By analysing 452 commits from large, top-scoring JavaScript projects from GitHub, we found that flakiness caused by concurrency-related issues (e.g., async wait, race conditions or deadlocks) is the most dominant reason for test flakiness. The other top causes of flaky tests are operating system-specific (e.g., features that work on specific OS or OS versions) and network stability (e.g., internet availability or bad socket management). In terms of how flaky tests are dealt with, the majority of those flaky tests (>80%) are fixed to eliminate flaky behaviour and developers sometimes skip, quarantine or remove flaky tests.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129835506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}