What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness

2022 IEEE International Conference on Software Maintenance and Evolution (ICSME). Pub Date: 2022-07-20. DOI: 10.1109/ICSME55016.2022.00039

Sarra Habchi, Guillaume Haben, Jeongju Sohn, Adriano Franci, Mike Papadakis, Maxime Cordy, Yves Le Traon


Citations: 4

Abstract

Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers’ time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts have proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed, as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism to preserve code quality, i.e., to avoid introducing technical debt linked with non-deterministic behaviour, and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks flaky classes in the top-1 and top-5 suggestions in 26% and 47% of cases, respectively. Overall, we successfully reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective in major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components.
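To make the idea of re-targeting SBFL concrete, the following is a minimal sketch, not the authors' implementation. It assumes that flaky tests take the role that failing tests play in classic SBFL and that stable tests take the role of passing tests. It uses the Ochiai formula, which is one widely used SBFL formula; the abstract does not say which formulas the study relies on. The function names `ochiai_scores` and `rank_classes`, and the test and class names in the example, are hypothetical.

```python
import math
from typing import Dict, List, Set


def ochiai_scores(coverage: Dict[str, Set[str]], flaky_tests: Set[str]) -> Dict[str, float]:
    """Score each class by how strongly its coverage is associated with flaky tests.

    coverage maps a test name to the set of classes that test covers.
    Assumption: flaky tests are treated like failing tests in classic SBFL.
    """
    # Number of flaky tests for which we actually have coverage data.
    total_flaky = sum(1 for test in coverage if test in flaky_tests)
    classes: Set[str] = set()
    for covered in coverage.values():
        classes |= covered

    scores: Dict[str, float] = {}
    for cls in classes:
        # e_f: flaky tests covering the class; e_p: stable tests covering it.
        e_f = sum(1 for t, c in coverage.items() if cls in c and t in flaky_tests)
        e_p = sum(1 for t, c in coverage.items() if cls in c and t not in flaky_tests)
        denom = math.sqrt(total_flaky * (e_f + e_p))
        scores[cls] = e_f / denom if denom else 0.0
    return scores


def rank_classes(scores: Dict[str, float]) -> List[str]:
    """Return classes ordered from most to least suspicious (ties broken by name)."""
    return sorted(scores, key=lambda cls: (-scores[cls], cls))


# Hypothetical coverage data: two flaky tests and two stable tests.
example_coverage = {
    "testTimeout": {"Scheduler", "Clock"},
    "testRetry": {"Scheduler", "HttpClient"},
    "testParse": {"Parser", "Clock"},
    "testFormat": {"Parser"},
}
example_flaky = {"testTimeout", "testRetry"}

example_scores = ochiai_scores(example_coverage, example_flaky)
example_ranking = rank_classes(example_scores)
print(example_ranking)
```

In this toy example, the class covered only by flaky tests ends up at the top of the ranking, while a class covered only by stable tests scores zero. A developer would inspect classes in ranked order; the abstract's top-1 and top-5 figures describe how often a truly flaky class appears that early in such a list once change history and code metrics are added to the coverage signal.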


