A Multi-factor Approach for Flaky Test Detection and Automated Root Cause Analysis

Azeem Ahmad, F. D. O. Neto, Zhixiang Shi, K. Sandahl, O. Leifler
{"title":"A Multi-factor Approach for Flaky Test Detection and Automated Root Cause Analysis","authors":"Azeem Ahmad, F. D. O. Neto, Zhixiang Shi, K. Sandahl, O. Leifler","doi":"10.1109/APSEC53868.2021.00041","DOIUrl":null,"url":null,"abstract":"Developers often spend time to determine whether test case failures are real failures or flaky. The flaky tests, also known as non-deterministic tests, switch their outcomes without any modification in the codebase, hence reducing the confidence of developers during maintenance as well as in the quality of a product. Re-running test cases to reveal flakiness is resource-consuming, unreliable and does not reveal the root causes of test flakiness. Our paper evaluates a multi-factor approach to identify flaky test executions implemented in a tool named MDF laker. The four factors are: trace-back coverage, flaky frequency, number of test smells, and test size. Based on the extracted factors, MDFlaker uses k-Nearest Neighbor (KNN) to determine whether failed test executions are flaky. We investigate MDFlaker in a case study with 2166 test executions from different open-source repositories. We evaluate the effectiveness of our flaky detection tool. We illustrate how the multi-factor approach can be used to reveal root causes for flakiness, and we conduct a qualitative comparison between MDF laker and other tools proposed in literature. Our results show that the combination of different factors can be used to identify flaky tests. Each factor has its own trade-off, e.g., trace-back leads to many true positives, while flaky frequency yields more true negatives. Therefore, specific combinations of factors enable classification for testers with limited information (e.g., not enough test history information).","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSEC53868.2021.00041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Developers often spend time to determine whether test case failures are real failures or flaky. The flaky tests, also known as non-deterministic tests, switch their outcomes without any modification in the codebase, hence reducing the confidence of developers during maintenance as well as in the quality of a product. Re-running test cases to reveal flakiness is resource-consuming, unreliable and does not reveal the root causes of test flakiness. Our paper evaluates a multi-factor approach to identify flaky test executions implemented in a tool named MDF laker. The four factors are: trace-back coverage, flaky frequency, number of test smells, and test size. Based on the extracted factors, MDFlaker uses k-Nearest Neighbor (KNN) to determine whether failed test executions are flaky. We investigate MDFlaker in a case study with 2166 test executions from different open-source repositories. We evaluate the effectiveness of our flaky detection tool. We illustrate how the multi-factor approach can be used to reveal root causes for flakiness, and we conduct a qualitative comparison between MDF laker and other tools proposed in literature. Our results show that the combination of different factors can be used to identify flaky tests. Each factor has its own trade-off, e.g., trace-back leads to many true positives, while flaky frequency yields more true negatives. Therefore, specific combinations of factors enable classification for testers with limited information (e.g., not enough test history information).
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
片状测试检测及自动化根本原因分析的多因素方法
开发人员经常花费时间来确定测试用例失败是真正的失败还是偶然的失败。不稳定的测试,也被称为非确定性测试,在代码库中没有任何修改的情况下切换它们的结果,因此降低了开发人员在维护期间以及对产品质量的信心。重新运行测试用例来揭示脆弱是消耗资源的,不可靠的,并且不能揭示测试脆弱的根本原因。我们的论文评估了一种多因素方法来识别在一个名为MDF lake的工具中实现的不稳定的测试执行。这四个因素是:回溯覆盖率、片状频率、测试气味的数量和测试大小。基于提取的因素,MDFlaker使用k-最近邻(KNN)来确定失败的测试执行是否是片状的。我们在一个案例研究中调查了来自不同开源存储库的2166个测试执行。我们评估我们的片状检测工具的有效性。我们说明了如何使用多因素方法来揭示片状的根本原因,并对MDF lake和文献中提出的其他工具进行了定性比较。结果表明,不同因素的组合可用于片状试验的识别。每个因素都有自己的权衡,例如,回溯导致许多真阳性,而片状频率产生更多的真阴性。因此,特定的因素组合可以对信息有限的测试人员进行分类(例如,没有足够的测试历史信息)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Verification Assisted Gas Reduction for Smart Contracts Effective Bug Triage Based on a Hybrid Neural Network Learn To Align: A Code Alignment Network For Code Clone Detection Framework for Recommending Data Residency Compliant Application Architecture Degree doesn't Matter: Identifying the Drivers of Interaction in Software Development Ecosystems
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1