On the use of test smells for prediction of flaky tests

Proceedings of the 6th Brazilian Symposium on Systematic and Automated Software Testing. Pub Date: 2021-08-26. DOI: 10.1145/3482909.3482916

Bruno Henrique Pachulski Camara, Marco Aurélio Graciotto Silva, A. T. Endo, S. Vergilio


Citations: 12

Abstract

Regression testing is an important phase in delivering quality software. However, flaky tests hamper the evaluation of test results and can increase costs: a flaky test may pass or fail non-deterministically, and properly identifying its flakiness requires rerunning the test suite multiple times. To cope with this challenge, approaches based on prediction models and machine learning have been proposed. Existing approaches that rely on the test case vocabulary may be context-sensitive and prone to overfitting, and perform poorly in a cross-project scenario. To overcome these limitations, we investigate the use of test smells as predictors of flaky tests. We conducted an empirical study to understand whether a test smell-based classifier performs well at predicting flakiness in the cross-project context, and analysed the information gain of each test smell. We also compared the test smell-based approach with the vocabulary-based one. As a result, we obtained a classifier with reasonable performance (Random Forest, 0.83%) for predicting flakiness in the testing phase. This classifier outperformed the vocabulary-based model in cross-project prediction. The Assertion Roulette and Sleepy Test smell types are associated with the best information gain values.
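The information gain analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes each test smell is encoded as a binary feature (present or absent in a test) and uses the textbook definition IG(Y; X) = H(Y) - H(Y | X); the abstract does not describe the authors' actual tooling or feature encoding. The toy data set `TESTS` and the names `entropy`, `information_gain`, and `rank_smells` are hypothetical and exist only for illustration; the values are not results from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the tests for which the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy data, NOT taken from the paper.
# Columns: assertion_roulette, sleepy_test, is_flaky (1 = yes, 0 = no).
TESTS = [
    (1, 1, 1),
    (1, 0, 1),
    (0, 1, 1),
    (0, 0, 0),
    (0, 0, 0),
    (1, 0, 0),
    (0, 0, 0),
    (0, 1, 1),
]
SMELLS = ["assertion_roulette", "sleepy_test"]


def rank_smells(rows, smell_names):
    """Return (smell, information gain) pairs, highest gain first."""
    labels = [row[-1] for row in rows]
    gains = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(smell_names)
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for smell, gain in rank_smells(TESTS, SMELLS):
        print(f"{smell}: {gain:.4f}")
```

A smell with a higher gain tells the classifier more about whether a test is flaky. In this toy set every test with a sleep call is flaky, so `sleepy_test` ranks first; real rankings depend entirely on the data set used.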


