Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA

Alexander Berndt, Thomas Bach, Sebastian Baltes
arXiv - CS - Software Engineering, published 2024-09-16. DOI: arxiv-2409.10062 (https://doi.org/arxiv-2409.10062)
Citations: 0

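Abstract

Background: Test flakiness is a major problem in the software industry. Flaky tests fail seemingly at random without changes to the code and thus impede continuous integration (CI). Some researchers argue that all tests can be considered flaky and that tests only differ in their frequency of flaky failures.

Aims: With the goal of developing mitigation strategies to reduce the negative impact of test flakiness, we study characteristics of tests and the test environment that potentially impact test flakiness.

Method: We construct two datasets based on SAP HANA's test results over a 12-week period: one based on production data, the other based on targeted test executions from a dedicated flakiness experiment. We conduct correlation analysis for test and test environment characteristics with respect to their influence on the frequency of flaky test failures.

Results: In our study, the average test execution time had the strongest positive correlation with the test flakiness rate (r = 0.79), which confirms previous studies. Potential reasons for higher flakiness include the larger test scope of long-running tests or test executions on a slower test infrastructure. Interestingly, the load on the testing infrastructure was not correlated with test flakiness. The relationship between test flakiness and required resources for test execution is inconclusive.

Conclusions: Based on our findings, we conclude that splitting long-running tests can be an important measure for practitioners to cope with test flakiness, as it enables parallelization of test executions and also reduces the cost of re-executions. This effectively decreases the negative effects of test flakiness in complex testing environments. However, when splitting long-running tests, practitioners need to consider the potential test setup overhead of test splits.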
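
The correlation analysis described under Method can be illustrated with a small sketch. It assumes that a test's flakiness rate is its number of flaky failures divided by its number of executions, and that r is a Pearson coefficient; the abstract states neither explicitly. The function names and all numbers are hypothetical and are not taken from the SAP HANA datasets.

```python
from math import sqrt


def flakiness_rate(flaky_failures: int, executions: int) -> float:
    """Share of executions that ended in a flaky failure (assumed definition)."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    return flaky_failures / executions


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-test records:
# (average execution time in seconds, flaky failures, total executions)
records = [
    (30.0, 1, 1000),
    (120.0, 3, 1000),
    (600.0, 9, 1000),
    (1800.0, 20, 1000),
    (3600.0, 45, 1000),
]

times = [avg_time for avg_time, _, _ in records]
rates = [flakiness_rate(flaky, total) for _, flaky, total in records]

r = pearson_r(times, rates)
print(f"r = {r:.2f}")
```

If the relationship is monotonic but not linear, a rank-based coefficient such as Spearman's would be a reasonable alternative to the Pearson coefficient assumed here.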
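
The trade-off named in the conclusions, cheaper re-executions versus additional setup overhead, can be made concrete with a deliberately simple cost model. This is an illustrative assumption and not the authors' model: every execution pays a fixed setup cost, a split divides the test body into equally long parts, and each flaky failure triggers a re-execution of exactly one part (or of the whole test if it is not split). All names and numbers are hypothetical.

```python
def monolithic_cost(duration: float, setup: float, reruns: int) -> float:
    """Machine time for one unsplit test: the first run plus full re-runs."""
    return (1 + reruns) * (setup + duration)


def split_cost(duration: float, setup: float, parts: int, reruns: int) -> float:
    """Machine time when the test is split into `parts` equal pieces.

    Every piece pays the setup cost once; each re-run repeats one piece only.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    piece = setup + duration / parts
    return parts * piece + reruns * piece


# Hypothetical one-hour test, split six ways, with one flaky failure.
duration_s = 3600.0
for setup_s in (60.0, 900.0):
    whole = monolithic_cost(duration_s, setup_s, reruns=1)
    split = split_cost(duration_s, setup_s, parts=6, reruns=1)
    print(f"setup={setup_s:.0f}s  unsplit={whole:.0f}s  split={split:.0f}s")
```

Under these assumptions, a 60-second setup makes the split cheaper (4620 versus 7320 machine-seconds), while a 900-second setup makes it more expensive (10500 versus 9000), which is the setup overhead the conclusions ask practitioners to weigh.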