Authors: Alexander Berndt, Thomas Bach, Sebastian Baltes
DOI: arxiv-2409.10062
Venue: arXiv - CS - Software Engineering
Published: 2024-09-16
Citations: 0
Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA
Background: Test flakiness is a major problem in the software industry. Flaky tests fail seemingly at random without changes to the code and thus impede continuous integration (CI). Some researchers argue that all tests can be considered flaky and that tests only differ in their frequency of flaky failures.

Aims: With the goal of developing mitigation strategies to reduce the negative impact of test flakiness, we study characteristics of tests and the test environment that potentially impact test flakiness.

Method: We construct two datasets based on SAP HANA's test results over a 12-week period: one based on production data, the other based on targeted test executions from a dedicated flakiness experiment. We conduct correlation analysis for test and test environment characteristics with respect to their influence on the frequency of flaky test failures.

Results: In our study, the average test execution time had the strongest positive correlation with the test flakiness rate (r = 0.79), which confirms previous studies. Potential reasons for higher flakiness include the larger test scope of long-running tests or test executions on a slower test infrastructure. Interestingly, the load on the testing infrastructure was not correlated with test flakiness. The relationship between test flakiness and required resources for test execution is inconclusive.

Conclusions: Based on our findings, we conclude that splitting long-running tests can be an important measure for practitioners to cope with test flakiness, as it enables parallelization of test executions and also reduces the cost of re-executions. This effectively decreases the negative effects of test flakiness in complex testing environments. However, when splitting long-running tests, practitioners need to consider the potential test setup overhead of test splits.
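The Conclusions weigh cheaper re-executions and parallelism against the setup overhead that each split adds. The idealized cost model below makes that trade-off concrete. It assumes the test body divides evenly into `parts` pieces, that each piece pays the full setup cost `setup_s`, that a flaky failure affects exactly one piece, and that splitting does not change how often failures occur. The function `split_cost`, its parameters, and the numbers are hypothetical illustrations, not the study's model or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitCost:
    # Time for one piece: the wall-clock time of a full run when all pieces
    # execute in parallel, and also the cost of re-running one flaky piece.
    part_time: float
    # Summed execution time across all pieces (total machine time).
    machine_time: float


def split_cost(body_s: float, setup_s: float, parts: int) -> SplitCost:
    """Hypothetical cost model for splitting one long-running test.

    parts=1 models the unsplit test: one setup plus the whole body.
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    part_time = setup_s + body_s / parts       # each piece repeats the setup
    machine_time = parts * setup_s + body_s    # setup overhead grows with parts
    return SplitCost(part_time=part_time, machine_time=machine_time)


if __name__ == "__main__":
    # Hypothetical test: 600 s of test body, 30 s of setup.
    for parts in (1, 4, 20):
        c = split_cost(600.0, 30.0, parts)
        print(f"parts={parts:2d}: re-run/parallel time {c.part_time:6.1f}s, "
              f"machine time {c.machine_time:6.1f}s")
```

With these made-up numbers, splitting into 4 pieces cuts the cost of re-running one flaky piece from 630 s to 180 s but raises total machine time from 630 s to 720 s. At 20 pieces, the re-run cost drops to 60 s while machine time rises to 1200 s, which is the setup overhead the Conclusions caution about.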