The Effects of Computational Resources on Flaky Tests

IEEE Transactions on Software Engineering | Impact factor: 6.5 | Region 1 (Computer Science) | JCR Q1: Computer Science, Software Engineering | Pub date: 2024-09-18 | DOI: 10.1109/TSE.2024.3462251

Denini Silva, Martin Gruber, Satyajit Gokhale, Ellen Arteca, Alexi Turcotte, Marcelo d’Amorim, Wing Lam, Stefan Winter, Jonathan Bell

Published in IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3104-3121. Publication type: journal article. Not open access. Publisher page: https://ieeexplore.ieee.org/document/10682606/

Citations: 0

Abstract


The Effects of Computational Resources on Flaky Tests

Flaky tests are tests that non-deterministically pass or fail on unchanged code. These tests can be detrimental to developers’ productivity. Particularly when tests run in continuous integration environments, the tests may be competing for access to limited computational resources (CPUs, memory, etc.), and we hypothesize that resource (un)availability may be a significant factor in the failure rate of flaky tests. We present the first assessment of the impact that computational resources have on flaky tests, covering a total of 52 projects written in Java, JavaScript, and Python, and 27 different resource configurations. Using a rigorous statistical methodology, we determine which tests are RAFTs (Resource-Affected Flaky Tests). We find that 46.5% of the flaky tests in our dataset are RAFTs, indicating that a substantial proportion of flaky-test failures depend on the resources available when the tests run. We reported the RAFTs, and the configurations that avoid them, to developers, and received interest in either fixing the RAFTs or improving the specifications of the projects so that tests would be run only in configurations that are unlikely to encounter RAFT failures. Although most test suites in our dataset execute quite quickly (under one minute) in a baseline configuration, our results highlight the possibility of using this methodology to detect RAFTs and so reduce the cost of cloud infrastructure for reliably running larger test suites.
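The abstract does not say which statistical test is used to decide whether a flaky test is a RAFT. As an illustration only, the sketch below assumes one plausible approach: rerun a test many times under a baseline configuration and under a resource-constrained one, then compare the failure counts with a two-sided Fisher's exact test. The function names, the example counts, and the 0.05 threshold are all hypothetical and are not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(fail_a, pass_a, fail_b, pass_b):
    """Two-sided Fisher's exact test p-value for a 2x2 fail/pass table."""
    runs_a = fail_a + pass_a
    runs_b = fail_b + pass_b
    total_fail = fail_a + fail_b
    denom = comb(runs_a + runs_b, total_fail)

    def prob(k):
        # Hypergeometric probability that configuration A sees exactly k failures
        return comb(runs_a, k) * comb(runs_b, total_fail - k) / denom

    observed = prob(fail_a)
    k_min = max(0, total_fail - runs_b)
    k_max = min(runs_a, total_fail)
    # Sum the probabilities of every table no more likely than the observed one
    p = sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )
    return min(1.0, p)


def looks_resource_affected(baseline, constrained, alpha=0.05):
    """Hypothetical RAFT check; each argument is a (failures, passes) tuple."""
    p_value = fisher_exact_two_sided(*baseline, *constrained)
    return p_value < alpha, p_value


# Made-up counts: 300 reruns per configuration
flagged, p_value = looks_resource_affected(baseline=(2, 298), constrained=(40, 260))
print(flagged, p_value)
```

With 27 configurations per test, as in the study, a real analysis would also need to account for multiple comparisons; this sketch compares only a single pair of configurations.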


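Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months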

About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.

Latest articles in this journal

Towards measurement-based Software Engineering
A Personal Retrospective on Symbolic Execution
Trustworthy Distributed Certification of Program Execution
Search-based DNN Testing and Retraining with GAN-enhanced Simulations
Automated Test Case Repair Using Language Models