LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models

IEEE Transactions on Software Engineering (IF 6.5; Zone 1, Computer Science; JCR Q1, Computer Science, Software Engineering). Pub Date: 2024-09-30. DOI: 10.1109/TSE.2024.3469582

Rongqi Pan;Taher A. Ghaleb;Lionel C. Briand

IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 3053-3070. https://ieeexplore.ieee.org/document/10697930/

Citations: 0

Abstract



Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. The former yields higher fault detection rates (FDR) while the latter is faster. To address scalability while retaining a high FDR, we propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five different pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, on which we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time. Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (41.72% versus 41.02%, on average); (b) attaining a significantly higher fault detection rate (0.84 versus 0.81, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.
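To make the pipeline in the abstract more concrete, the following is a minimal sketch of similarity-guided minimization. It is not the authors' implementation: the abstract does not specify the GA operators or the fitness function, so the mean-pairwise-similarity fitness, the swap mutation, and the names `fitness` and `minimize` are all assumptions made for illustration. The embeddings are toy vectors; in LTM they would come from a pre-trained model such as UniXcoder.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fitness(subset, sim):
    """Assumed fitness: mean pairwise similarity of the kept tests.

    Lower is better, since a less self-similar subset is more diverse.
    Requires at least two tests in the subset.
    """
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def minimize(embeddings, budget, generations=50, pop_size=20, seed=0):
    """Hypothetical GA: keep `budget` tests that are least similar to each other."""
    rng = random.Random(seed)
    n = len(embeddings)
    # Precompute the similarity matrix once; the search only reads from it.
    sim = [[cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]
    population = [sorted(rng.sample(range(n), budget)) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=lambda s: fitness(s, sim))
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            # Swap mutation: replace one kept test with one that was dropped.
            outside = [t for t in range(n) if t not in child]
            if outside:
                child[rng.randrange(budget)] = rng.choice(outside)
            children.append(sorted(child))
        population = survivors + children
    return min(population, key=lambda s: fitness(s, sim))


# Toy data: tests 0 and 1 are near-duplicates, as are tests 2 and 3.
toy_embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.01]]
kept = minimize(toy_embeddings, budget=2)
print("kept test indices:", kept)
```

With a budget of two tests, the search should keep one test from each near-duplicate pair. Swapping `cosine_similarity` for a similarity derived from `euclidean_distance` would give the second measure mentioned in the abstract; how the distance is turned into a similarity score is a further design choice not covered here.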


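Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months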

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.