LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models

IF 6.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | IEEE Transactions on Software Engineering | Pub Date: 2024-09-30 | DOI: 10.1109/TSE.2024.3469582
Rongqi Pan;Taher A. Ghaleb;Lionel C. Briand
IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 3053-3070. Article page: https://ieeexplore.ieee.org/document/10697930/
Citations: 0

Abstract

Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. The former yields higher fault detection rates (FDR) while the latter is faster. To address scalability while retaining a high FDR, we propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five different pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, on which we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time. Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (41.72% versus 41.02%, on average); (b) attaining a significantly higher fault detection rate (0.84 versus 0.81, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.
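The two similarity measures named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-dimensional toy vectors stand in for real test method embeddings (which would come from a model such as UniXcoder), the test names are hypothetical, and `mean_pairwise_similarity` is only one plausible way a search could score a candidate subset (lower meaning more diverse). It is not the fitness function used by LTM.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors (0.0 = identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_pairwise_similarity(embeddings, selected):
    """Average cosine similarity over all pairs of selected tests.

    Hypothetical scoring function: a search procedure could try to
    minimize this value so that the retained tests are dissimilar.
    """
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 0.0
    total = sum(
        cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs
    )
    return total / len(pairs)


# Toy embeddings (hypothetical test names, made-up vectors).
EMBEDDINGS = {
    "testAdd": [1.0, 0.0, 0.0],
    "testAddAgain": [2.0, 0.0, 0.0],  # same direction as testAdd
    "testRemove": [0.0, 1.0, 0.0],    # orthogonal to testAdd
}

redundant = mean_pairwise_similarity(EMBEDDINGS, ["testAdd", "testAddAgain"])
diverse = mean_pairwise_similarity(EMBEDDINGS, ["testAdd", "testRemove"])
print(redundant, diverse)
```

In this toy setup the pair of near-duplicate tests scores higher than the pair of unrelated tests, so a search that prefers lower scores would keep the more diverse subset. Note that the two measures point in opposite directions: higher cosine similarity means more alike, while higher Euclidean distance means less alike.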
Source journal
IEEE Transactions on Software Engineering (Engineering and Technology - Engineering: Electrical and Electronic)
CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months
Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.
Latest articles from this journal
Triple Peak Day: Work Rhythms of Software Developers in Hybrid Work
GenProgJS: a Baseline System for Test-based Automated Repair of JavaScript Programs
On Inter-dataset Code Duplication and Data Leakage in Large Language Models
Line-Level Defect Prediction by Capturing Code Contexts with Graph Convolutional Networks
Does Treatment Adherence Impact Experiment Results in TDD?