{"title":"LTM:基于语言模型的可扩展黑盒相似性测试套件最小化","authors":"Rongqi Pan;Taher A. Ghaleb;Lionel C. Briand","doi":"10.1109/TSE.2024.3469582","DOIUrl":null,"url":null,"abstract":"Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. The former yields higher fault detection rates (\n<i>FDR</i>\n) while the latter is faster. To address scalability while retaining a high \n<i>FDR</i>\n, we propose LTM (\n<b>L</b>\nanguage model-based \n<b>T</b>\nest suite \n<b>M</b>\ninimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five different pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, on which we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time. Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (\n<inline-formula><tex-math>$41.72\\%$</tex-math></inline-formula>\n versus \n<inline-formula><tex-math>$41.02\\%$</tex-math></inline-formula>\n, on average); (b) attaining a significantly higher fault detection rate (\n<inline-formula><tex-math>$0.84$</tex-math></inline-formula>\n versus \n<inline-formula><tex-math>$0.81$</tex-math></inline-formula>\n, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 11","pages":"3053-3070"},"PeriodicalIF":6.5000,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10697930","citationCount":"0","resultStr":"{\"title\":\"LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models\",\"authors\":\"Rongqi Pan;Taher A. Ghaleb;Lionel C. Briand\",\"doi\":\"10.1109/TSE.2024.3469582\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. The former yields higher fault detection rates (\\n<i>FDR</i>\\n) while the latter is faster. To address scalability while retaining a high \\n<i>FDR</i>\\n, we propose LTM (\\n<b>L</b>\\nanguage model-based \\n<b>T</b>\\nest suite \\n<b>M</b>\\ninimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five different pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, on which we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time. Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (\\n<inline-formula><tex-math>$41.72\\\\%$</tex-math></inline-formula>\\n versus \\n<inline-formula><tex-math>$41.02\\\\%$</tex-math></inline-formula>\\n, on average); (b) attaining a significantly higher fault detection rate (\\n<inline-formula><tex-math>$0.84$</tex-math></inline-formula>\\n versus \\n<inline-formula><tex-math>$0.81$</tex-math></inline-formula>\\n, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.\",\"PeriodicalId\":13324,\"journal\":{\"name\":\"IEEE Transactions on Software Engineering\",\"volume\":\"50 11\",\"pages\":\"3053-3070\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2024-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10697930\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Software Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10697930/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10697930/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models
Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. The former yields higher fault detection rates (
FDR
) while the latter is faster. To address scalability while retaining a high
FDR
, we propose LTM (
L
anguage model-based
T
est suite
M
inimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five different pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, on which we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time. Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (
$41.72\%$
versus
$41.02\%$
, on average); (b) attaining a significantly higher fault detection rate (
$0.84$
versus
$0.81$
, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.