Method-Level Test-to-Code Traceability Link Construction by Semantic Correlation Learning

IEEE Transactions on Software Engineering; Impact Factor: 6.5; Zone 1 (Computer Science); JCR Q1 (Computer Science, Software Engineering); Pub Date: 2024-08-27; DOI: 10.1109/TSE.2024.3449917

Weifeng Sun;Zhenting Guo;Meng Yan;Zhongxin Liu;Yan Lei;Hongyu Zhang

IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2656-2676. Full text: https://ieeexplore.ieee.org/document/10648982/

Citations: 0

Abstract




Test-to-code traceability links (TCTLs) establish links between test artifacts and code artifacts. These links enable developers and testers to quickly identify the specific pieces of code tested by particular test cases, thus facilitating more efficient debugging, regression testing, and maintenance activities. Various approaches, based on distinct concepts, have been proposed to establish method-level TCTLs, specifically linking unit tests to their corresponding focal methods. Static methods, such as naming-convention-based methods, use heuristic- and similarity-based strategies. However, such methods face the following challenges: ① Developers, driven by specific scenarios and development requirements, may deviate from naming conventions, leading to TCTL identification failures. ② Static methods often overlook the rich semantics embedded within tests, leading to erroneous associations between tests and semantically unrelated code fragments. Although dynamic methods achieve promising results, they require the project to be compilable and the tests to be executable, which limits their usability. This limitation is significant for downstream tasks that require massive numbers of test-code pairs, as not all projects can meet these requirements. To tackle the above-mentioned limitations, we propose a novel static method-level TCTL approach named TestLinker. For the first challenge of existing static approaches, TestLinker introduces a two-phase TCTL framework to accommodate different project types in a triage manner. For the second challenge, we employ semantic correlation learning, which learns and establishes the semantic correlations between tests and focal methods based on Pre-trained Code Models (PCMs). TestLinker further establishes mapping rules to accurately link the recommended function name to the concrete production function declaration. Empirical evaluation on a meticulously labeled dataset reveals that TestLinker significantly outperforms traditional static techniques, showing average F1-score improvements ranging from 73.48% to 202.00%. Moreover, compared to state-of-the-art dynamic methods, TestLinker, which leverages only static information, demonstrates comparable or even better performance, with an average F1-score increase of 37.40%.
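
To make the first challenge concrete, here is a minimal Python sketch of a naming-convention-based static baseline of the kind the abstract refers to. It is not TestLinker and not taken from the paper: the function name `link_by_naming_convention`, the "strip a leading `test` token" rule, and the sample method names are all illustrative assumptions.

```python
import re


def link_by_naming_convention(test_name, production_methods):
    """Return the production method whose name matches the test name, or None.

    Illustrative baseline only: assumes tests are named test<FocalMethod>
    or test_<focal_method>. This rule is an assumption, not the paper's.
    """
    # Strip a leading "test" / "test_" token, e.g. testPushBack -> PushBack.
    stem = re.sub(r"^test_?", "", test_name, flags=re.IGNORECASE)
    if not stem or stem == test_name:
        return None  # the test does not follow the convention at all

    # Compare names case-insensitively and ignoring underscores.
    normalized = {
        name.replace("_", "").lower(): name for name in production_methods
    }
    return normalized.get(stem.replace("_", "").lower())


# Hypothetical production methods of a queue class.
methods = ["pushBack", "popFront", "size"]

followed = link_by_naming_convention("testPushBack", methods)
deviated = link_by_naming_convention("testQueueGrowsAfterInsert", methods)
no_prefix = link_by_naming_convention("shouldRejectEmptyInput", methods)
```

In this sketch only the first test is linked. The other two describe behavior rather than naming the focal method, so a purely name-based rule returns no link, which is the kind of identification failure described in challenge ①.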


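Source journal

IEEE Transactions on Software Engineering (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months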

About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of the Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.