Method-Level Test-to-Code Traceability Link Construction by Semantic Correlation Learning

IEEE Transactions on Software Engineering (IF 6.5; Zone 1, Computer Science; JCR Q1, Computer Science, Software Engineering). Pub Date: 2024-08-27. DOI: 10.1109/TSE.2024.3449917
Weifeng Sun; Zhenting Guo; Meng Yan; Zhongxin Liu; Yan Lei; Hongyu Zhang
IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2656-2676. Journal Article. https://ieeexplore.ieee.org/document/10648982/
Citations: 0

Abstract

Test-to-code traceability links (TCTLs) establish links between test artifacts and code artifacts. These links enable developers and testers to quickly identify the specific pieces of code tested by particular test cases, thus facilitating more efficient debugging, regression testing, and maintenance activities. Various approaches, based on distinct concepts, have been proposed to establish method-level TCTLs, specifically linking unit tests to their corresponding focal methods. Static methods, such as naming-convention-based methods, use heuristic- and similarity-based strategies. However, such methods face the following challenges: ① Developers, driven by specific scenarios and development requirements, may deviate from naming conventions, leading to TCTL identification failures. ② Static methods often overlook the rich semantics embedded within tests, leading to erroneous associations between tests and semantically unrelated code fragments. Although dynamic methods achieve promising results, they require the project to be compilable and the tests to be executable, limiting their usability. This limitation is significant for downstream tasks requiring massive numbers of test-code pairs, as not all projects can meet these requirements. To tackle the abovementioned limitations, we propose a novel static method-level TCTL approach, named TestLinker. For the first challenge of existing static approaches, TestLinker introduces a two-phase TCTL framework to accommodate different project types in a triage manner. As for the second challenge, we employ semantic correlation learning, which learns and establishes the semantic correlations between tests and focal methods based on Pre-trained Code Models (PCMs). TestLinker further establishes mapping rules to accurately link the recommended function name to the concrete production function declaration.
Empirical evaluation on a meticulously labeled dataset reveals that TestLinker significantly outperforms traditional static techniques, showing average F1-score improvements ranging from 73.48% to 202.00%. Moreover, compared to state-of-the-art dynamic methods, TestLinker, which leverages only static information, demonstrates comparable or even better performance, with an average F1-score increase of 37.40%.
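The abstract does not spell out what the two phases do, so the following is only a minimal illustrative sketch of the general idea of triage-style static linking, not TestLinker's implementation. It assumes that a cheap naming-convention rule is tried first and that a semantic scorer handles tests that deviate from the convention. A simple token-overlap (Jaccard) score stands in for the PCM-based semantic correlation model, and all function names here are hypothetical.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def link_by_naming_convention(test_name, production_methods):
    """Hypothetical phase 1: strip a leading 'test' and look for an exact name match."""
    stripped = re.sub(r"^test_?", "", test_name, flags=re.IGNORECASE)
    for method in production_methods:
        if method.lower() == stripped.lower():
            return method
    return None


def link_by_token_overlap(test_name, test_body_calls, production_methods):
    """Hypothetical phase 2 stand-in: rank invoked methods by Jaccard token overlap.

    A learned model would replace this scoring function; token overlap is
    used here only to keep the sketch dependency-free.
    """
    test_tokens = set(split_identifier(test_name)) - {"test"}
    best, best_score = None, 0.0
    for method in production_methods:
        if method not in test_body_calls:
            continue  # only consider methods the test actually invokes
        method_tokens = set(split_identifier(method))
        union = test_tokens | method_tokens
        score = len(test_tokens & method_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = method, score
    return best


def link_test(test_name, test_body_calls, production_methods):
    """Triage: try the cheap naming rule first, then fall back to the scorer."""
    return (link_by_naming_convention(test_name, production_methods)
            or link_by_token_overlap(test_name, test_body_calls, production_methods))
```

For example, `link_test("testAdd", ["add"], ["add", "remove"])` is resolved by the naming rule alone, while a test named `testRejectsEmptyInput` that calls `parse` and `validateInput` has no name match and falls through to the scorer. This mirrors challenge ① in the abstract: a pure naming heuristic fails as soon as developers deviate from the convention.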
Source journal
IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months
Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.