
IEEE Transactions on Software Engineering: Latest Articles

Advancing LLM-Generated Code Reliability: A Hybrid Approach for Hallucination Detection
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-05 | DOI: 10.1109/TSE.2025.3640641
Bo Yang;Jiayi Dang;Huai Liu;Zhi Jin
The increasing use of Large Language Models (LLMs) for writing code has raised important concerns about "code hallucinations." These occur when generated code is syntactically well-formed but contains errors in its meaning or logic. Such errors can propagate through software and lead to faults and inefficiencies in the final applications. Current research on detecting code hallucinations in LLM output is often inefficient, and it lacks a collection of test cases designed specifically to evaluate how well different detection methods work. To address these issues, we introduce SDHD, an approach that combines static and dynamic analysis techniques for hallucination detection. Whereas standard methods often fail to spot code hallucinations, SDHD shows significant performance improvements across datasets: on MBPP, CodeHaluEval, and HalluCode, it achieved an average precision of 0.771, an average recall of 0.783, and an average F1-score of 0.776. These results are substantially higher than those of existing methods, demonstrating SDHD's effectiveness in overcoming the limitations of current hallucination detection approaches.
Volume 52, Issue 2, pp. 578-594
Citations: 0
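The SDHD abstract does not describe how its static and dynamic checks work. As an illustration only, the Python sketch below pairs a crude static check (names that are read but never bound, found with the standard `ast` module) with a dynamic check (running the generated function against test cases), and flags the code if either check finds a problem. The function names `static_findings`, `dynamic_findings`, and `detect_hallucination` are hypothetical, the scope handling is deliberately simplistic, and this is not the authors' SDHD algorithm.

```python
import ast
import builtins


def static_findings(source):
    """Static pass: report names that are read but never bound.

    Deliberately crude: it ignores scopes and only inspects bare names,
    so a hallucinated attribute such as math.fast_add is not caught here.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    bound = set(dir(builtins))
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):  # function and lambda parameters
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:  # Store or Del binds the name
                bound.add(node.id)
    return [f"undefined name: {name}" for name in sorted(loaded - bound)]


def dynamic_findings(source, func_name, tests):
    """Dynamic pass: execute the code and compare outputs on (args, expected) pairs."""
    namespace = {}
    try:
        exec(source, namespace)  # untrusted code: run inside a sandbox in practice
        func = namespace[func_name]
    except Exception as exc:
        return [f"load failed: {type(exc).__name__}"]
    findings = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            findings.append(f"{args!r} raised {type(exc).__name__}")
            continue
        if got != expected:
            findings.append(f"{args!r}: expected {expected!r}, got {got!r}")
    return findings


def detect_hallucination(source, func_name, tests):
    """Flag the snippet if either the static or the dynamic pass reports a problem."""
    static = static_findings(source)
    dynamic = dynamic_findings(source, func_name, tests)
    return {"static": static, "dynamic": dynamic, "hallucinated": bool(static or dynamic)}
```

The two signals complement each other: a hallucinated helper such as `add_numbers` is caught statically before anything runs, whereas a logic slip such as returning `a - b` instead of `a + b` passes the static check and is only caught by the test cases.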
δ-SCALPEL: Docker Image Slimming Based on Source Code Static Analysis
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-04 | DOI: 10.1109/TSE.2025.3640123
Jiaxuan Han;Cheng Huang;Jiayong Liu;Tianwei Zhang
Containerization has become mainstream in current software development: it enables software to run across platforms without additional configuration of the runtime environment. However, many images created by developers are redundant and contain unnecessary code, packages, and components. This excess not only produces bloated images that are cumbersome to transmit and store but also enlarges the attack surface, making the images more vulnerable to security threats. Image slimming has therefore become a significant area of interest. Nevertheless, existing image slimming techniques face challenges, particularly the incomplete extraction of the environment dependencies required by project code. In this paper, we present a novel image slimming model named δ-SCALPEL. It employs static data dependency analysis to extract the environment dependencies of the project code and models the image's file system with a directed graph called the command link directed graph. We select 30 NPM projects and two official Docker Hub images to construct a dataset for evaluating δ-SCALPEL. The evaluation results show that δ-SCALPEL is robust and can reduce image sizes by up to 61.4% while ensuring that these projects continue to operate normally.
Volume 52, Issue 2, pp. 562-577
Citations: 0
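The δ-SCALPEL abstract says the image file system is modeled as a "command link directed graph" but does not define it. Assuming, for illustration, a simpler graph in which an edge points from a file to a file it needs at run time (an interpreter, a shared library, a symlink target), slimming reduces to keeping everything reachable from the dependencies found by static analysis and marking the rest as removable. The sketch below shows only that reachability step; `reachable_files`, `slim_plan`, and the edge semantics are hypothetical and not the paper's model.

```python
from collections import deque


def reachable_files(graph, roots):
    """Breadth-first closure over 'needs' edges, starting from the root files.

    graph maps a path to the paths it needs (hypothetical edge semantics).
    The visited check makes cycles such as symlink loops harmless.
    """
    keep = set()
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        if path in keep:
            continue
        keep.add(path)
        queue.extend(graph.get(path, ()))
    return keep


def slim_plan(all_files, graph, roots):
    """Split the image's files into (kept, removable) lists, both sorted."""
    keep = reachable_files(graph, roots)
    removable = sorted(f for f in all_files if f not in keep)
    return sorted(keep), removable
```

In this toy model, a file such as `/usr/bin/curl` that no project dependency reaches would be listed as removable, which is the kind of excess the abstract describes; the hard part the paper addresses, finding the correct roots from project source code, is taken as given here.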
AC2Next: A Novel Model That Can Predict the Next Animation API by Fusing the Animation API Context and the UI Animation Task
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-03 | DOI: 10.1109/TSE.2025.3637777
Shanquan Gao;Yihui Wang;Liyuan Tan;Zhenwei Ou;Xun Li
The Android platform provides a series of animation APIs with which app developers can implement UI animations more efficiently, reducing the effort and time required. To help app developers quickly find suitable animation APIs, we previously proposed two recommendation models, Animation2API and U-A2A. Animation2API uses a collaborative filtering algorithm to generate a list of available animation APIs for a UI animation task. U-A2A, in contrast, encodes both the animation API context and the UI animation task, and then predicts the next animation API for the current animation implementation from the joint encoding of the two modalities. Because U-A2A provides real-time recommendations throughout the animation implementation process, it effectively helps developers use animation API resources. Nevertheless, U-A2A has three key limitations. First, its GRU encoder for the animation API context struggles to capture long-distance dependencies and global information. Second, its 3D CNN encoder for the UI animation task fails to effectively extract long-distance dependencies between video frames and spatiotemporal features at different scales. Third, U-A2A always weights the two modalities equally when fusing their encodings, even though their contributions should be adjusted adaptively to the actual situation. To address these limitations, this paper introduces a novel animation API recommendation model named AC2Next. AC2Next adopts self-attention-based encoders for the animation API context and the UI animation task: it uses a GRU with self-attention to encode the animation API context and applies ViViT, a Transformer architecture built on self-attention, to encode the UI animation task. In addition, AC2Next uses an adaptive weight layer to assign appropriate weights to the animation API context and the UI animation task during information fusion. The experimental results show that AC2Next outperforms U-A2A at every stage of animation implementation. When considering 1, 3, 5, and 10 animation APIs, AC2Next improves recommendation accuracy over U-A2A by 31.56%, 10.01%, 5.57%, and 3.34%, respectively.
Volume 52, Issue 1, pp. 22-35
Citations: 0
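The AC2Next abstract contrasts U-A2A's equal weighting of the two modalities with an adaptive weight layer, without specifying that layer. One common way to realize input-dependent fusion is a softmax gate over per-modality scores; the pure-Python sketch below assumes that design. The scoring vectors `w_ctx` and `w_task` are hypothetical stand-ins for learned parameters, and the code is not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(h_ctx, h_task, w_ctx, w_task):
    """Fuse two same-length encodings with input-dependent weights.

    h_ctx: encoding of the animation API context.
    h_task: encoding of the UI animation task (e.g. a video encoder output).
    w_ctx / w_task: hypothetical scoring vectors a real model would learn.
    Returns the fused vector and the (alpha_ctx, alpha_task) weights.
    """
    if len(h_ctx) != len(h_task):
        raise ValueError("encodings must have the same dimension")
    scores = [
        sum(a * b for a, b in zip(w_ctx, h_ctx)),
        sum(a * b for a, b in zip(w_task, h_task)),
    ]
    alpha_ctx, alpha_task = softmax(scores)  # weights are positive and sum to 1
    fused = [alpha_ctx * c + alpha_task * t for c, t in zip(h_ctx, h_task)]
    return fused, (alpha_ctx, alpha_task)
```

With equal scores the gate returns weights of 0.5 each, which reproduces fixed equal-weight fusion as a special case; unequal scores shift weight toward whichever modality the scorer rates higher for the current input.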
Shield Broken: Black-Box Adversarial Attacks on LLM-Based Vulnerability Detectors
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-02 | DOI: 10.1109/TSE.2025.3638998
Yuan Jiang;Shan Huang;Christoph Treude;Xiaohong Su;Tiantian Wang
Vulnerability detection is critical for ensuring software security. Although deep learning (DL) methods, particularly those employing large language models (LLMs), have shown strong performance in automating vulnerability identification, they remain susceptible to adversarial examples: carefully crafted inputs with subtle perturbations designed to evade detection. Existing adversarial attack methods often require access to model architectures or confidence scores, which makes them impractical against real-world black-box systems. In this paper, we propose SVulAttack, a novel label-only adversarial attack framework targeting LLM-based vulnerability detectors. Our key innovation is a similarity-based strategy that estimates statement importance and model confidence, enabling more effective selection of semantics-preserving code perturbations. SVulAttack combines this strategy with a transformation component and a search component, based on either a greedy or a genetic algorithm, to identify and apply optimal combinations of transformations. We evaluate SVulAttack on open-source models (LineVul, StagedVulBERT, Code Llama, Deepseek-Coder) and closed-source models (GPT-5 nano, GPT-4o, GPT-4o-mini, Claude Sonnet 4). Results show that SVulAttack significantly outperforms existing label-only black-box attack methods. For example, against LineVul, our method with the genetic algorithm achieves an attack success rate of 49.0%, improving over DIP and CODA by 150.0% and 240.3%, respectively.
Volume 52, Issue 1, pp. 246-265
Citations: 0
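SVulAttack works with label-only access: the attacker observes only the detector's predicted label, never a confidence score. The sketch below illustrates a greedy search under that constraint. Its ranking heuristic (prefer the mutant least similar to the original, measured with `difflib.SequenceMatcher`) is our own stand-in and is not the paper's similarity-based importance and confidence estimation; the toy detector and string-level transformations are likewise hypothetical, whereas a real attack would use genuinely semantics-preserving AST rewrites.

```python
import difflib


def greedy_label_only_attack(code, predict, transforms, max_steps=5):
    """Greedy label-only search over named code transformations.

    predict(code) -> label is the only access to the detector.
    transforms is a list of (name, function) pairs; each is applied at most once.
    Returns (adversarial_code, applied_names), or (None, applied_names) on failure.
    """
    original_label = predict(code)
    current, applied = code, []
    for _ in range(max_steps):
        candidates = []
        for name, transform in transforms:
            if name in applied:
                continue
            mutated = transform(current)
            if mutated == current:
                continue  # transformation had nothing to change
            if predict(mutated) != original_label:
                return mutated, applied + [name]  # label flipped: success
            # Hypothetical proxy score: similarity of the mutant to the original input.
            similarity = difflib.SequenceMatcher(None, code, mutated).ratio()
            candidates.append((similarity, name, mutated))
        if not candidates:
            break
        _, name, current = min(candidates)  # keep the least similar mutant
        applied.append(name)
    return None, applied


# Toy stand-ins for illustration only.
def toy_detector(code):
    return "vulnerable" if "strcpy(buf" in code else "safe"


TRANSFORMS = [
    ("add_comment", lambda c: "/* copy input */\n" + c),
    ("rename_buf", lambda c: c.replace("buf", "dst")),  # crude rename, toy only
]
```

Renaming `buf` flips the toy detector's label without changing what the C snippet does, which is exactly the weakness such attacks exploit; swapping the greedy loop for a genetic algorithm changes only how candidate combinations are explored.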
An Empirical Study of Parameter-Efficient Fine-Tuning in Code Change Learning and Beyond
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-11-27 | DOI: 10.1109/TSE.2025.3637335
Shuo Liu;Jacky Keung;Zhi Jin;Zhen Yang;Fang Liu;Hao Zhang
Compared with Full-Model Fine-Tuning (FMFT), Parameter-Efficient Fine-Tuning (PEFT) has demonstrated superior efficacy and efficiency in several code understanding tasks, owing to its ability to alleviate the catastrophic forgetting of Pre-trained Language Models (PLMs) by updating only a small number of parameters. However, existing studies primarily involve static code comprehension, which aligns with the pre-training paradigm of recent PLMs and facilitates knowledge transfer, but they do not account for dynamic code changes. Thus, it remains unclear whether PEFT outperforms FMFT in task-specific adaptation for code-change-related tasks. To address this question, we examine four prevalent PEFT methods (i.e., AT, LoRA, PT, and PreT) and compare their performance with FMFT across seven popular PLMs. Our experiments involve two widely studied code-change-related tasks, Just-In-Time Defect Prediction (JIT-DP) and Commit Message Generation (CMG). They show that in common scenarios the four PEFT methods can surpass FMFT on JIT-DP but at best perform comparably on CMG, whereas in cross-lingual and low-resource scenarios they exhibit relative superiority. We then conduct a series of probing tasks from both static and dynamic perspectives, offering detailed explanations for the efficacy of PEFT and FMFT. Inspired by the distinctive advantages of PEFT and FMFT in their layer-wise probing results, we propose Pasta$_{K}$, a self-adaptive efficient layer-specific tuning framework for PLMs in code change learning, which combines FMFT and PEFT during domain adaptation under the guidance of the probing results. Experiments on the CMG task demonstrate that Pasta$_{K}$ surpasses diverse PEFT methods in effectiveness. Moreover, Pasta$_{K}$ outperforms FMFT by up to 1.48%, 3.21%, and 1.87% in terms of BLEU, Meteor, and Rouge-L, respectively, while saving 26.26% of training time and 20.65% of computational memory compared with FMFT.
Volume 52, Issue 1, pp. 3-21
Citations: 0
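LoRA is one of the four PEFT methods the study examines, and it shows concretely why PEFT updates "only a small number of parameters." In LoRA (Low-Rank Adaptation), a frozen weight matrix W of shape d×k is augmented by a trainable low-rank update, so the layer computes h = Wx + (α/r)·B(Ax), with B of shape d×r and A of shape r×k. The plain-Python sketch below counts trainable parameters and applies the update; it is an illustration of the general technique, not code from the study, and the helper names are hypothetical.

```python
def full_trainable_params(d, k):
    """Parameters updated when fine-tuning the whole d x k matrix W."""
    return d * k


def lora_trainable_params(d, k, r):
    """Parameters in the low-rank factors B (d x r) and A (r x k); W stays frozen."""
    return r * (d + k)


def matvec(matrix, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x), computed without materializing B A."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]
```

For a 768×768 projection with r = 8, LoRA trains 12,288 parameters instead of 589,824, i.e. 1/48 (about 2.1%) of the full matrix. LoRA initializes B to zero, so the adapted layer starts out computing exactly Wx and training departs from the pre-trained model's behavior only gradually.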
Scalable Large-Scale Multi-Granularity Code Clone Detection via Clustering Search and Pre-Trained Models
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-11-24 | DOI: 10.1109/TSE.2025.3635158
Yifan An;Yunlong Ma;Xiang Gao;Hailong Sun
Code cloning is a common phenomenon in software development: it reduces developers' programming effort but also poses the risk of defect inheritance. Clone detection locates identical or similar pieces of code within or across software systems. As the amount of source code keeps increasing, efficient, large-scale clone detection has become a necessity. Moreover, code clones may occur at various levels of granularity (e.g., file, function, and block level), which poses further challenges for efficient clone detection. Although numerous methods have been proposed to detect code clones at different granularities, they often suffer from low detection efficiency and false positives, and are typically limited to identifying clones at a single granularity. In this paper, we introduce MGCD, an efficient approach for detecting code clones in large-scale codebases. Specifically, we embed function-level code into vectors using a pre-trained model and perform a clustering search with the IVF_Flat algorithm to identify clone candidates. These candidates are then filtered with an entropy-based method to improve accuracy and avoid false positives. Moreover, we leverage function-level clone detection results to further conduct file- and block-level clone detection. We evaluate our approach on the BigCloneBench benchmark. Experimental results show that our approach takes only 0.23 ms to search for clones among 800,000 functions while achieving high precision and recall.
Volume 52, Issue 2, pp. 546-561
Citations: 0
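MGCD's candidate search uses IVF_Flat, an inverted-file index: vectors are partitioned by k-means into `nlist` clusters, and a query is compared exactly (flat, with no compression) only against the vectors in its `nprobe` nearest clusters. The sketch below is a from-scratch, pure-Python illustration of that idea with a naive deterministic k-means initialization. It is not the Faiss or Milvus API a production system would use, the class and helper names are hypothetical, and the toy 2-D vectors stand in for pre-trained code embeddings.

```python
import math


def kmeans(vectors, nlist, iters=10):
    """Plain k-means; the first nlist vectors serve as initial centroids."""
    if nlist > len(vectors):
        raise ValueError("nlist cannot exceed the number of vectors")
    centroids = [list(v) for v in vectors[:nlist]]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            idx = min(range(nlist), key=lambda c: math.dist(v, centroids[c]))
            buckets[idx].append(v)
        for c, members in enumerate(buckets):
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFFlat:
    """Inverted lists keyed by nearest centroid; exact distances inside each list."""

    def __init__(self, vectors, ids, nlist):
        self.centroids = kmeans(vectors, nlist)
        self.lists = [[] for _ in range(nlist)]
        for vid, v in zip(ids, vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append((vid, v))

    def _nearest_centroids(self, q, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(q, self.centroids[c]))
        return order[:n]

    def search(self, q, k, nprobe=1):
        """Return up to k (distance, id) pairs from the nprobe closest lists."""
        hits = []
        for c in self._nearest_centroids(q, nprobe):
            for vid, v in self.lists[c]:
                hits.append((math.dist(q, v), vid))
        hits.sort()
        return hits[:k]
```

Raising `nprobe` trades speed for recall; with `nprobe` equal to `nlist`, the search degenerates to exact brute force over all vectors. In a clone detector, hits closer than some distance threshold would become clone candidates for a later filtering step.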
Exploring and Analyzing Software Architecture Refactoring in Practice
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-11-24 | DOI: 10.1109/TSE.2025.3636150
Wei Ding;Ran Mo;Chaochao Wu;Haopeng Song;Hang Fu;Xinya Mu
Software architecture is the abstraction of a software system, and it significantly influences software development and maintenance. As software evolves, continuous changes can cause its architecture to deviate from the original design, leading to architecture degradation and a decline in software quality. Architecture refactoring becomes necessary to address or mitigate architecture degradation and improve overall quality. Although researchers have developed various architecture refactoring tools and techniques, there has been limited research on how architecture refactoring is practiced in real-world scenarios. In this paper, we conducted an empirical study of posts from Stack Overflow to understand architecture refactoring in practice. Through our analysis of 694 posts with 3,468 discussion threads, we identified 12 types of architecture refactoring along two classification dimensions. Additionally, we categorized the architecture problems practitioners face and explored their corresponding refactoring solutions. Furthermore, we revealed six potential risks that may result from architecture refactoring. We believe our study provides valuable insights that help practitioners perform architecture refactoring effectively. The findings can serve as a foundation for future research and offer practical guidance for improving architecture quality.
Vol. 52, No. 1, pp. 286-303
Citations: 0
Low-Cost Testing for Path Coverage of MPI Programs Using Surrogate-Assisted Changeable Multi-Objective Optimization
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-11-24 | DOI: 10.1109/TSE.2025.3635120
Baicai Sun;Lina Gong;Yinan Guo;Dunwei Gong;Gaige Wang
A target path of a Message Passing Interface (MPI) program typically consists of several target sub-paths. When an intelligent optimization algorithm is used to generate a test case that covers the target path, some target sub-paths often prove hard to cover, which limits the testing efficiency for the entire target path. This paper therefore proposes a low-cost approach to path coverage testing of MPI programs based on surrogate-assisted changeable multi-objective optimization, with the aim of further improving the effectiveness and efficiency of test case generation. The approach first establishes a changeable multi-objective optimization model to guide test case generation. While solving this model with an intelligent optimization algorithm, we identify each hard-to-cover target sub-path and form a corresponding sample set. Finally, we manage a surrogate model for each hard-to-cover target sub-path based on its sample set, and select only superior evolutionary individuals for actual execution of the MPI program under test, thereby reducing the cost and number of program executions. The approach has been applied to path coverage testing of several benchmark MPI programs and compared with several state-of-the-art approaches. The experimental results show that it significantly improves the effectiveness and efficiency of test case generation.
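The pre-screening idea in the final step (consult a cheap surrogate first, then really execute only the most promising individuals) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the authors' implementation: the k-nearest-neighbour surrogate, the names `KnnSurrogate` and `prescreen`, the lower-is-better objective, and the toy objective standing in for an actual run of the MPI program.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


class KnnSurrogate:
    """Hypothetical k-nearest-neighbour surrogate for one hard-to-cover sub-path.

    It stores (input, objective) samples and predicts the objective of a new
    input as the mean objective of its k closest stored samples. A lower
    objective is assumed to mean "closer to covering the sub-path".
    """

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.samples: List[Tuple[Vector, float]] = []

    def add(self, x: Vector, y: float) -> None:
        self.samples.append((x, y))

    def predict(self, x: Vector) -> float:
        if not self.samples:
            raise ValueError("surrogate has no samples yet")
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], x))[: self.k]
        return sum(y for _, y in nearest) / len(nearest)


def prescreen(
    population: Sequence[Vector],
    surrogate: KnnSurrogate,
    real_eval: Callable[[Vector], float],
    budget: int,
) -> List[Tuple[Vector, float]]:
    """Rank individuals by predicted objective; really execute only the best `budget`."""
    ranked = sorted(population, key=surrogate.predict)
    evaluated = []
    for x in ranked[:budget]:
        y = real_eval(x)     # the expensive step: stands in for running the MPI program
        surrogate.add(x, y)  # feed the true value back into the sample set
        evaluated.append((x, y))
    return evaluated


# Usage with a made-up objective standing in for a branch-distance measurement.
executions = {"count": 0}


def toy_objective(x: Vector) -> float:
    executions["count"] += 1
    return abs(x[0] - 7.0) + abs(x[1] - 3.0)


rng = random.Random(0)
surrogate = KnnSurrogate(k=3)
for _ in range(5):  # seed the sample set with a few real executions
    x = (rng.uniform(0, 10), rng.uniform(0, 10))
    surrogate.add(x, toy_objective(x))

population = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(20)]
best = prescreen(population, surrogate, toy_objective, budget=4)
print(executions["count"])  # 5 seed runs + 4 pre-screened runs = 9
```

Only 4 of the 20 candidates reach the expensive evaluation. Feeding each real result back into the sample set is what lets the surrogate improve from one generation to the next.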
Vol. 52, No. 1, pp. 116-136
Citations: 0
IssueCourier: Multi-Relational Heterogeneous Temporal Graph Neural Network for Open-Source Issue Assignment
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-11-20 | DOI: 10.1109/TSE.2025.3634192
Chunying Zhou;Xiaoyuan Xie;Gong Chen;Peng He;Bing Li
Issue assignment, which involves recommending the most suitable developers to address reported issues, plays a critical role in open-source software (OSS) maintenance. Given the high volume of issue reports in large-scale projects, assigning issues manually is tedious and costly. Previous studies have proposed automated issue assignment approaches that primarily model the textual information in issue reports, developers’ expertise, or interactions between issues and developers derived from historical issue-fixing records. However, the performance of these approaches is often limited by incorrect and missing labels in OSS datasets, the long tail of developer contributions, and changes in developer activity as a project evolves. To address these challenges, we propose IssueCourier, a novel Multi-Relational Heterogeneous Temporal Graph Neural Network approach for issue assignment. Specifically, we formalize five key relationships among issues, developers, and source code files to construct a heterogeneous graph. We then adopt a temporal slicing technique that partitions the graph into a sequence of time-based subgraphs to learn stage-specific patterns. Furthermore, we provide a benchmark dataset with relabeled ground truth to address incorrect and missing labels in existing OSS datasets. Finally, we evaluate IssueCourier through extensive experiments on this benchmark dataset. The results show that IssueCourier improves on the best baseline by up to 45.49% in top-1 and 31.97% in MRR.
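Temporal slicing of a heterogeneous graph can be sketched in Python, assuming the graph is stored as a list of typed, timestamped edges. The relation names (`reports`, `fixes`, `touches`), the node-ID scheme, and the fixed-width window are hypothetical: the abstract does not name the five relationships or the slicing granularity, and the graph neural network that would consume each slice is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    src: str        # hypothetical node IDs such as "dev:bob" or "file:core.c"
    relation: str   # hypothetical relation type, e.g. "fixes"
    dst: str
    timestamp: int  # e.g. seconds since the start of the project history


def temporal_slices(edges: List[Edge], start: int, window: int) -> List[Dict[str, List[Edge]]]:
    """Partition typed, timestamped edges into consecutive fixed-width time windows.

    Slice i maps each relation type to the edges of that type whose timestamp
    falls in [start + i * window, start + (i + 1) * window).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buckets: Dict[int, Dict[str, List[Edge]]] = defaultdict(lambda: defaultdict(list))
    for e in edges:
        if e.timestamp < start:
            continue  # ignore events before the observation period
        buckets[(e.timestamp - start) // window][e.relation].append(e)
    n = max(buckets) + 1 if buckets else 0
    # Empty windows become empty dicts so that slice indices stay aligned with time.
    return [{rel: list(es) for rel, es in buckets.get(i, {}).items()} for i in range(n)]


edges = [
    Edge("user:alice", "reports", "issue:1", 100),
    Edge("dev:bob", "fixes", "issue:1", 150),
    Edge("dev:bob", "touches", "file:core.c", 150),
    Edge("dev:carol", "fixes", "issue:2", 420),
]
slices = temporal_slices(edges, start=0, window=200)
for i, s in enumerate(slices):
    print(i, {rel: len(es) for rel, es in s.items()})
# 0 {'reports': 1, 'fixes': 1, 'touches': 1}
# 1 {}
# 2 {'fixes': 1}
```

Keeping the per-relation grouping inside each slice preserves the multi-relational structure, so each time-based subgraph can still be fed to a relation-aware model. Whether the original approach keeps empty windows is not stated.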
Vol. 52, No. 2, pp. 527-545
Citations: 0
ProFuse: Test Case Prioritization Based on Multi Dimensional Feature Fusion for Logic Synthesis Tools Testing Acceleration
IF 5.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-11-20 | DOI: 10.1109/TSE.2025.3634318
Peiyu Zou;Xiaochen Li;Xu Zhao;Shikai Guo;Zhide Zhou;Yue Ma;He Jiang
Logic synthesis tools translate Hardware Description Language (HDL) designs into hardware implementations. Testing these tools usually involves executing numerous test cases, yet only a few of them trigger faults, which makes testing inefficient. Because executing a test case on a logic synthesis tool often incurs significant cost for complex synthesis and simulation, fault-triggering test cases should be prioritized for execution. However, existing prioritization methods struggle to accurately predict the fault-triggering capability of dynamically generated test cases and to model the unique syntactic and structural complexities of HDL-based programs. We therefore propose ProFuse, a multi-dimensional feature fusion method for prioritizing test cases for logic synthesis tools. ProFuse leverages Abstract Syntax Trees (AST) and Data Flow Graphs (DFG) to extract novel syntactic and structural features from HDL designs. These features are processed by a joint model of a Multilayer Perceptron (MLP) and a Graph Convolutional Network (GCN) to rank fault-triggering test cases accurately. ProFuse achieves an Average Percentage of Fault Detection (APFD) score of 0.9285, outperforming state-of-the-art prioritization methods by 11.38% to 82.49%. By efficiently ranking randomly generated test cases, ProFuse discovered 15 new faults in logic synthesis tools (i.e., Yosys and Vivado). The Vivado community has acknowledged our contribution to improving their tool.
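For reference, APFD is commonly defined as APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the 1-based position of the first test case in the ordering that reveals fault i. The Python sketch below computes this score for a given ordering; the function name, the input format, and the toy test and fault IDs are assumptions, and ProFuse's ranking model itself is not reproduced.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Average Percentage of Faults Detected for a prioritized test order.

    order   -- test-case IDs in execution order (n tests)
    detects -- maps each test ID to the set of fault IDs it triggers
    """
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test case and one fault")
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            first_pos.setdefault(fault, pos)  # keep only the first revealing position
    missing = faults - set(first_pos)
    if missing:
        raise ValueError(f"faults never revealed by this order: {sorted(missing)}")
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


detects = {"t1": set(), "t2": {"f1"}, "t3": set(), "t4": {"f2"}, "t5": set()}
good = ["t2", "t4", "t1", "t3", "t5"]  # fault-triggering tests first
poor = ["t1", "t3", "t5", "t2", "t4"]  # fault-triggering tests last
print(round(apfd(good, detects), 4))  # 0.8
print(round(apfd(poor, detects), 4))  # 0.2
```

The same five tests score 0.8 or 0.2 depending only on their order, which is why a model that pushes fault-triggering test cases to the front raises APFD.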
Vol. 52, No. 1, pp. 304-320
Citations: 0