
Latest Publications in Automated Software Engineering

MalModel: hiding malicious payload in mobile deep learning models with black-box backdoor attack
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-29 · DOI: 10.1007/s10515-025-00569-7
Jiayi Hua, Kailong Wang, Meizhen Wang, Guangdong Bai, Xiapu Luo, Haoyu Wang

Mobile malware has become one of the most critical security threats in the era of ubiquitous mobile computing. Despite intensive efforts from security experts to counteract it, recent years have still witnessed rapid growth in identified malware samples. This can be partly attributed to newly emerged technologies that constantly open up under-studied attack surfaces for adversaries. One typical example is the recently developed mobile machine learning (ML) framework that enables storing and running deep learning (DL) models on mobile devices. Despite obvious advantages, this new feature also inadvertently introduces potential vulnerabilities (e.g., on-device models may be modified for malicious purposes). In this work, we propose a method to generate or transform mobile malware by hiding malicious payloads inside DL models’ parameters, based on a strategy that considers four factors (layer type, layer number, layer coverage, and the number of bytes to replace). Using the proposed method, malware can run covertly in DL mobile applications with little impact on model performance (as little as a 0.35% drop in accuracy and at most 39 ms of latency overhead). We can successfully trigger malicious functions, such as collecting SMS records and screenshots, in a real-world application. The generated malware evades state-of-the-art detection techniques (none of the samples are detected by VirusTotal), and the malware-based attack exhibits high practical feasibility (successfully attacking 41% of the apps with on-device DL models). Our work should alert security experts to malware injection attacks on mobile devices and raise awareness of deep-learning-assisted attacks in the mobile ecosystem.

Citations: 0
Assessing the effectiveness of recent closed-source large language models in fault localization and automated program repair
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-28 · DOI: 10.1007/s10515-025-00549-x
Bo Wang, Ming Deng, Mingda Chen, Youfang Lin, Jianyi Zhou, Jie M. Zhang

Large Language Models (LLMs) have made significant advancements in code-related tasks. In the field of automated debugging, fault localization (FL) and automated program repair (APR) are two prevalent topics attracting significant research effort. Recently, many novel LLM-based approaches to FL and APR have emerged. However, most existing LLM-based studies primarily focus on the GPT models from OpenAI or on open-source LLMs. With the rapid development of LLMs, various internet giants have introduced new closed-source models. In addition, due to policy restrictions, some regions can only access the commercial LLMs provided by specific companies. Beyond OpenAI's models, the effectiveness of other closed-source LLMs in FL and APR remains unknown. To better understand the effectiveness of contemporary closed-source models, we conduct a large-scale empirical study of their performance on FL and APR. Specifically, our study involves 4 recent commercial closed-source LLMs (i.e., GPT-4o-Mini, Ernie-3.5, Qwen-turbo, and Doubao-pro) and 1 open-source LLM (i.e., DeepSeek-V3-chat). Note that among all the LLMs we studied, only the GPT models have region restrictions. We designed a total of 12 distinct prompt templates, 6 each for FL and APR, incorporating various formats and information sources. We conducted experiments to evaluate FL and APR effectiveness on 1036 real Java bugs from two datasets, Defects4J 2.0 and ConDefects. The key findings indicate that (1) different LLMs tend to succeed on different sets of bugs in both FL and APR, with relatively little overlap among successful cases, implying that the models possess distinct strengths in handling specific kinds of bugs, (2) the effectiveness of prompt templates varies across models, and (3) the FL and APR effectiveness of the studied models is significantly correlated with the bug type. We summarized the 14 findings obtained into 3 implications, which could help researchers further improve the performance of LLMs on FL and APR.
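
The paper's 12 prompt templates are not reproduced here; the sketch below only illustrates, under stated assumptions, how a fault-localization prompt combining a failing test, its error message, and candidate methods might be assembled. The template text and the `query_llm` wrapper are hypothetical placeholders, not the study's actual artifacts.

```python
# Minimal sketch of an LLM-based fault-localization prompt. The template is
# illustrative only and is NOT one of the paper's 12 templates; query_llm() is
# a hypothetical wrapper around whichever chat-completion client is available.

FL_TEMPLATE = """You are an expert Java debugger.
A test is failing. Identify the most suspicious method(s).

Failing test:
{test_code}

Error message:
{error_message}

Candidate methods:
{methods}

Return a ranked list of suspicious method signatures with a one-line reason each."""


def build_fl_prompt(test_code: str, error_message: str, methods: list[str]) -> str:
    """Fill the fault-localization template with bug-specific context."""
    return FL_TEMPLATE.format(
        test_code=test_code,
        error_message=error_message,
        methods="\n".join(f"- {m}" for m in methods),
    )


def query_llm(prompt: str) -> str:  # hypothetical: plug in the model under study
    raise NotImplementedError("configure a chat-completion client here")


if __name__ == "__main__":
    prompt = build_fl_prompt(
        test_code="@Test void testAdd() { assertEquals(3, Calc.add(1, 1)); }",
        error_message="expected: <3> but was: <2>",
        methods=["Calc.add(int, int)", "Calc.sub(int, int)"],
    )
    print(prompt)  # send via query_llm(prompt) once a client is configured
```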

Citations: 0
LMFuzz: Program repair fuzzing based on large language models
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-28 · DOI: 10.1007/s10515-025-00568-8
Renze Lin, Ran Wang, Guanghuan Hu, Xianghua Xu

Generating programs using large language models (LLMs) for fuzz testing has emerged as a significant testing methodology. While traditional fuzzers can produce correct programs, their effectiveness is limited by excessive constraints and restricted API combinations, resulting in insufficient coverage of the target system’s code and impacting testing efficiency. Unlike traditional methods, large language model based fuzzers can generate more diverse code, effectively addressing key issues of conventional fuzzers. However, the lack of constraints on API combinations during the generation process often leads to reduced program validity. Therefore, a crucial challenge is to enhance the validity of generated code while maintaining its diversity. To address this issue, we propose a novel and universal fuzzer, LMFuzz. To ensure the fuzzer’s generation capability, we utilize a large language model as the primary generator and model the operator selection problem within the fuzzing loop as a multi-armed bandit problem. We introduce the Thompson Sampling algorithm to enhance both the diversity and validity of program generation. To improve the validity of the generated code, we incorporate a program repair loop that iteratively corrects the generated programs, thereby reducing errors caused by the lack of API combination constraints. Experimental results demonstrate that LMFuzz significantly surpasses existing state-of-the-art large language model based fuzzers in terms of coverage and validity, and also exhibits notable advantages in generating diverse programs. Furthermore, LMFuzz has identified 24 bugs across five popular programming languages and their corresponding systems.
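
As a rough illustration of the operator-selection idea, the sketch below models each generation operator as a Beta-Bernoulli bandit arm and picks the next operator by Thompson Sampling. The operator names and the simulated reward signal (whether a generated program turned out valid) are assumptions for demonstration, not LMFuzz's implementation.

```python
import random

# Thompson Sampling over fuzzing operators: each operator is treated as a
# Beta(successes + 1, failures + 1) bandit arm. Generic sketch of the
# multi-armed-bandit formulation described in the abstract, not LMFuzz's code.

class OperatorBandit:
    def __init__(self, operators):
        self.stats = {op: [1, 1] for op in operators}  # [alpha, beta] priors

    def select(self) -> str:
        # Sample a success probability per operator and pick the largest draw.
        return max(self.stats, key=lambda op: random.betavariate(*self.stats[op]))

    def update(self, op: str, success: bool) -> None:
        # Reward = 1 if the generated program was valid (e.g., compiled and ran).
        self.stats[op][0 if success else 1] += 1


if __name__ == "__main__":
    # Hypothetical operator names and success rates, purely for illustration.
    true_rates = {"mutate_api_call": 0.6, "insert_loop": 0.3, "swap_arguments": 0.1}
    bandit = OperatorBandit(list(true_rates))
    for _ in range(200):
        op = bandit.select()
        bandit.update(op, random.random() < true_rates[op])  # simulated feedback
    print(bandit.stats)  # the most useful operator accumulates the most successes
```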

Citations: 0
Predicting software defects using an extreme gradient boosting model tuned with reinforcement learning based spider wasp optimizer
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-28 · DOI: 10.1007/s10515-025-00572-y
Raja Oueslati, Mohamed Wajdi Ouertani, Ghaith Manita, Amit Chhabra

Software defect prediction (SDP) is a critical task for improving software quality and reducing development costs by identifying faults early. While machine learning models, particularly XGBoost, have been widely adopted for SDP, their performance is highly dependent on optimal hyperparameter tuning. Furthermore, existing state-of-the-art methods, including deep learning approaches that leverage semantic code features, often suffer from high computational complexity and extensive training requirements. To address these challenges, this paper proposes a hybrid optimization approach, RL-SWO, which integrates the Spider Wasp Optimizer (SWO) with reinforcement learning (RL) to refine XGBoost's hyperparameters. RL-SWO was first validated on the CEC'22 benchmark functions, where it outperformed several state-of-the-art metaheuristics. It was then applied to five defect prediction datasets from the AEEEM repository, demonstrating superior performance in detecting defective and non-defective instances, particularly in imbalanced data scenarios. Compared to traditional optimization methods, RL-SWO significantly improved XGBoost's classification accuracy and robustness. Experimental results highlight RL-SWO's potential to enhance SDP models by balancing exploration and exploitation during optimization. This study advances automated defect prediction by leveraging metaheuristics and reinforcement learning, offering a promising approach to improving software reliability.
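
The sketch below shows only the fitness function that a metaheuristic tuner such as SWO (or its RL-augmented variant) would call for each candidate hyperparameter vector; the search space, the synthetic imbalanced dataset, and the random-search stand-in for the optimizer are assumptions, not the paper's RL-SWO algorithm or the AEEEM data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Fitness evaluation for XGBoost hyperparameter tuning. The search space is
# illustrative and the dataset is synthetic (imbalanced, like many SDP datasets).
X, y = make_classification(n_samples=500, n_features=20, weights=[0.85, 0.15],
                           random_state=0)


def fitness(candidate: np.ndarray) -> float:
    """Map a candidate vector in [0, 1]^4 to XGBoost settings; return mean CV F1."""
    model = XGBClassifier(
        n_estimators=int(50 + candidate[0] * 450),    # 50..500 trees
        max_depth=int(2 + candidate[1] * 10),         # depth 2..12
        learning_rate=float(0.01 + candidate[2] * 0.29),
        subsample=float(0.5 + candidate[3] * 0.5),
        eval_metric="logloss",
    )
    return cross_val_score(model, X, y, cv=3, scoring="f1").mean()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Random search stands in for the SWO/RL optimizer in this sketch.
    best = max((rng.random(4) for _ in range(5)), key=fitness)
    print("best candidate:", best, "F1:", fitness(best))
```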

Citations: 0
ByteEye: A smart contract vulnerability detection framework at bytecode level with graph neural networks
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-22 · DOI: 10.1007/s10515-025-00559-9
Jinni Yang, Shuang Liu, Surong Dai, Yaozheng Fang, Kunpeng Xie, Ye Lu

Smart contract vulnerability detection has attracted increasing attention due to the billions in economic losses caused by vulnerabilities. Existing smart contract vulnerability detection methods suffer from high false negative and false positive rates. To address these issues, we present ByteEye, a bytecode-level smart contract vulnerability detection framework built on Graph Neural Networks (GNNs). ByteEye first constructs an edge-enhanced Control Flow Graph (CFG) to retain rich information from the low-level bytecode with low latency. ByteEye also designs and incorporates both general and vulnerability-specific information into its detection method as bytecode-level features. Furthermore, ByteEye flexibly supports machine/deep learning models, especially graph neural networks, which facilitate precise vulnerability detection. Extensive experimental results show that ByteEye outperforms state-of-the-art approaches on all three types of vulnerability detection. ByteEye achieves F1 scores that are, on average, 35.29%, 43.95%, and 6.38% higher than the best-performing bytecode-level baseline on reentrancy, timestamp dependency, and integer overflow/underflow vulnerabilities, respectively. Moreover, ByteEye detects 361 previously unreported vulnerabilities in real-world smart contracts. ByteEye enhances control-flow information, designs general bytecode-level features with expert knowledge, and flexibly supports deep learning models, particularly GNNs, thus achieving high detection effectiveness.
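
A minimal sketch, assuming PyTorch Geometric, of the general pattern of classifying a contract's CFG with a GNN: random node features stand in for ByteEye's bytecode-level features, and a plain GCN replaces its edge-enhanced design, so this is not the framework's actual model.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

# Graph-level classifier over a control flow graph. Node features and the toy
# CFG below are stand-ins; ByteEye's edge enhancement and feature design are
# not reproduced here.

class CFGClassifier(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64, num_classes: int = 2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.out = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)   # one embedding per contract CFG
        return self.out(x)               # vulnerable vs. benign logits


if __name__ == "__main__":
    # Toy CFG with 4 basic blocks and 4 control-flow edges.
    edge_index = torch.tensor([[0, 1, 1, 2], [1, 2, 3, 3]], dtype=torch.long)
    x = torch.randn(4, 16)                    # 16-dim stand-in block features
    batch = torch.zeros(4, dtype=torch.long)  # all nodes belong to one graph
    logits = CFGClassifier(in_dim=16)(x, edge_index, batch)
    print(logits.shape)                       # torch.Size([1, 2])
```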

Citations: 0
Augmenting software quality assurance with AI and automation using PyTest-BDD
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-22 · DOI: 10.1007/s10515-025-00566-w
Xiaofei Zhao, Hua Wang, JieQiong Ding, Zhiming Hu, Qingqing Tian, Ying Wang

This paper explores the integration of Artificial Intelligence (AI) and automation within the Behavior-Driven Development (BDD) paradigm, using the PyTest-BDD framework, to enhance Software Quality Assurance (SQA) processes. Traditional SQA methods struggle with the increasing complexity and rapid release cycles of modern software development. This research demonstrates how AI can address these challenges through intelligent test generation, prioritization, and anomaly detection. The proposed framework utilizes Natural Language Processing (NLP) to analyze requirements, machine learning (ML) to generate and prioritize test scenarios, and deep learning (DL) for anomaly detection, all within the PyTest-BDD ecosystem. This approach fosters a collaborative environment between human testers and AI agents, leading to more robust testing with reduced human overhead. The framework offers reduced human error, faster feedback loops, and increased team collaboration, thereby reducing development time and improving software reliability. AI-powered test prioritization and anomaly detection are shown to be effective in identifying subtle defects. The modular and extensible nature of the framework allows for a flexible and scalable testing system.
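
For readers unfamiliar with the underlying tooling, the sketch below is a minimal pytest-bdd scenario showing the Gherkin-plus-step-definition structure the framework builds on. The feature file, step wording, and fixtures are illustrative assumptions; the paper's AI-driven generation and prioritization layers are not shown.

```python
# test_login.py -- minimal pytest-bdd example of the BDD layer the framework
# builds on. Assumes a sibling file features/login.feature with:
#   Feature: Login
#     Scenario: Valid credentials
#       Given a registered user "alice"
#       When she logs in with the correct password
#       Then she sees her dashboard

from pytest_bdd import scenario, given, when, then, parsers


@scenario("features/login.feature", "Valid credentials")
def test_valid_login():
    pass  # the step definitions below drive the test


@given(parsers.parse('a registered user "{name}"'), target_fixture="user")
def registered_user(name):
    return {"name": name, "password": "s3cret"}


@when("she logs in with the correct password", target_fixture="session")
def login(user):
    # Stand-in for the real authentication call under test.
    return {"authenticated": user["password"] == "s3cret"}


@then("she sees her dashboard")
def dashboard_visible(session):
    assert session["authenticated"]
```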

Citations: 0
Automating software size measurement from python code using language models
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-18 · DOI: 10.1007/s10515-025-00571-z
Samet Tenekeci, Hüseyin Ünlü, Bedir Arda Gül, Damla Keleş, Murat Küçük, Onur Demirörs

Software size is a key input for project planning, effort estimation, and productivity analysis. While pre-trained language models have shown promise in deriving functional size from natural-language requirements, measuring size directly from source code remains under-explored. Yet, code-based size measurement is critical in modern workflows where requirement documents are often incomplete or unavailable, especially in Agile development environments. This exploratory study investigates the use of CodeBERT, a pre-trained bimodal transformer model, for measuring software size directly from Python source code according to two measurement methods: COSMIC Function Points and MicroM. We construct two curated datasets from the Python subset of the CodeSearchNet corpus, and manually annotate each function with its corresponding size. Our experimental results show that CodeBERT can successfully measure COSMIC data movements with up to 91.4% accuracy and generalize to the functional, architectural, and algorithmic event types defined in MicroM, reaching up to 81.5% accuracy. These findings highlight the potential of code-based language models for automated functional size measurement when requirement artifacts are absent or unreliable.
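
A minimal sketch, assuming Hugging Face transformers and the public microsoft/codebert-base checkpoint, of framing size measurement as sequence classification over a Python function. The label set (COSMIC data-movement types) and the randomly initialized, untrained classification head are assumptions for illustration; the paper's fine-tuning setup and annotated datasets are not reproduced.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Framing functional size measurement as sequence classification with CodeBERT.
# The head is untrained here, so the prediction is meaningful only after
# fine-tuning on annotated functions such as those described in the paper.

LABELS = ["Entry", "Exit", "Read", "Write"]  # illustrative COSMIC movement types

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=len(LABELS)
)

code = """def save_user(db, user):
    db.insert("users", user)
    return True
"""

inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[int(logits.argmax(dim=-1))])  # placeholder output until fine-tuned
```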

Citations: 0
A systematic exploration of C-to-rust code translation based on large language models: prompt strategies and automated repair
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-18 · DOI: 10.1007/s10515-025-00570-0
Ruxin Zhang, Shanxin Zhang, Linbo Xie

C is widely used in system programming due to its low-level flexibility. However, as demands for memory safety and code reliability grow, Rust has become a more favorable alternative owing to its modern design principles. Migrating existing C code to Rust has therefore emerged as a key approach for enhancing the security and maintainability of software systems. Nevertheless, automating such migrations remains challenging due to fundamental differences between the two languages in terms of language design philosophy, type systems, and levels of abstraction. Most current code transformation tools focus on mappings of basic data types and syntactic replacements, such as handling pointers or conversion of lock mechanisms. These approaches often fail to deeply model the semantic features and programming paradigms of the target language. To address this limitation, this paper proposes RustFlow, a C-to-Rust code translation framework based on large language models (LLMs), designed to generate idiomatic and semantically accurate Rust code. This framework employs a multi-stage progressive architecture, which decomposes the overall translation task into several sequential stages, namely translation, validation, and repair. During the translation phase, a collaborative prompting strategy is employed to guide the LLM in achieving cross-language semantic alignment, thereby improving the accuracy of the generated code. Subsequently, a validation mechanism is introduced to perform syntactic and semantic checks on the generated output, and a conversational iterative repair strategy is employed to further enhance the quality of the final result. Experimental results show that RustFlow outperforms most of the latest baseline approaches, achieving an average improvement of 50.67% in translation performance compared to the base LLM. This work offers a novel technical approach and practical support for efficient and reliable cross-language code migration.
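
The following is a simplified sketch of a translate, validate, repair loop in the spirit of the staged pipeline described above. The `llm()` stub, the prompt wording, and the use of plain `rustc` compilation as the only validation step are assumptions; RustFlow's collaborative prompting and semantic checks are not reproduced.

```python
import subprocess
import tempfile
from pathlib import Path

# Translate -> validate -> repair loop, heavily simplified relative to the
# paper's pipeline. llm() is a hypothetical stub for whichever chat model is
# used; validation here is compilation only.


def llm(prompt: str) -> str:  # hypothetical: plug in a real chat-completion call
    raise NotImplementedError


def compile_rust(code: str) -> tuple[bool, str]:
    """Return (ok, compiler_output) for a candidate Rust translation."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "main.rs"
        src.write_text(code)
        proc = subprocess.run(
            ["rustc", "--edition", "2021", str(src), "-o", str(Path(tmp) / "main")],
            capture_output=True, text=True,
        )
        return proc.returncode == 0, proc.stderr


def translate_c_to_rust(c_code: str, max_rounds: int = 3) -> str:
    rust = llm(f"Translate this C code to idiomatic, safe Rust:\n\n{c_code}")
    for _ in range(max_rounds):
        ok, errors = compile_rust(rust)
        if ok:
            return rust
        # Conversational repair: feed compiler diagnostics back to the model.
        rust = llm(
            "The following Rust code does not compile.\n"
            f"Code:\n{rust}\n\nCompiler errors:\n{errors}\n\n"
            "Return a corrected version only."
        )
    return rust  # best effort after max_rounds repair attempts
```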

Citations: 0
Towards integrated dashboards for better management of human-centric issues in software development
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-18 · DOI: 10.1007/s10515-025-00565-x
Liam Todd, Kashumi Madampe, Hourieh Khalajzadeh, Mojtaba Shahin, John Grundy

GitHub and Jira projects typically contain many issues and issue comments used to track project tasks and defects. An important class of issues that needs appropriate consideration is "human-centric issues". These issues relate to human characteristics of end users and need to be identified, tracked, and managed differently from traditional technical issues. Current management of these human-centric issues during defect management is limited. We introduce a novel dashboard, the Human-centric Issue Visualiser (HCIV), that categorises and tags these human-centric issues. We built HCIV prototypes for two platforms, GitHub and Jira. These prototypes tag issues and present them in various visual forms to software practitioners. Using the dashboard, human-centric issues can be prioritised and tracked, and machine-learning-generated classifications can be overridden. To reflect these interactions, the associated GitHub and Jira issue tags are updated while the user interacts with our dashboard. User evaluations of our dashboard prototypes show their potential for human-centric issue management. A demo of the GitHub version of the tool can be viewed at https://youtu.be/v49aiRiDIPs, and the Jira version at https://youtu.be/qQM72SErmqs.
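
A minimal sketch, assuming the `requests` library and the public GitHub REST endpoint for adding issue labels, of how an overridden classification could be written back to the tracker as a tag. The token, repository, issue number, and label name are placeholders; HCIV's own implementation is not shown.

```python
import os
import requests

# Write an overridden human-centric classification back to GitHub as an issue
# label via POST /repos/{owner}/{repo}/issues/{issue_number}/labels.
# All identifiers below are placeholders.

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]   # personal access token
REPO = "example-org/example-repo"           # placeholder repository
ISSUE_NUMBER = 42                           # placeholder issue


def tag_issue(label: str) -> None:
    resp = requests.post(
        f"https://api.github.com/repos/{REPO}/issues/{ISSUE_NUMBER}/labels",
        headers={
            "Authorization": f"Bearer {GITHUB_TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        json={"labels": [label]},
        timeout=10,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    tag_issue("human-centric:accessibility")  # illustrative label name
```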

Citations: 0
JDExtractor: an automated approach for efficient extraction of defect-related methods in Java projects
IF 3.1 · CAS Tier 2, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-18 · DOI: 10.1007/s10515-025-00563-z
Tianyang Liu, Jiawei Ye, Weixing Ji

High-quality repositories containing real-world defects are essential for developing defect-related algorithms. Although plenty of defect repositories exist, they often fail to capture the context of inter-procedural defects, which include all methods in the propagation path from the defect-source method to the defect-triggering method. This limitation is particularly critical for the Null Pointer Exception (NPE), a common defect that often propagates across multiple methods in Java systems. To address this problem, we propose a novel and automatic approach, called JDExtractor, to extract defect-related methods from real applications. The main challenge is how to identify all defect-related methods efficiently and accurately. JDExtractor tackles this challenge by constructing a method-level data graph using the principle of Java type compatibility and simplifying the data graph using filtering criteria. Data flow analysis helps construct a coarse-grained method-level data graph, which reflects the potential patterns of inter-procedural data interaction, thereby ensuring analysis efficiency. Afterward, filtering analysis simplifies the data graph based on the propagation properties of inter-procedural defects, thus ensuring analysis accuracy. Evaluation results suggest that both the static slicing tool WALA and the dynamic slicing tool Slicer4J yield several false positives, whereas JDExtractor successfully extracts defect-related methods and defect propagation paths with fewer false positives in a short time. Moreover, JDExtractor has been applied to open source projects on GitHub, ultimately extracting defect-related methods for 67 defects from 319 compiled open source applications.
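
A toy sketch, using networkx, of the coarse-grained method-level data graph idea: connect methods whose return and parameter types are compatible, then walk backwards from the defect-triggering method to collect candidate defect-related methods. The method table, the naive type model, and the absence of filtering criteria are simplifications relative to JDExtractor.

```python
import networkx as nx

# Coarse method-level data graph: add an edge when one method's return type is
# compatible with another method's parameter type, then collect defect-related
# methods by traversing backwards from the defect-triggering method.
# JDExtractor's Java type analysis and filtering are far richer than this toy.

methods = {
    "Repo.load":        {"params": [],          "returns": "User"},
    "UserService.get":  {"params": ["User"],    "returns": "Profile"},
    "View.render":      {"params": ["Profile"], "returns": "void"},
    "Log.write":        {"params": ["String"],  "returns": "void"},
}

graph = nx.DiGraph()
graph.add_nodes_from(methods)
for src, s in methods.items():
    for dst, d in methods.items():
        if src != dst and s["returns"] in d["params"]:  # naive type compatibility
            graph.add_edge(src, dst)

defect_trigger = "View.render"  # e.g., where the NPE is thrown
related = nx.ancestors(graph, defect_trigger) | {defect_trigger}
print(sorted(related))          # candidate methods on the defect propagation path
```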

Citations: 0