
Latest Publications in IEEE Transactions on Software Engineering

Augmenting Smart Contract Decompiler Output through Fine-grained Dependency Analysis and LLM-facilitated Semantic Recovery
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-20 · DOI: 10.1109/tse.2025.3623325
Zeqin Liao, Yuhong Nan, Zixu Gao, Henglong Liang, Sicheng Hao, Peifan Ren, Zibin Zheng
Citations: 0
Evaluating and Improving GPT-Based Expansion of Abbreviations
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-20 · DOI: 10.1109/tse.2025.3623625
Yanjie Jiang, Chenxu Li, Zixiao Zhao, Fu Fan, Lu Zhang, Hui Liu
Citations: 0
How Toxic Can You Get? Search-Based Toxicity Testing for Large Language Models
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-17 · DOI: 10.1109/TSE.2025.3607625
Simone Corbo;Luca Bancale;Valeria De Gennaro;Livia Lestingi;Vincenzo Scotti;Matteo Camilli
Language is a deep-rooted means of perpetration of stereotypes and discrimination. Large Language Models, now a pervasive technology in our everyday lives, can cause extensive harm when prone to generating toxic responses. The standard way to address this issue is to align the LLM, which, however, dampens the issue without constituting a definitive solution. Therefore, testing LLMs even after alignment efforts remains crucial for detecting any residual deviations with respect to ethical standards. We present EvoTox, an automated testing framework for LLMs’ inclination to toxicity, providing a way to quantitatively assess how much LLMs can be pushed towards toxic responses even in the presence of alignment. The framework adopts an iterative evolution strategy that exploits the interplay between two LLMs, the System Under Test (SUT) and the Prompt Generator, which steers SUT responses toward higher toxicity. The toxicity level is assessed by an automated oracle based on an existing toxicity classifier. We conduct a quantitative and qualitative empirical evaluation using five state-of-the-art LLMs as evaluation subjects of increasing complexity (7–671B parameters). Our quantitative evaluation assesses the cost-effectiveness of four alternative versions of EvoTox against existing baseline methods, based on random search, curated datasets of toxic prompts, and adversarial attacks. Our qualitative assessment engages human evaluators to rate the fluency of the generated prompts and the perceived toxicity of the responses collected during the testing sessions. Results indicate that the effectiveness, in terms of detected toxicity level, is significantly higher than the selected baseline methods (effect size up to 1.0 against random search and up to 0.99 against adversarial attacks). Furthermore, EvoTox yields a limited cost overhead (from 22% to 35% on average).
This work includes examples of toxic degeneration by LLMs, which may be considered profane or offensive to some readers. Reader discretion is advised.
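The iterative evolution strategy in the abstract can be pictured as a simple hill-climbing loop. The sketch below is our own illustrative simplification, not the authors' implementation: `sut`, `generator`, and `oracle` are hypothetical stand-ins for the System Under Test, the Prompt Generator, and the toxicity classifier.

```python
import random

def evotox_search(seed_prompt, sut, generator, oracle, iterations=10, seed=0):
    """Hill-climbing sketch of an EvoTox-style loop: the generator mutates the
    current best prompt, the SUT answers it, and the oracle scores the answer;
    the prompt whose response scored highest survives to the next round."""
    rng = random.Random(seed)
    best_prompt = seed_prompt
    best_score = oracle(sut(seed_prompt))
    for _ in range(iterations):
        candidate = generator(best_prompt, rng)
        score = oracle(sut(candidate))
        if score > best_score:  # keep strictly better prompts only
            best_prompt, best_score = candidate, score
    return best_prompt, best_score

# Toy stand-ins: the "SUT" echoes its prompt, the "oracle" scores by length,
# and the "generator" appends a token -- enough to watch the loop climb.
prompt, score = evotox_search(
    "hi", sut=lambda p: p,
    generator=lambda p, rng: p + "!",
    oracle=len, iterations=5)
```

In the real system each of the three callables would wrap an LLM or classifier query; the loop structure itself is the point of the sketch.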
Vol. 51, No. 11, pp. 3056-3071
Citations: 0
Who Is Pulling the Strings: Unveiling Smart Contract State Manipulation Attacks Through State-Aware Dataflow Analysis
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-17 · DOI: 10.1109/TSE.2025.3605145
Shuo Yang;Jiachi Chen;Lei Xiao;Jinyuan Hu;Dan Lin;Jiajing Wu;Tao Zhang;Zibin Zheng
Recently, the increasing complexity of smart contracts and their interactions has led to more sophisticated strategies for executing attacks. Hackers often need to deploy attacker contracts as delegators to automate these attacks on their behalf. Existing identification methods for attacker contracts either rely on simple patterns (e.g., recursive callback control flow) that suffer from high false-positive rates and limited extraction of interaction and call information, or lack fully automated detection capabilities. Consequently, these limitations reduce the effectiveness of current solutions in identifying modern, intricate attacks. To overcome these challenges, we introduce the concept of state manipulation attacks, which abstracts the exploitation of problematic state dependencies arising from contract interactions. During these attacks, hackers first alter the storage state of one contract (the manipulated contract), which determines the profit they can gain. They then call another contract (the victim contract) to exploit its dependency on the altered state and maximize their profits. We present SMAsher, a tool designed to automatically identify state manipulation attacker contracts. SMAsher leverages fine-grained state-aware dataflow analysis to detect exploitation traces and exploited state dependencies among contracts, focusing on recovering the call path and interaction semantics. Our extensive experiments on 1.38 million real-world contracts demonstrate that SMAsher successfully identifies 311 state manipulation attacker contracts with 100% precision, resulting in $6.95 million in losses. Our findings also reveal some notable malicious characteristics of hackers’ accounts through their deployed attacker contracts. Additionally, we have provided 10 PoCs (Proof-of-Concepts) for previously unidentified attacks, all of which have been confirmed and released to the community.
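The write-then-exploit pattern the abstract describes can be illustrated with a toy pass over an execution trace. This is our own simplification; the event schema (`tx`, `storage_of`, `executing`) is invented for illustration and is not SMAsher's representation.

```python
def find_state_manipulation(trace):
    """Toy state-aware pass over a flat event trace: within one transaction,
    flag cases where one contract's storage slot is written and a *different*
    contract later reads it -- the manipulated/victim dependency pattern."""
    writes = {}    # (tx, storage_owner, slot) -> contract that executed the write
    findings = []
    for ev in trace:
        key = (ev["tx"], ev["storage_of"], ev["slot"])
        if ev["op"] == "SSTORE":
            writes[key] = ev["executing"]
        elif ev["op"] == "SLOAD" and key in writes and ev["executing"] != writes[key]:
            findings.append(key)
    return findings

# A two-event trace: the attacker inflates Pool's reserves, then a victim
# contract reads the inflated value within the same transaction.
trace = [
    {"tx": 1, "op": "SSTORE", "storage_of": "Pool", "slot": "reserves", "executing": "Attacker"},
    {"tx": 1, "op": "SLOAD",  "storage_of": "Pool", "slot": "reserves", "executing": "Victim"},
]
```

A read without a prior same-transaction write by another party produces no finding, which mirrors the paper's focus on exploited, not merely existing, state dependencies.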
Vol. 51, No. 10, pp. 2942-2956
Citations: 0
DT4LM: Differential Testing for Reliable Language Model Updates in Classification Tasks
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-16 · DOI: 10.1109/tse.2025.3622251
Xinyue Zuo, Yan Xiao, Xiaochun Cao, Wenya Wang, Jin Song Dong
Citations: 0
Hydra-Reviewer: A holistic multi-agent system for automatic code review comment generation
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-14 · DOI: 10.1109/tse.2025.3621462
Xiaoxue Ren, Chaoqun Dai, Qiao Huang, Ye Wang, Chao Liu, Bo Jiang
Citations: 0
Software Architecture Recovery Augmented With Semantics
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-14 · DOI: 10.1109/TSE.2025.3620670
Wenting Zhao;Wuxia Jin;Yiran Zhang;Ming Fan;Haijun Wang;Li Li;Yang Liu;Ting Liu
The architecture of software systems evolves along with their upgrades and maintenance, inevitably creating a gap between the de facto architecture and the designed one. To perceive and fix the discrepancy, clustering-based architecture recovery methods have been developed to re-engineer the real-time system architecture from the code implementation. However, existing solutions still face several limitations. They underutilize both code-level and architecture-level semantics underlying the source code. Moreover, they overlook implicit structural dependencies that complement explicit ones to reflect code interactions. To address these challenges, we propose SemArc, an architecture recovery method that utilizes large language models to comprehend both implementation-level and architecture-level semantics, supported by well-established canonical architectural patterns as a knowledge base. SemArc also incorporates both implicit and explicit dependencies to complete the system behavior representations. Additionally, SemArc introduces a component-as-anchor guided clustering algorithm to improve the clustering process. We evaluated SemArc on 15 software systems written in C/C++, Java, and Python, using five different metrics. The results demonstrate that SemArc outperforms seven baseline methods by an average of 32 percentage points. We also examined how three factors—code semantics, architectural semantics, and implicit dependencies—as well as different levels of architectural semantic descriptions, influence recovery accuracy. A case study on the Bash project indicates that SemArc has the potential to yield even more precise recovery results than those labeled by humans.
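The component-as-anchor idea can be illustrated with a minimal bag-of-words assignment. This is only our sketch: SemArc derives semantics with large language models, which we replace here with hand-written token lists, and all file and component names are hypothetical.

```python
import math
from collections import Counter

def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def anchor_cluster(files, anchors):
    """Assign each file to the anchor component whose semantic description
    its tokens most resemble; the anchors act as fixed cluster seeds that
    guide the grouping, in the spirit of component-as-anchor clustering."""
    return {name: max(anchors, key=lambda a: cosine(tokens, anchors[a]))
            for name, tokens in files.items()}

# Hypothetical Bash-like example: two source files, two anchor components.
files = {"parse.c": ["parse", "token", "grammar"],
         "jobs.c":  ["exec", "fork", "command"]}
anchors = {"Parsing":   ["parse", "grammar", "ast", "token"],
           "Execution": ["exec", "command", "job"]}
```

Swapping the token lists for LLM-produced semantic summaries, and the cosine for a richer similarity, recovers the shape of the guided clustering step without changing this control flow.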
Vol. 52, No. 1, pp. 338-359
Citations: 0
MirrorFuzz: Leveraging LLM and Shared Bugs for Deep Learning Framework APIs Fuzzing
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-10 · DOI: 10.1109/TSE.2025.3619966
Shiwen Ou;Yuwei Li;Lu Yu;Chengkun Wei;Tingke Wen;Qiangpu Chen;Yu Chen;Haizhi Tang;Zulie Pan
Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can cascade into critical issues in higher-level applications, jeopardizing reliability and security. While numerous techniques have been proposed to detect bugs in DL frameworks, research exploring common API patterns across frameworks and the potential risks they entail remains limited. Notably, many DL frameworks expose similar APIs with overlapping input parameters and functionalities, rendering them vulnerable to shared bugs, where a flaw in one API may extend to analogous APIs in other frameworks. To address this challenge, we propose MirrorFuzz, an automated API fuzzing solution to discover shared bugs in DL frameworks. MirrorFuzz operates in three stages: First, MirrorFuzz collects historical bug data for each API within a DL framework to identify potentially buggy APIs. Second, it matches each buggy API in a specific framework with similar APIs within and across other DL frameworks. Third, it employs large language models (LLMs) to synthesize code for the API under test, leveraging the historical bug data of similar APIs to trigger analogous bugs across APIs. We implement MirrorFuzz and evaluate it on four popular DL frameworks (TensorFlow, PyTorch, OneFlow, and Jittor). Extensive evaluation demonstrates that MirrorFuzz improves code coverage by 39.92% and 98.20% compared to state-of-the-art methods on TensorFlow and PyTorch, respectively. Moreover, MirrorFuzz discovers 315 bugs, 262 of which are newly found, and 80 bugs are fixed, with 52 of these bugs assigned CNVD IDs.
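Stage two of the pipeline above, pairing a buggy API with its counterparts in other frameworks, can be approximated with plain lexical similarity. This is a deliberately crude stand-in for MirrorFuzz's matching; the function name, threshold, and example APIs are our own.

```python
from difflib import SequenceMatcher

def match_similar_apis(buggy_apis, candidate_apis, threshold=0.5):
    """For each known-buggy API, return candidate APIs from other frameworks
    ranked by name similarity; historical bugs of the former can then seed
    fuzzing of the latter (the shared-bug hypothesis)."""
    matches = {}
    for api in buggy_apis:
        scored = sorted(
            ((SequenceMatcher(None, api.lower(), c.lower()).ratio(), c)
             for c in candidate_apis),
            reverse=True)  # highest similarity first
        matches[api] = [c for ratio, c in scored if ratio >= threshold]
    return matches
```

A real matcher would also compare parameter lists and documented functionality, not just names, but the thresholded ranking is the essential shape of the stage.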
Vol. 52, No. 1, pp. 360-375
Citations: 0
Condor: A Code Discriminator Integrating General Semantics with Code Details
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-10 · DOI: 10.1109/tse.2025.3620145
Qingyuan Liang, Zhao Zhang, Chen Liu, Zeyu Sun, Wenjie Zhang, Yizhou Chen, Zixiao Zhao, Qi Luo, Wentao Wang, Yanjie Jiang, Yingfei Xiong, Lu Zhang
Citations: 0
Efficiently Testing Distributed Systems via Abstract State Space Prioritization
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-10-09 · DOI: 10.1109/TSE.2025.3618976
Yu Gao;Dong Wang;Wensheng Dou;Wenhan Feng;Yu Liang;Jun Wei
The last five years have seen a rise in model checking guided testing (MCGT) approaches for systematically testing distributed systems. MCGT approaches generate test cases for distributed systems by traversing their verified abstract state spaces, simultaneously solving the three key problems faced in testing distributed systems, i.e., test input generation, test oracle construction and execution space enumeration. However, existing MCGT approaches struggle with traversing the huge state space of distributed systems, which can contain billions of system states. This makes the process of finding bugs time-consuming and expensive, often taking several weeks. In this paper, we propose Mosso to speed up model checking guided testing for distributed systems. We observe that there exist lots of redundant test scenarios in the abstract state space of distributed systems. Considering the characteristics of these redundant test scenarios, we propose three strategies: action independence, node symmetry and scenario equivalence, to identify and prioritize unique test scenarios when traversing the state space. We have applied Mosso on three real-world distributed systems. By employing the three strategies, our approach has achieved an average speedup of 56X (up to 208X) compared to the state-of-the-art MCGT approach. Additionally, our approach has successfully uncovered 2 previously-unknown bugs.
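The node-symmetry strategy, for example, amounts to collapsing scenarios that differ only by a renaming of nodes. The sketch below is our toy reading of that idea; the scenario encoding as (node, action) pairs is invented for illustration.

```python
def canonical_key(scenario):
    """Relabel node ids in order of first appearance so that two scenarios
    differing only by a permutation of node names collapse to the same key
    (a node-symmetry reduction)."""
    mapping = {}
    key = []
    for node, action in scenario:
        if node not in mapping:
            mapping[node] = f"n{len(mapping)}"  # n0, n1, ... in appearance order
        key.append((mapping[node], action))
    return tuple(key)

def prioritize(scenarios):
    """Keep only the first scenario seen for each canonical key, so the
    traversal spends its budget on symmetry-distinct test scenarios."""
    seen, unique = set(), []
    for s in scenarios:
        k = canonical_key(s)
        if k not in seen:
            seen.add(k)
            unique.append(s)
    return unique
```

The other two strategies would add further canonicalization steps, e.g. sorting commuting actions (action independence) before computing the key.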
Vol. 52, No. 2, pp. 395-410
Citations: 0
Book学术 provides a free academic resource search service for retrieving Chinese and English literature, serving scholars at home and abroad with a focus on a convenient, high-quality experience.
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1