
Latest Publications: IEEE Transactions on Software Engineering

Weighted Community Division for Automated Software Architecture Refactoring
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-29 · DOI: 10.1109/tse.2025.3648996
Sirong Zhao, Jialing Yang, Jiao Xie, Kaiwei Fan, Jianmei Lei, Guoqi Xie
Citations: 0
Understanding Docker Refactorings: Expanded Taxonomy, Operational Trade-offs, and Role-Aware Recommendations
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-29 · DOI: 10.1109/tse.2025.3649731
Emna Ksontini, Thiago Ferreira, Rania Khalsi, Wael Kessentini
Citations: 0
PCART: Automated Repair of Python API Parameter Compatibility Issues
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-22 · DOI: 10.1109/tse.2025.3646150
Shuai Zhang, Guanping Xiao, Jun Wang, Huashan Lei, Gangqiang He, Yepang Liu, Zheng Zheng
Citations: 0
RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-17 · DOI: 10.1109/TSE.2025.3645056 · Vol. 52, No. 2, pp. 675-690
Yanli Wang;Yanlin Wang;Suiquan Wang;Daya Guo;Jiachi Chen;John Grundy;Xilin Liu;Yuchi Ma;Mingzhi Mao;Hongyu Zhang;Zibin Zheng
Repository-level code translation refers to translating an entire code repository from one programming language to another while preserving the functionality of the source repository. Many benchmarks have been proposed to evaluate the performance of such code translators. However, previous benchmarks mostly provide fine-grained samples, focusing on snippet-, function-, or file-level translation. Such benchmarks do not accurately reflect real-world demands, where entire repositories often need to be translated, involving longer code and more complex functionality. To address this gap, we propose a new benchmark, named RepoTransBench, a real-world multilingual repository-level code translation benchmark featuring 1,897 real-world repository samples across 13 language pairs with automatically executable test suites. We also introduce RepoTransAgent, a general agent framework for repository-level code translation. We evaluate both the benchmark’s difficulty and the agent’s effectiveness using several methods and backbone LLMs, revealing that repository-level translation remains challenging: the best-performing method achieves only a 32.8% success rate. Furthermore, our analysis reveals that translation difficulty varies significantly by language pair direction, with dynamic-to-static language translation being much more challenging than the reverse direction (below 10% vs. 45-63% for static-to-dynamic). Finally, we conduct a detailed error analysis and highlight current LLMs’ deficiencies in repository-level code translation, which could inform further improvements. We provide the code and data at https://github.com/DeepSoftwareAnalytics/RepoTransBench.
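The all-or-nothing success criterion implied by “automatically executable test suites” can be sketched as follows. This is a hypothetical illustration in the spirit of the benchmark, not its actual API: `RepoResult`, `success_rate`, and the sample data are all made up.

```python
# Hypothetical sketch of repository-level translation scoring: a translation
# "succeeds" only if the translated repository passes its entire test suite.
from dataclasses import dataclass

@dataclass
class RepoResult:
    repo: str
    language_pair: str   # e.g. "python->java" (dynamic -> static)
    tests_passed: int
    tests_total: int

    @property
    def success(self) -> bool:
        # All-or-nothing: every end-to-end test must pass.
        return self.tests_total > 0 and self.tests_passed == self.tests_total

def success_rate(results: list[RepoResult]) -> float:
    return sum(r.success for r in results) / len(results)

results = [
    RepoResult("repoA", "python->java", 12, 12),
    RepoResult("repoB", "python->java", 11, 12),  # one failing test -> failure
    RepoResult("repoC", "java->python", 7, 7),
    RepoResult("repoD", "java->python", 7, 7),
]
print(f"overall success rate: {success_rate(results):.1%}")  # 75.0%
```

The strict per-repository criterion is why repository-level numbers are so much lower than function-level pass rates: a single failing end-to-end test sinks the whole repository.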
Citations: 0
CARE: Context Aware Root Cause Identification Using Distributed Traces and Profiling Metrics
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-17 · DOI: 10.1109/tse.2025.3645143
Mahsa Panahandeh, Naser Ezzati Jivan, Abdelwahab Hamou-Lhadj, James Miller
Citations: 0
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-16 · DOI: 10.1109/TSE.2025.3644183 · Vol. 52, No. 2, pp. 651-674
Roham Koohestani;Philippe de Bekker;Begüm Koç;Maliheh Izadi
Benchmarks are essential for unified evaluation and reproducibility. The rapid rise of Artificial Intelligence for Software Engineering (AI4SE) has produced numerous benchmarks for tasks such as code generation and bug repair. However, this proliferation has led to major challenges: (1) fragmented knowledge across tasks, (2) difficulty in selecting contextually relevant benchmarks, (3) lack of standardization in benchmark creation, and (4) flaws that limit utility. Addressing these requires a dual approach: systematically mapping existing benchmarks for informed selection and defining unified guidelines for robust, adaptable benchmark development. We conduct a review of 247 studies, identifying 273 AI4SE benchmarks since 2014. We categorize them, analyze limitations, and expose gaps in current practices. Building on these insights, we introduce BenchScout, an extensible semantic search tool for locating suitable benchmarks. BenchScout employs automated clustering with contextual embeddings of benchmark-related studies, followed by dimensionality reduction. In a user study with 22 participants, BenchScout achieved usability, effectiveness, and intuitiveness scores of 4.5, 4.0, and 4.1 out of 5. To improve benchmarking standards, we propose BenchFrame, a unified approach for elevating benchmark quality. Applying BenchFrame to HumanEval yielded HumanEvalNext, which features corrected errors, improved language conversion, higher test coverage, and greater difficulty. Evaluating 10 state-of-the-art code models on HumanEval, HumanEvalPlus, and HumanEvalNext revealed average pass-at-1 drops of 31.22% and 19.94%, respectively, underscoring the need for continuous benchmark refinement. We further examine BenchFrame’s scalability through an agentic pipeline and confirm its generalizability on the MBPP dataset. Lastly, we publicly release the material of our review, user study, and the enhanced benchmark.

https://github.com/AISE-TUDelft/AI4SE-benchmarks

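The embedding-and-rank idea behind a semantic benchmark search can be sketched with a toy cosine-similarity ranking. This is not BenchScout’s implementation: the vectors below are hand-made stand-ins for real contextual embeddings, and the benchmark names are only labels.

```python
# Toy illustration of embedding-based benchmark search: embed benchmark
# descriptions, then rank them by cosine similarity to a query embedding.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

benchmarks = {
    "HumanEval (code generation)": [0.9, 0.1, 0.0],
    "Defects4J (bug repair)":      [0.1, 0.9, 0.1],
    "CodeSearchNet (code search)": [0.2, 0.1, 0.9],
}

# Pretend this is the embedding of the query "function-level code generation".
query = [0.8, 0.2, 0.1]
ranked = sorted(benchmarks, key=lambda name: cosine(query, benchmarks[name]),
                reverse=True)
print(ranked[0])  # HumanEval (code generation)
```

A production tool would replace the hand-made vectors with embeddings from a language model and add clustering and dimensionality reduction on top, but the ranking step reduces to exactly this similarity computation.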
Citations: 0
AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-12 · DOI: 10.1109/TSE.2025.3642621 · Vol. 52, No. 2, pp. 631-650
Yueheng Zhu;Chao Liu;Xuan He;Xiaoxue Ren;Zhongxin Liu;Ruwei Pan;Hongyu Zhang
Recently, researchers have proposed many multi-agent frameworks for function-level code generation, which aim to improve software development productivity by automatically generating function-level source code from task descriptions. A typical multi-agent framework consists of Large Language Model (LLM)-based agents responsible for task planning, code generation, testing, debugging, etc. Studies have shown that existing multi-agent code generation frameworks perform well on ChatGPT. However, their generalizability across other foundation LLMs remains systematically unexplored. In this paper, we report an empirical study of the generalizability of four state-of-the-art multi-agent code generation frameworks across 12 open-source LLMs with varying code generation and instruction-following capabilities. Our study reveals the unstable generalizability of existing frameworks across diverse foundation LLMs. Based on these findings, we propose AdaCoder, a novel adaptive-planning multi-agent framework for function-level code generation. AdaCoder has two phases. Phase 1 is an initial code generation step without planning, which uses an LLM-based coding agent and a script-based testing agent to unleash the LLM’s native capability, identify cases beyond that capability, and determine the errors hindering execution. Phase 2 adds a rule-based debugging agent and an LLM-based planning agent for iterative code generation with planning. Our evaluation shows that AdaCoder achieves higher generalizability across diverse LLMs. Compared to the best baseline, MapCoder, AdaCoder is on average 27.69% higher in Pass@1, 16 times faster in inference, and consumes 12 times fewer tokens.
Citations: 0
State of the Journal
IF 7.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-11 · DOI: 10.1109/tse.2025.3639694
Sebastian Uchitel
Citations: 0
C2SaferRust: Transforming C Projects Into Safer Rust With NeuroSymbolic Techniques
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-09 · DOI: 10.1109/TSE.2025.3641486 · Vol. 52, No. 2, pp. 618-630
Vikram Nitin;Rahul Krishna;Luiz Lemos do Valle;Baishakhi Ray
In recent years, there has been a lot of interest in converting C code to Rust, to benefit from the memory and thread safety guarantees of Rust. C2Rust is a rule-based system that can automatically convert C code to functionally identical Rust, but the Rust code that it produces is non-idiomatic, i.e., makes extensive use of unsafe Rust, a subset of the language that doesn’t have memory or thread safety guarantees. At the other end of the spectrum are LLMs, which produce idiomatic Rust code, but these have the potential to make mistakes and are constrained in the length of code they can process. In this paper, we present C2SaferRust, a novel approach to translate C to Rust that combines the strengths of C2Rust and LLMs. We first use C2Rust to convert C code to non-idiomatic, unsafe Rust. We then decompose the unsafe Rust code into slices that can be individually translated to safer Rust by an LLM. After processing each slice, we run end-to-end test cases to verify that the code still functions as expected. We also contribute a benchmark of 7 real-world programs, translated from C to unsafe Rust using C2Rust. Each of these programs also comes with end-to-end test cases. On this benchmark, we are able to reduce the number of raw pointers by up to 38%, and reduce the amount of unsafe code by up to 28%, indicating an increase in safety. The resulting programs still pass all test cases. C2SaferRust also shows convincing gains in performance against two previous techniques for making Rust code safer.
Citations: 0
Reaching Software Quality for Bioinformatics Applications: How Far Are We?
IF 5.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-08 · DOI: 10.1109/TSE.2025.3641225 · Vol. 52, No. 2, pp. 595-617
Xiaoyan Zhu;Tianxiang Xu;Xin Lai;Xin Lian;Hangyu Cheng;Jiayin Wang
With the rapid advancements in medicine, biology, and information technology, their deep integration has given rise to the emerging field of bioinformatics. In this process, high-throughput technologies such as genomics, transcriptomics, and proteomics have generated massive volumes of biological data. The biological significance of these data relies heavily on bioinformatics software for analysis and processing. Therefore, it is crucial for both scientific research and clinical applications to ensure the quality of bioinformatics software and to avoid errors or hidden defects. However, to date, no dedicated study has systematically analyzed the quality of bioinformatics software. We conduct a comprehensive empirical study that aggregates, synthesizes, and analyzes findings from 167 bioinformatics software projects. Following the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) protocol, we extract and evaluate quality-related data to answer our research questions (RQs). Our analysis reveals several key findings. The quality of bioinformatics software requires significant improvement, with an average defect density approximately 11.8× higher than that of general-purpose software. Additionally, unlike traditional software domains, a considerable proportion of defects in bioinformatics software are related to annotations. These issues can lead developers to overlook potential security vulnerabilities or make incorrect fixes, thereby increasing the cost and complexity of subsequent code maintenance. Based on these findings, we further discuss the challenges faced by bioinformatics software and propose potential solutions.
Citations: 0