
Journal of Systems and Software: Latest Publications

Exploring quality aspects of customer self-service in IT service provision: A case study
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-12 | DOI: 10.1016/j.jss.2025.112725
Marko Jäntti, Henri Lindström
Customer self-service (CSS) plays a critical role in IT service provider organizations’ service business. Self-service can substantially reduce the volume of IT support tickets, such as service requests and incidents, and provide customers and service users with 24/7 access to IT services. Self-service channels can also be used to offer solutions to service-related problems and answers to common issues. Modern customer self-service is implemented in practice by aggregating various self-service technologies and service management practices. As technologies evolve, so does the concept of quality in the context of customer self-service. The purpose of this research is to explore quality aspects related to modern IT customer support, especially the customer self-service portal (SSP). The research problem of the study is: how can the quality of customer self-service be improved in the context of IT service provision? This paper presents a unique case study addressing the quality of modern self-service technologies. A cross-case synthesis of four IT self-service portal deployment cases was used to derive a novel multi-dimensional self-service quality model with three dimensions: technology, management, and organization.
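The abstract names three quality dimensions (technology, management, organization) without detailing their contents. A minimal sketch of how such a multi-dimensional quality model could be recorded and aggregated is shown below; all attribute names and the 0-5 scoring scale are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical three-dimension self-service quality checklist; the attribute
# names and the 0-5 scale are illustrative, not the paper's actual model.
@dataclass
class SelfServiceQualityModel:
    technology: dict = field(default_factory=dict)   # e.g. portal usability, availability
    management: dict = field(default_factory=dict)   # e.g. knowledge-base upkeep, SLAs
    organization: dict = field(default_factory=dict) # e.g. staff training, governance

    def score(self) -> float:
        """Average all recorded attribute scores across the three dimensions."""
        values = [v for d in (self.technology, self.management, self.organization)
                  for v in d.values()]
        return sum(values) / len(values) if values else 0.0

m = SelfServiceQualityModel(
    technology={"portal_usability": 4, "availability_24_7": 5},
    management={"kb_article_freshness": 3},
    organization={"staff_training": 4},
)
```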
Citations: 0
An architecture framework for architecting IoT applications: From design to deployment
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-10 | DOI: 10.1016/j.jss.2025.112728
Moamin Abughazala, Mohammad Sharaf, Mai Abusair, Henry Muccini
Context - The Internet of Things (IoT) refers to a distributed network of smart, connected devices that collaboratively sense, process, and act upon real-world environments. Designing such systems requires managing complex architectural concerns spanning software logic, hardware configuration, and spatial deployment, as well as validating non-functional properties such as energy consumption and communication efficiency. Objective - To provide a unified, architecture-centric framework that supports the description, simulation, and automated code generation of IoT applications across software, hardware, and physical-space dimensions. Method - We use Model-Driven Engineering (MDE) approaches to develop CAPS, a framework that uniquely integrates multi-view architectural modeling, energy- and traffic-aware simulation via CupCarbon, and seamless generation of deployable Arduino code from high-level design models. Result - CAPS enables a traceable and cohesive development process from architectural design to physical deployment. Case studies from diverse domains demonstrate its ability to improve modeling expressiveness, maintain transformation fidelity, and reduce development time through automation. Conclusion - CAPS unifies architectural modeling, simulation, and code generation into a novel, end-to-end toolchain, addressing fragmentation in the IoT development lifecycle and enhancing early validation and traceability.
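The Method paragraph describes generating deployable Arduino code from high-level design models. A toy sketch of that model-to-code idea is given below: a tiny component model is rendered into Arduino-flavoured C source text. The model schema (`name`, `pins`, `period_ms`) and the template are invented for illustration and are not CAPS's actual metamodel or transformation.

```python
# Minimal model-to-code sketch in the spirit of CAPS: a high-level component
# model (a plain dict here) is rendered into Arduino-flavoured C source.
# The model schema and template are illustrative assumptions.

def generate_arduino_sketch(model: dict) -> str:
    pin_decls = "\n".join(
        f"const int {name} = {pin};" for name, pin in model["pins"].items()
    )
    reads = "\n  ".join(
        f"int {name}_value = analogRead({name});" for name in model["pins"]
    )
    return (
        f"// generated from model '{model['name']}'\n"
        f"{pin_decls}\n\n"
        "void setup() {\n  Serial.begin(9600);\n}\n\n"
        "void loop() {\n"
        f"  {reads}\n"
        f"  delay({model['period_ms']});\n"
        "}\n"
    )

model = {"name": "TempSensorNode", "pins": {"tempPin": 0}, "period_ms": 1000}
code = generate_arduino_sketch(model)
```

A real MDE pipeline would validate the model against a metamodel and use a template engine rather than string concatenation, but the traceable model-to-deployment flow is the same.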
Citations: 0
Leveraging syntactic dual-graph representations for security patch identification via structural latent alignment
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-10 | DOI: 10.1016/j.jss.2025.112742
Jiajun Tong, Zhixiao Wang, Xiaobin Rui
Security patch identification aims to automatically detect security-relevant commits among a vast number of code diffs. Previous works significantly improve the performance of automatic security patch identification by introducing pre-trained models. However, most existing approaches fail to capture the structural evolution across code diffs, which is essential for understanding how code behavior changes during patching. Furthermore, the changes in code diffs are usually subtle, resulting in sparse representations of structural changes and making it difficult to distinguish the effects of different structural changes. Finally, none of the existing methods can fully capture the syntactic and semantic information in code diffs and commit messages. In this work, we propose DualGraphPatcher, a framework designed to learn well-structured and discriminative features in the latent space by jointly modeling semantic information and structural code evolution. To capture fine-grained code structural changes, instead of relying on a single snapshot of a function, we construct two independent Abstract Syntax Tree (AST) structures for the pre-change and post-change versions from the original code diffs to model the entire graph topology of each version, enabling explicit modeling of structural evolution across code diffs. To alleviate the sparse representation of structural changes, we propose a latent structural alignment module, which performs soft clustering over the representations of pre-change and post-change ASTs and minimizes their distributional divergence in the shared latent space. To jointly learn syntactic and semantic information from code diffs and commit messages, we design a tri-encoder architecture, where CodeBERT and BERT extract semantic embeddings from code diffs and commit messages, and a GCN encodes the syntactic structures of pre-change and post-change ASTs. Experiments on three real-world datasets demonstrate that DualGraphPatcher consistently outperforms state-of-the-art baselines in security patch identification, validating the effectiveness of dual-graph modeling, latent structural alignment, and the tri-encoder design. The code and data are shared at https://github.com/AppleMax1992/DualGraphPatcher.
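The latent structural alignment module described above soft-clusters pre-change and post-change AST representations and minimizes their distributional divergence. The following is a dependency-free numerical sketch of that idea: node embeddings are softly assigned to cluster centers via a softmax over negative squared distances, and the divergence between the two averaged assignment distributions serves as the alignment loss. The 2-D embeddings, distance kernel, and KL choice are illustrative assumptions, not the paper's exact formulation.

```python
import math

# Sketch of latent structural alignment: soft-cluster node embeddings of the
# pre-change and post-change ASTs, then measure the divergence between the
# two averaged cluster-assignment distributions. All shapes and the distance
# kernel are assumptions for illustration.

def soft_assign(embeddings, centers):
    """Softmax over negative squared distances, averaged over all nodes."""
    k = len(centers)
    totals = [0.0] * k
    for e in embeddings:
        logits = [-sum((a - b) ** 2 for a, b in zip(e, c)) for c in centers]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        s = sum(exps)
        for i in range(k):
            totals[i] += exps[i] / s
    return [t / len(embeddings) for t in totals]

def kl_divergence(p, q, eps=1e-9):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

centers = [(0.0, 0.0), (1.0, 1.0)]
pre = [(0.1, 0.0), (0.9, 1.0)]    # toy pre-change node embeddings
post = [(0.0, 0.2), (1.0, 0.8)]   # toy post-change node embeddings
alignment_loss = kl_divergence(soft_assign(pre, centers), soft_assign(post, centers))
```

In training, this loss would be minimized jointly with the classification objective so that structurally similar diffs land close together in the shared latent space.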
Citations: 0
HiFlaky: Hierarchy-aware flakiness classification
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-09 | DOI: 10.1016/j.jss.2025.112741
Zheyuan Li, Zhenyu Wu, Yan Lei, Huan Xie, Maojin Li, Jian Hu
Flaky tests present a major challenge in software testing, as they intermittently pass or fail without changes to the source code, leading to developer frustration and wasted resources. Current methods for detecting and classifying flaky tests often overlook the hierarchical dependencies among the root causes of flakiness in test code. Additionally, existing tools fail to handle complex cases involving multiple flaky causes and lack the capability for fine-grained classification.

To this end, we propose HiFlaky, a hierarchy-aware multi-label classification method for flaky tests that identifies multiple co-occurring root causes along different hierarchical paths, such as diagnosing a single test with both “Flaky/NOD/Network” and “Flaky/NOD/Concurrency/Async Wait”. HiFlaky leverages static semantic features from test code together with hierarchy features, without depending on dynamic execution features. To overcome the shortage of multi-root annotated data, we build a new dataset by expanding and labeling existing single-root datasets (IDoFT and FlakeFlagger) and 335 flaky tests from 145 Java projects on GitHub. Empirical evaluations demonstrate the effectiveness of HiFlaky in addressing both single- and multi-root-cause scenarios. In single-root-cause scenarios, HiFlaky exhibits higher prediction accuracy than state-of-the-art methods, achieving a 30% increase in precision and an F1 score of 79%. Furthermore, the fine-grained classification offered by HiFlaky provides useful insights that can facilitate root cause analysis, reduce debugging effort, and contribute to improved software reliability. For complex multi-root-cause scenarios, HiFlaky attains a Micro-F1 score of 0.812 and a Macro-F1 score of 0.414 across 29 categories.
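Hierarchy-aware multi-label classification typically treats a leaf label such as "Flaky/NOD/Concurrency/Async Wait" as implying all of its ancestors, so training targets mark every prefix of the path. The sketch below shows that expansion step; the label strings follow the examples in the abstract, but the expansion scheme itself is a common convention assumed here, not necessarily HiFlaky's exact encoding.

```python
# Expand leaf labels into all implied ancestor paths, as is common in
# hierarchy-aware multi-label classification. Label strings follow the
# abstract's examples; the expansion scheme is an illustrative assumption.

def expand_hierarchy(leaf_labels):
    """Return the set of all path prefixes implied by the given leaf labels."""
    expanded = set()
    for leaf in leaf_labels:
        parts = leaf.split("/")
        for i in range(1, len(parts) + 1):
            expanded.add("/".join(parts[:i]))
    return expanded

targets = expand_hierarchy(["Flaky/NOD/Network", "Flaky/NOD/Concurrency/Async Wait"])
```

A classifier trained against such targets is penalized for predicting a leaf without its ancestors, which encodes the hierarchical dependencies the abstract says flat methods overlook.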
Citations: 0
DEzzer: Efficient Fuzzing Mutation Scheduling Based on Differential Evolution
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-09 | DOI: 10.1016/j.jss.2025.112740
Jinfu Chen, Wenjun Feng, Saihua Cai, Xingquan Mao, Zhixiang Zhang, Jingyi Chen, Yisong Liu
In recent years, mutation-based fuzzing approaches have gained widespread attention in the field of software security research. The effectiveness of such approaches relies on mutation scheduling, where classical strategies apply random mutation operators to test cases to produce a varied range of mutated test cases. To boost these strategies, rules and methods have been proposed to guide the selection of mutation operators, thereby exploring a wider range of the input space. However, test cases generated by these approaches tend to be biased toward specific paths or code regions, making it challenging to effectively explore other areas and identify potential vulnerabilities. In this paper, we propose a novel mutation scheduling approach called DEzzer. Using a customized differential evolution strategy and an effective balance of multiple feedback signals, it optimizes the probability distribution over mutation operators. We evaluated DEzzer on the GNU Binutils suite, five independent real-world programs, and the LAVA-M dataset. The results show that DEzzer outperformed advanced mutation schedulers and the AFL baseline in terms of coverage on FuzzBench, successfully identifying more unique crashes. Furthermore, we analyze these crashes, pinpointing the specific locations of the vulnerabilities.
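The core idea above is evolving a probability distribution over mutation operators with differential evolution. A toy sketch follows: candidate probability vectors are perturbed as a + F·(b - c), clipped, renormalised, and kept only if a fitness score improves. The synthetic fitness function, population size, and constants are placeholders standing in for DEzzer's coverage feedback, not its actual algorithm.

```python
import random

# Toy differential-evolution step over mutation-operator probability vectors.
# The fitness function below is a synthetic stand-in for coverage feedback.

def normalise(v):
    """Clip to positive values and renormalise into a probability vector."""
    v = [max(x, 1e-6) for x in v]
    s = sum(v)
    return [x / s for x in v]

def de_step(population, fitness, f=0.5):
    """One DE generation: trial = a + f*(b - c), with greedy selection."""
    new_pop = []
    for i, target in enumerate(population):
        others = [p for j, p in enumerate(population) if j != i]
        a, b, c = random.sample(others, 3)
        trial = normalise([ai + f * (bi - ci) for ai, bi, ci in zip(a, b, c)])
        new_pop.append(trial if fitness(trial) > fitness(target) else target)
    return new_pop

random.seed(0)
fitness = lambda p: p[2]  # pretend operator 2 yields the most new coverage
pop = [normalise([random.random() for _ in range(4)]) for _ in range(6)]
initial_best = max(fitness(p) for p in pop)
for _ in range(30):
    pop = de_step(pop, fitness)
final_best = max(fitness(p) for p in pop)
```

Because selection is greedy, the best fitness never decreases across generations; in a real fuzzer the fitness would combine multiple feedback signals (coverage, crash counts) rather than a single component.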
Citations: 0
Peeking inside the black box: Training data exposure in code language models
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-09 | DOI: 10.1016/j.jss.2025.112729
Angelica Spina, Marco Russodivito, Simone Scalabrino, Rocco Oliveto
Large Language Models (LLMs) have proven effective at tackling coding tasks, leading to their growing popularity in commercial solutions such as GitHub Copilot and ChatGPT. These models, however, may be trained on proprietary code, raising concerns about potential leaks of intellectual property. A recent study indicates that LLMs can memorize parts of their source-code training data, rendering them vulnerable to extraction attacks. However, it used white-box attacks, which assume that adversaries have partial knowledge of the training set.

In this paper, we present a pioneering effort to conduct a black-box reconstruction attack on an LLM – CodeT5+ – trained to tackle a specific coding task, code summarization. We assume the adversary has no knowledge of the training set. We train an inverse model, i.e., a model that, given a comment, aims to reconstruct the corresponding source code from the training set. Then, we try to understand to what extent such a model can reconstruct the code in the training set. Our results show that the attack through the inverse model does not allow an adversary to fully reconstruct training code instances, except in a minority of cases. On the other hand, an in-depth manual analysis of the reconstructed code reveals that some important information (such as the APIs adopted) can be extracted in several cases, showing the potential vulnerability of such models.
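Measuring how often an inverse model "fully reconstructs" a training instance requires a similarity metric and a threshold. The sketch below quantifies that with `difflib` sequence similarity over toy snippet pairs; the 0.95 threshold and the snippets are illustrative assumptions, and the paper's actual metric may differ.

```python
import difflib

# Illustrative measurement of reconstruction success: compare each snippet
# recovered by the inverse model against the true training snippet and count
# near-exact matches. Threshold and toy data are assumptions.

def reconstruction_rate(pairs, threshold=0.95):
    """Fraction of (original, reconstructed) pairs above a similarity threshold."""
    hits = sum(
        difflib.SequenceMatcher(None, orig, rec).ratio() >= threshold
        for orig, rec in pairs
    )
    return hits / len(pairs)

pairs = [
    # verbatim memorization: an exact hit
    ("def add(a, b):\n    return a + b", "def add(a, b):\n    return a + b"),
    # semantically similar but renamed: not a full reconstruction
    ("def mul(a, b):\n    return a * b", "def multiply(x, y):\n    return x * y"),
]
rate = reconstruction_rate(pairs)
```

The second pair also illustrates the abstract's qualitative finding: even when full reconstruction fails, structural information (here, the arithmetic operation used) still leaks through.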
Citations: 0
FlexInstru: A flexible instrumentation framework for tracing long-running native workloads
IF 4.1 | CAS Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-12-08 | DOI: 10.1016/j.jss.2025.112739
Wenlong Mu, Ning Li, Zimo Ji, Jianmei Guo, Bo Huang
Understanding program runtime characteristics is crucial for tasks such as optimization and workload characterization. For long-running server-side workloads that execute as native binaries, effective profiling is essential to trace their complex runtime behaviors, enabling further optimizations that improve the reliability and efficiency of the delivered services. Widely adopted techniques for profiling these workloads include binary instrumentation and hardware-based profiling. Binary instrumentation is typically accurate but incurs high overhead and lacks flexibility for tracing long-running native workloads. Hardware-based profiling incurs low overhead but requires hardware support. To overcome these limitations, we present FlexInstru, a hardware-independent dynamic instrumentation framework based on a process attachment/detachment mechanism. FlexInstru can flexibly instrument a native application at any time and for any duration while the application is running, and achieves a good balance between instrumentation accuracy and overhead, which makes it particularly effective for tracing long-running native workloads.

FlexInstru provides a process attachment/detachment mechanism on Linux, allowing an instrumentation engine to be attached to a long-running native workload and detached at any time. To mitigate overhead, FlexInstru also enables flexible control of instrumentation through multiple attachments/detachments, allowing the workload to alternate between instrumented and native execution. Moreover, during instrumented execution, FlexInstru supports a sampling mechanism that collects data only during the sampling period, further reducing overhead. We evaluate FlexInstru on AArch64 and X86-64 using real-world workloads. For MySQL’s branch recording tasks, FlexInstru substantially reduces instrumentation overhead, with reductions of 415.60× on AArch64 and 1223.02× on X86-64 compared to traditional dynamic instrumentation, while maintaining sufficient accuracy.
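The attach/detach-plus-sampling design reduces overhead for a simple arithmetic reason: if full instrumentation slows execution by a factor S but the tool is attached for only a duty-cycle fraction d of wall time, the blended slowdown is d·S + (1 - d). A back-of-the-envelope sketch follows; the numbers are illustrative, not FlexInstru's measured figures.

```python
# Back-of-the-envelope model of why attach/detach plus sampling cuts overhead.
# The slowdown factor and duty cycle below are illustrative assumptions.

def effective_slowdown(full_slowdown: float, duty_cycle: float) -> float:
    """Blended slowdown when a fraction `duty_cycle` of execution is instrumented."""
    return duty_cycle * full_slowdown + (1.0 - duty_cycle)

# e.g. a hypothetical 1000x instrumentation slowdown applied 0.1% of the time
blended = effective_slowdown(1000.0, 0.001)
```

Under these assumed numbers the blended slowdown is roughly 2x instead of 1000x, which is the kind of reduction that makes instrumenting long-running services practical, at the cost of observing only a sample of the execution.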
{"title":"FlexInstru: A flexible instrumentation framework for tracing long-running native workloads","authors":"Wenlong Mu,&nbsp;Ning Li,&nbsp;Zimo Ji,&nbsp;Jianmei Guo,&nbsp;Bo Huang","doi":"10.1016/j.jss.2025.112739","DOIUrl":"10.1016/j.jss.2025.112739","url":null,"abstract":"<div><div>Understanding program runtime characteristics is crucial for tasks such as optimization and workload characterization. For long-running server-side workloads that execute as native binaries, effective profiling is essential to trace their complex runtime behaviors, enabling further optimizations to improve the reliability and efficiency of the delivered services. Widely adopted techniques for profiling these workloads include binary instrumentation and hardware-based profiling. Binary instrumentation is typically accurate but incurs high overhead and lacks flexibility for tracing long-running native workloads. Hardware-based profiling brings low overhead while requiring hardware support. To overcome these limitations, we present FlexInstru, a hardware-independent dynamic instrumentation framework based on the process attachment/detachment mechanism. FlexInstru can flexibly instrument a native application at any time and for any duration when the application is running, and achieve a good balance between instrumentation accuracy and overhead, which makes it particularly effective in tracing long-running native workloads.</div><div>FlexInstru provides a process attachment/detachment mechanism on Linux, allowing attaching an instrumentation engine to a long-running native workload and detaching it at any time. To mitigate overhead, FlexInstru also enables flexible control of instrumentation through multiple attachments/detachments, allowing the workload to alternate between instrumented execution and native execution. Moreover, during instrumented execution, FlexInstru supports a sampling mechanism to collect data only during the sampling period, further reducing the overhead. 
We evaluate FlexInstru on AArch64 and X86-64 using real-world workloads. For MySQL’s branch recording tasks, FlexInstru substantially reduces instrumentation overhead, with reductions of 415.60 ×  on AArch64 and 1223.02 ×  on X86-64 compared to traditional dynamic instrumentation, while maintaining sufficient accuracy.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"234 ","pages":"Article 112739"},"PeriodicalIF":4.1,"publicationDate":"2025-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145738082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
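FlexInstru itself instruments native binaries through a Linux process attachment/detachment mechanism, which is not reproduced here. As a loose, language-level analogy to alternating between instrumented and native execution, the hypothetical Python sketch below toggles a profiling hook around phases of a workload and collects data only while the hook is "attached"; all names and the workload are invented:

```python
import sys

call_counts = {}

def profiler(frame, event, arg):
    # Count function entries only while the hook is installed.
    if event == "call":
        name = frame.f_code.co_name
        call_counts[name] = call_counts.get(name, 0) + 1

def step(i):
    return i * i

def workload(n):
    total = 0
    for i in range(n):
        total += step(i)
    return total

# Phase 1: "native" execution -- no hook installed, no profiling overhead.
workload(100)

# Phase 2: "attached" -- install the hook and collect data.
sys.setprofile(profiler)
workload(100)
sys.setprofile(None)   # "detach": back to native execution

# Phase 3: native again -- the collected counts stay frozen.
workload(100)

print(call_counts.get("step", 0))  # only phase-2 calls were observed
```

The counts reflect only the middle phase, mirroring the idea that instrumentation cost is paid solely while the engine is attached.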
EISM: an interactive and collaborative approach for software modularization
IF 4.1 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-12-07 DOI: 10.1016/j.jss.2025.112726
Chenxing Zhong , Chao Li , He Zhang
To address the increasing complexity of modern software systems, modularization techniques are used to restructure a system into meaningful modules, helping relieve the burden of understanding the system (and thus build trust between different modules). However, many current modularization approaches suffer from two limitations due to restricted information. First, they often derive modules solely from structural and semantic dependencies in the source code, without sufficient attention to evolutionary information, so the resulting modules may be difficult to evolve independently. Second, although some previous researchers have explored integrating architect knowledge into the modularization process to address the issue of limited information, they often overlooked the possibility that this integration might compromise the quality of the final solutions.
To bridge these gaps, we propose EISM (Evolutionary dependencies-based Interactive Software Modularization), an approach to modularizing a tangled software system that takes advantage not only of algorithm efficiency but also of developers’ knowledge. Our approach enables effective collaboration between the modularization algorithm and developers: the former is responsible for (re-)optimizing solution quality via structural, semantic, and evolutionary dependencies, while the latter is responsible for adjusting the solutions for reasonability. To evaluate the effectiveness of our approach, we conducted a series of controlled experiments on five diverse open-source projects. The results show that EISM can improve the evolvability of software modules by at least 95 % compared to existing techniques, and effectively helps developers interactively adjust the solutions by proactively and significantly re-optimizing the adjusted solutions, with an average quality improvement of 12.87 %.
Citations: 0
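The evolutionary dependencies EISM mines are described only at a high level in the abstract. As an invented illustration of one common way to derive them, the sketch below computes co-change coupling (files that tend to appear in the same commits) from a toy commit log; all file names and commits are made up:

```python
from collections import Counter
from itertools import combinations

# Toy commit log: each commit lists the files it touched.
commits = [
    {"Order.java", "OrderDao.java"},
    {"Order.java", "OrderDao.java", "Invoice.java"},
    {"Invoice.java", "Pdf.java"},
    {"Order.java", "OrderDao.java"},
]

# Evolutionary dependency: how often a pair of files changes together.
co_change = Counter()
for files in commits:
    for a, b in combinations(sorted(files), 2):
        co_change[(a, b)] += 1

# Pairs with strong co-change are candidates for the same module.
strongest = co_change.most_common(1)[0]
print(strongest)
```

A modularization algorithm could feed such pair weights, alongside structural and semantic dependencies, into its clustering objective.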
Applying generative artificial intelligence for vulnerability fixing in a proprietary software ecosystem
IF 4.1 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-12-07 DOI: 10.1016/j.jss.2025.112723
Luiz Alexandre Costa , Awdren Fontão , Rodrigo Pereira dos Santos , Alexander Serebrenik
Context: Large organizations often operate within proprietary software ecosystems (PSECO), composed of interdependent software artifacts, multiple actors, and centralized governance. In such environments, addressing security vulnerabilities is particularly challenging due to strict quality standards, regulatory constraints, and the risk of cascading failures. These challenges can compromise delivery schedules, increase operational risk, and force teams to interrupt planned work to address urgent issues. Goal: In response to these challenges, recent advances in Generative Artificial Intelligence (GenAI), particularly large language models (LLM), have emerged as a promising avenue to support software engineering tasks such as code generation and automated repair. This study investigates how a GenAI-based approach can support vulnerability remediation in PSECO while preserving delivery cadence in Continuous Integration and Continuous Delivery (CI/CD) pipelines. Methods: We conducted a participative case study in a large global organization to design and evaluate PSECO-SafePatch, an approach composed of: (i) a structured remediation process tailored to enterprise constraints; and (ii) a web-based tool integrating Fortify static analysis with GenAI-based patch generation. The approach includes human-in-the-loop validation and aligns with DevOps practices. Results: PSECO-SafePatch reduced mean remediation time by 84 % with an 89 % patch success rate. It also reduced cognitive overload by guiding developers through structured validation, fostering trust without overreliance. Conclusion: The findings show that GenAI-supported remediation is feasible and effective in PSECO. Human-in-the-loop validation preserved critical thinking, addressing concerns about blind automation and knowledge erosion, and reinforcing the value of responsible automation at scale.
Citations: 0
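PSECO-SafePatch itself is proprietary and its API is not public. The sketch below is a hypothetical skeleton of the human-in-the-loop flow the abstract describes, with a stubbed patch generator standing in for the GenAI call and an explicit reviewer gate before any patch is accepted; all names, the `Finding` fields, and the md5-to-sha256 fix are invented:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Finding:
    file: str
    rule: str      # e.g. a static-analysis rule identifier
    snippet: str   # the flagged code

def generate_patch(finding: Finding) -> str:
    # Stub: a real pipeline would prompt an LLM with the finding here.
    return finding.snippet.replace("md5", "sha256")

def remediate(finding: Finding, approve: Callable[[str], bool]) -> Optional[str]:
    patch = generate_patch(finding)
    # Human-in-the-loop gate: nothing ships without reviewer sign-off.
    return patch if approve(patch) else None

finding = Finding("auth.py", "WeakHash", "digest = hashlib.md5(pw).hexdigest()")
print(remediate(finding, approve=lambda p: "sha256" in p))
```

Keeping the reviewer decision as an explicit callback makes the gate testable and prevents silent auto-application of generated patches.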
Exploring the potential and limitations of large language models for novice program fault localization
IF 4.1 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-12-06 DOI: 10.1016/j.jss.2025.112731
Hexiang Xu , Hengyuan Liu , Yonghao Wu , Xiaolan Kang , Xiang Chen , Yong Liu
Novice programmers often face challenges in fault localization due to their limited experience and understanding of programming syntax and logic. Traditional methods like Spectrum-Based Fault Localization (SBFL) and Mutation-Based Fault Localization (MBFL) help identify faults but often lack the ability to understand code context, making them less effective for beginners. In recent years, Large Language Models (LLMs) have shown promise in overcoming these limitations by utilizing their ability to understand program syntax and semantics. LLM-based fault localization provides more accurate and context-aware results than traditional techniques. This study evaluates six closed-source and seven open-source LLMs using the Codeflaws, Condefects, and BugT datasets, with BugT being a newly constructed dataset specifically designed to mitigate data leakage concerns. Advanced models with reasoning capabilities, such as OpenAI o3 and DeepSeek-R1, achieve superior accuracy with minimal reliance on prompt engineering. In contrast, models without reasoning capabilities, like GPT-4, require carefully designed prompts to maintain performance. While LLMs perform well in simple fault localization, their accuracy decreases as problem difficulty increases, though top models maintain robust performance on the BugT dataset. Over-reasoning is another challenge: some models generate excessive explanations that obscure the localization result. Additionally, the computational cost of deploying LLMs remains a significant barrier to real-time debugging. The LLMs’ explanations prove valuable for assisting novice programmers, with participants who had one year of experience consistently rating them highly. Our findings demonstrate the potential of LLMs to improve debugging efficiency while stressing the need to further refine their reasoning and computational efficiency for practical adoption.
Citations: 0
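The study’s evaluation pipeline is not included in the abstract. Fault-localization results like these are commonly scored with Top-N accuracy, where a program counts as a hit when its true faulty line appears among the first N ranked suspects; the illustrative sketch below computes it over invented data:

```python
def top_n_accuracy(predictions, faults, n):
    """Fraction of programs whose true faulty line appears among the
    first n ranked suspect lines returned for that program."""
    hits = sum(1 for pid, ranked in predictions.items()
               if faults[pid] in ranked[:n])
    return hits / len(predictions)

# Invented example: ranked suspect lines per buggy program.
predictions = {
    "p1": [12, 7, 30],   # fault at line 12 -> Top-1 hit
    "p2": [4, 19, 2],    # fault at line 19 -> Top-3 hit only
    "p3": [8, 1, 5],     # fault at line 40 -> miss
}
faults = {"p1": 12, "p2": 19, "p3": 40}

print(top_n_accuracy(predictions, faults, 1))
print(top_n_accuracy(predictions, faults, 3))
```

Reporting several N values (Top-1, Top-3, Top-5) shows how much of a model’s accuracy depends on letting developers inspect more than one suspect line.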