Pub Date: 2026-04-01 | Epub Date: 2026-01-09 | DOI: 10.1016/j.infsof.2026.108021
Zhisheng Yang, Xiaofei Xu, Ke Deng, Li Li
Context:
Large Language Models (LLMs) have been applied to recommendation tasks, giving rise to the new paradigm of LLM-as-Recommendation Systems (LLM-as-RS). Existing methods fall into two categories: tuning and non-tuning. While tuning strategies offer better task alignment, they are expensive and require specialized training. Non-tuning strategies are easier to deploy but often lack task-specific knowledge, limiting their effectiveness.
Objective:
This study aims to enhance the recommendation quality of non-tuning LLM-based systems by addressing their lack of task awareness.
Method:
We propose a novel approach, Critique-based LLMs as Recommendation Systems (Critic-LLM-RS), which introduces an independent machine learning model—the Recommendation Critic—to provide feedback on LLM-generated recommendations and guide the LLM toward improved recommendation strategies.
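The feedback loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `generate` (a wrapper around any non-tuned LLM), `critic_score` (the separately trained Recommendation Critic), the score range, the threshold, and the revision prompt wording are all assumptions introduced here.

```python
from typing import Callable, List


def critic_guided_recommend(
    generate: Callable[[str], List[str]],        # hypothetical LLM wrapper: prompt -> ranked items
    critic_score: Callable[[List[str]], float],  # hypothetical critic: item list -> score in [0, 1]
    base_prompt: str,
    threshold: float = 0.8,                      # assumed acceptance threshold
    max_rounds: int = 3,                         # assumed budget of critique rounds
) -> List[str]:
    """Ask the LLM for recommendations and revise them using critic feedback."""
    prompt = base_prompt
    best_items: List[str] = []
    best_score = float("-inf")
    for _ in range(max_rounds):
        items = generate(prompt)
        score = critic_score(items)
        if score > best_score:
            best_items, best_score = items, score
        if score >= threshold:
            break
        # Feed the critique back to the LLM; the LLM itself is never fine-tuned.
        prompt = (
            f"{base_prompt}\nPrevious recommendation: {items}\n"
            f"Critic score: {score:.2f}. Please revise the list."
        )
    return best_items
```

Keeping the critic outside the LLM is what keeps the approach in the non-tuning category: only the prompt changes between rounds.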
Results:
Experiments on multiple real-world datasets demonstrate that Critic-LLM-RS significantly outperforms existing non-tuning approaches, regardless of whether open-source or proprietary LLMs are used.
Conclusion:
Critic-LLM-RS enhances the task adaptability of non-tuning LLMs through a collaborative feedback mechanism, offering a new solution for building efficient and easily deployable recommendation systems.
Title: Wise recommender: LLMs refined by iterative critics. Journal: Information and Software Technology, Volume 192, Article 108021. Publication type: Journal Article.
Pub Date: 2026-04-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.infsof.2026.108037
Xuanye Wang, Lu Lu
Context:
Vulnerability detection leveraging pre-trained models has achieved notable success, but its coarse-grained outputs fail to provide security engineers with vulnerability type information. Recent type-aware Software Vulnerability Classification (SVC) methods mitigate this gap, but often neglect inter-type semantic relationships and exhibit limited knowledge transfer, resulting in suboptimal learned representations.
Objective:
To address these limitations, this study proposes VulATMHD, a novel type-aware SVC framework that integrates adaptive triplet mining with hybrid distillation.
Methods:
VulATMHD first groups vulnerability types according to their Common Weakness Enumeration (CWE) abstract types. It then constructs a multi-teacher architecture, with each teacher assigned to a specific group. Adaptive triplet mining is introduced to guide feature learning, yielding feature representations that are intra-class compact and inter-class separable. Since each teacher is optimized for intra-group performance, VulATMHD further introduces a hybrid distillation strategy to transfer both feature representations and label distributions from the teacher ensemble to a pre-trained student.
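To make the two loss terms concrete, here is a small sketch under stated assumptions. The abstract does not specify the adaptive mining rule, so plain batch-hard mining is swapped in; the hybrid loss is assumed to be a weighted sum of a feature-matching term (mean squared error) and a label-distribution term (KL divergence on temperature-softened outputs). The names `alpha`, `temperature`, and `margin` are illustrative.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(
    embeddings: List[Sequence[float]], labels: List[int], margin: float = 0.5
) -> float:
    """For each anchor, use its farthest positive and closest negative."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == label]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != label]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_distillation_loss(
    student_feat: Sequence[float], teacher_feat: Sequence[float],
    student_logits: Sequence[float], teacher_logits: Sequence[float],
    alpha: float = 0.5, temperature: float = 2.0,
) -> float:
    """Weighted sum of feature MSE and KL(teacher || student) on soft labels."""
    feat_loss = sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return alpha * feat_loss + (1 - alpha) * kl
```

A triplet loss of zero means every class in the batch is already separated from the others by at least the margin, which is the "intra-class compact and inter-class separable" property the abstract refers to.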
Results:
Empirical evaluations on the BigVul dataset show that, compared to baseline methods, VulATMHD improves Accuracy and weighted F1-score by 4.7%–29.9% and 5.7%–34.1%, respectively. Moreover, VulATMHD is compatible with various pre-trained models, such as CodeBERT, CodeT5+, and GraphCodeBERT.
Conclusion:
The proposed VulATMHD outperforms state-of-the-art SVC methods and exhibits superior robustness and scalability in downstream tasks, highlighting its potential for practical applications.
Title: VulATMHD: Joint adaptive triplet mining and hybrid distillation for type-aware vulnerability classification. Journal: Information and Software Technology, Volume 192, Article 108037. Publication type: Journal Article.
Pub Date: 2026-04-01 | Epub Date: 2026-01-05 | DOI: 10.1016/j.infsof.2026.108019
Phan The Duy, Nguyen Manh Cuong, Ha Trieu Yen Vy, Le Tuan Luong, Nguyen Tran Duc Anh, Nghi Hoang Khoa, Van-Hau Pham
Malware continues to evolve, exposing weaknesses in conventional detectors and motivating realistic adversarial evaluations. Prior RL-based evasion methods often rely on partial model access or feature-level perturbations, limiting realism under strict black-box constraints. We propose xPriMES, a dual-environment reinforcement learning framework that generates functionality-preserving binary mutations for malware evasion in black-box settings. A LightGBM surrogate provides continuous confidence feedback for dense reward shaping, while the real target detector supplies binary feedback — used both for episode termination and for issuing the final reward — ensuring learning remains grounded in real evasion outcomes. The agent employs Thompson sampling and SHAP-guided prioritized replay to focus exploration on feature-relevant mutations and accelerate convergence. Experiments on multiple static detectors (LightGBM, RF+CNN, MalConv, CNN, KNN) demonstrate up to 97.4% evasion success, surpassing PSP-Mal under equivalent conditions. Further tests on VirusTotal confirm the transferability and real-world impact of the adversarial samples. These findings show that integrating explainable guidance with surrogate-assisted RL yields interpretable and effective black-box evasion while preserving functionality. We conclude with implications for defensive hardening and discuss limitations related to surrogate fidelity and the focus on static detection.
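Two of the ingredients named above, Thompson sampling and the combination of dense surrogate feedback with a final binary reward, can be illustrated abstractly. The sketch below is an assumption-laden toy, not the xPriMES code: actions are opaque indices, the Beta-Bernoulli posterior is the textbook form of Thompson sampling, and `final_bonus` and the additive reward shape are invented for illustration.

```python
import random
from typing import List


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of abstract actions."""

    def __init__(self, n_actions: int, rng: random.Random) -> None:
        self.alpha: List[float] = [1.0] * n_actions  # 1 + observed successes
        self.beta: List[float] = [1.0] * n_actions   # 1 + observed failures
        self.rng = rng

    def select(self) -> int:
        # Draw one sample per action from its posterior and pick the largest.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action: int, success: bool) -> None:
        if success:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0

    def posterior_mean(self, action: int) -> float:
        return self.alpha[action] / (self.alpha[action] + self.beta[action])


def combined_reward(
    prev_conf: float, new_conf: float, target_label_benign: bool, final_bonus: float = 1.0
) -> float:
    """Dense term from the surrogate's confidence drop, plus a terminal bonus
    granted only when the real target's binary verdict flips (assumed form)."""
    dense = prev_conf - new_conf
    return dense + (final_bonus if target_label_benign else 0.0)
```

The dense term gives the agent a learning signal on every step, while the terminal bonus ties the return to what the real detector actually reports.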
Title: xPriMES: Explainable reinforcement learning-guided mutation strategy with dual-environment interaction for evading black-box malware detectors. Journal: Information and Software Technology, Volume 192, Article 108019. Publication type: Journal Article.
Pub Date: 2026-04-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.infsof.2026.108020
Li Liu, Shen Wang, Xunzhi Jiang
Context:
The widespread adoption of Internet of Things (IoT) devices has amplified the impact of vulnerabilities in embedded firmware. Binary code similarity detection (BCSD), a static analysis technique that compares functions without source code, plays an important role in firmware vulnerability detection. However, existing control flow graph (CFG)-based methods directly aggregate basic block features to learn structural information, neglecting the rich semantics in control transfers between basic blocks (i.e., CFG edges). This limitation leads to degraded performance under diverse compilation settings.
Objective:
To address the limitations of existing CFG-based similarity detection methods, this paper proposes a novel binary similarity detection method, EdgeSim, which extracts and utilizes control transfer information between basic blocks for the first time in CFG-based BCSD.
Method:
EdgeSim employs a language model to extract semantic features of both basic blocks and the control transfer relationships between them. Basic block semantics are used as node features, while control transfer semantics are incorporated as edge features in CFGs. Furthermore, we design a novel edge feature-enhanced graph neural network (EGNN) to aggregate features of nodes and edges in CFG, leveraging control transfer information between basic blocks to learn more comprehensive graph embeddings of functions.
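The idea of letting edge features take part in aggregation can be sketched with a single, weight-free message-passing step. This is only an illustration of the general technique, not the EGNN from the paper: it assumes node and edge features share one dimension, forms each message as source node feature plus edge feature, averages incoming messages, and applies a ReLU. A real layer would add learned weight matrices.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def vec_add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def edge_enhanced_layer(
    node_feats: Dict[int, Vector],
    edges: List[Tuple[int, int, Vector]],  # (source block, target block, edge feature)
) -> Dict[int, Vector]:
    """One message-passing step where each message mixes node and edge features."""
    dim = len(next(iter(node_feats.values())))
    incoming = {n: [0.0] * dim for n in node_feats}
    counts = {n: 0 for n in node_feats}
    for src, dst, edge_feat in edges:
        message = vec_add(node_feats[src], edge_feat)
        incoming[dst] = vec_add(incoming[dst], message)
        counts[dst] += 1
    updated = {}
    for n, feat in node_feats.items():
        mean_msg = [v / counts[n] for v in incoming[n]] if counts[n] else incoming[n]
        updated[n] = [max(0.0, x) for x in vec_add(feat, mean_msg)]  # ReLU
    return updated


def graph_embedding(node_feats: Dict[int, Vector]) -> Vector:
    """Mean-pool node features into one function-level embedding."""
    dim = len(next(iter(node_feats.values())))
    total = [0.0] * dim
    for feat in node_feats.values():
        total = vec_add(total, feat)
    return [v / len(node_feats) for v in total]
```

Two CFGs with identical basic blocks but different control transfers would produce different embeddings here, which is the distinction a node-only aggregation cannot make.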
Results:
Experimental evaluations on datasets covering diverse architectures, optimization levels, and compilers demonstrate that EdgeSim improves the Recall@1 by over 25% compared to baseline approaches in one-to-many function search tasks under cross-compilation conditions. Additionally, in real-world firmware vulnerability search experiments, EdgeSim outperforms baselines in identifying all vulnerability functions while maintaining the highest mean reciprocal rank (MRR) metric and the lowest false positive rate (FPR).
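For readers unfamiliar with the two ranking metrics, a short sketch of how Recall@1 and MRR are commonly computed may help. It assumes each query function has exactly one true match in the candidate pool and that `ranks` holds the 1-based position of that match in each query's result list; the paper's exact evaluation protocol may differ.

```python
from typing import List


def recall_at_k(ranks: List[int], k: int = 1) -> float:
    """Fraction of queries whose true match appears within the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank of the true match over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Three queries whose true matches were ranked 1st, 2nd, and 4th.
example_ranks = [1, 2, 4]
print(recall_at_k(example_ranks, k=1))
print(mean_reciprocal_rank(example_ranks))
```

Recall@1 only rewards a perfect top hit, while MRR also gives partial credit when the true match is near the top.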
Conclusion:
The experimental results indicate that integrating control transfer semantics substantially enhances CFG-based function representations. EdgeSim consistently delivers superior performance in binary similarity detection and firmware vulnerability discovery across diverse compilation environments.
Title: EdgeSim: Firmware vulnerability detection with control transfer-enhanced binary code similarity detection. Journal: Information and Software Technology, Volume 192, Article 108020. Publication type: Journal Article.
Pub Date: 2026-04-01 | Epub Date: 2025-12-31 | DOI: 10.1016/j.infsof.2025.108006
Jiajun Tong, Zhixiao Wang, Xiaobin Rui
Context:
Security patch identification is an important task in continuous integration and deployment that helps software developers detect security issues and code vulnerabilities. Recent studies have confirmed that using both commit message and code diff information is beneficial to identification performance. However, existing approaches still suffer from weak representation ability and low robustness, both of which degrade the quality of commit representations and, in turn, identification performance.
Objective:
We propose a gated transformer network for multivariate security patch identification with mixture-of-experts.
Method:
To improve the representation capability of the model and the quality of the commit representations, we provide a bi-encoder that uses prior knowledge to enhance the distinctive features of the commit message and the code diff, respectively. To improve robustness and further improve the quality of commit representations, we design a gated layer that learns the weight of each expert and dynamically assigns weights to different features.
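A gated layer over two experts can be sketched as a softmax gate that mixes the two embeddings. This is a generic mixture-of-experts illustration, not the released code: the gate is assumed to be a single linear map over the concatenated embeddings, and `gate_weights` and `gate_bias` stand in for parameters that would be learned during training.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(
    message_emb: List[float],        # output of the commit-message expert
    diff_emb: List[float],           # output of the code-diff expert
    gate_weights: List[List[float]], # hypothetical learned gate, shape 2 x (2 * dim)
    gate_bias: List[float],          # hypothetical learned bias, length 2
) -> Tuple[List[float], List[float]]:
    """Return the fused commit representation and the per-expert gate weights."""
    gate_input = message_emb + diff_emb  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, gate_input)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    gate = softmax(logits)
    fused = [gate[0] * m + gate[1] * d for m, d in zip(message_emb, diff_emb)]
    return fused, gate
```

Because the gate depends on the input, a commit with an uninformative message can lean on the diff expert, and vice versa.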
Results:
Extensive experiments show that our framework improves both the representation ability and the robustness of the model, provides high-quality commit representations, and achieves state-of-the-art performance.
Conclusion:
Our approach uses a bi-encoder in which two experts produce the embedding of each feature, and then exploits the differences between them by assigning different weights through the gated layer. It improves both the representation ability and the robustness of the model, and is therefore well suited to real-world scenarios. The code and data are available at https://github.com/AppleMax1992/ensemble_commit.
Title: Gated transformer network for multivariate security patch identification with mixture-of-experts. Journal: Information and Software Technology, Volume 192, Article 108006. Publication type: Journal Article.
Pub Date: 2026-04-01 | Epub Date: 2025-12-31 | DOI: 10.1016/j.infsof.2025.108007
Wenjing Cai, Xin Liu, Lipeng Gao
In software security, detecting vulnerabilities in source code has become increasingly important. Traditional methods based on feature engineering and statistical models are inefficient when dealing with complex code structures and large-scale data, while deep learning approaches have shown significant potential. Many detection methods convert source code into images for analysis. Although scalable, convolutional neural networks often fail to fully capture the complex structure and semantic relationships in the code, so high-level semantic features are missed and detection accuracy suffers. This study introduces a vulnerability detection framework, VulSEG, which significantly improves detection accuracy while maintaining high scalability. We combine the Program Dependence Graph (PDG), Control Flow Graph (CFG), and Context Dependency Graph (CDG) to create a context-enhanced graph representation. Additionally, we develop a composite feature encoding strategy that integrates Abstract Syntax Tree (AST) encoding with deep semantic security coding (Word2Vec + Complexity- and Security-Weighted TF-IDF, CSW-TF-IDF) to enhance the understanding of code complexity and the accuracy of predicting potential vulnerabilities. By incorporating the Text Convolutional Neural Network (TextCNN) and Bidirectional Long Short-Term Memory (BiLSTM) models, we further enhance feature extraction and long-sequence dependency handling. The experimental results show that, compared to state-of-the-art methods, our approach improves accuracy by 11.8%.
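The abstract does not define CSW-TF-IDF, so the following is only a guess at the general shape of a security-weighted TF-IDF: a standard smoothed TF-IDF score multiplied by a per-token weight. The weight table, its values, and the smoothing formula are assumptions made for illustration, and the complexity component is not modeled at all.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical weights: tokens tied to risky operations count more.
SECURITY_WEIGHTS: Dict[str, float] = {"strcpy": 2.0, "memcpy": 2.0, "malloc": 1.5}


def weighted_tfidf(
    docs: List[List[str]], weights: Dict[str, float], default_weight: float = 1.0
) -> List[Dict[str, float]]:
    """Smoothed TF-IDF per token, scaled by an illustrative security weight."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        doc_scores = {}
        for tok, count in term_freq.items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
            doc_scores[tok] = tf * idf * weights.get(tok, default_weight)
        scores.append(doc_scores)
    return scores
```

With this weighting, a rare call such as `strcpy` scores higher than an equally rare but benign token, which is the intuition behind emphasizing security-relevant vocabulary.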
Title: VulSEG: Enhanced graph-based vulnerability detection system with advanced text embedding. Journal: Information and Software Technology, Volume 192, Article 108007. Publication type: Journal Article.
Pub Date: 2026-04-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.infsof.2026.108022
Dae-Kyoo Kim
Context:
The integration of large language models (LLMs) into software engineering has advanced toward agent-based automation across the development lifecycle. However, the comparative effectiveness of different multi-agent orchestration strategies remains underexplored.
Objective:
This study examines how three agent configuration strategies – Task-Specialized (TS), Phase-Specialized (PS), and Process-Generalist (PG) – impact the validity of software artifacts generated across key development tasks.
Methods:
Using a unified LLM backend within a structured orchestration framework, we evaluate the three configurations across nine core software engineering tasks – covering requirements analysis, design modeling, implementation, and testing – within three application domains: Tour Reservation System (TORS), Smart Wallet System (SWS), and Food Order and Delivery System (FODS). Artifact validity is measured using structural and semantic criteria.
Result:
No configuration consistently outperforms the others across all tasks. The overall average validity score is 0.56, with zero standard deviation, indicating uniformly constrained performance. Validity is highest in early requirements tasks (0.63–0.85), moderate in implementation and testing (0.61), and lowest in modeling tasks (0.25–0.42). TS agents perform well in modeling tasks due to focused specialization; PS agents benefit from contextual continuity in tasks like operation identification and test design, though performance varies; PG agents offer stable but less tailored performance across the pipeline. All configurations perform best in the TORS domain, which features simple and modular requirements.
Conclusions:
Artifact quality appears more influenced by the LLM’s capabilities than orchestration strategy alone. However, task- and domain-specific variations suggest that adaptive or hybrid orchestration strategies – tailored to both task type and domain context – can enhance the effectiveness of agent-assisted software development. These findings support the need for more targeted specialization strategies and possibly domain-adapted LLMs.
Title: Artifact validity under varying agent configurations in LLM-assisted software development: A comparative analysis. Journal: Information and Software Technology, Volume 192, Article 108022. Publication type: Journal Article.
Pub Date : 2026-04-01Epub Date: 2026-01-06DOI: 10.1016/j.infsof.2025.108010
Aitor Arrieta , Pablo Valle , Shaukat Ali
Context:
Stateflow models are widely used in industry to model the high-level control logic of Cyber–Physical Systems (CPSs) in Simulink. Many approaches exist to test Simulink models, but once a fault is detected, repairing it remains a manual process, which increases software development cost. Automated Program Repair (APR) techniques can significantly reduce this cost by automatically generating patches that fix bugs. However, current approaches face scalability issues that limit their applicability in the CPS context.
Objectives:
The goal of this paper is to propose an APR method which is scalable for Stateflow models.
Method:
We propose an automated search-based approach called FlowRepair, explicitly designed to repair Stateflow models. The novelty of FlowRepair includes (1) a new algorithm that combines global and local search for patch generation; (2) a definition of novel repair objectives specifically tailored for repairing CPSs; (3) a set of mutation operators to repair Stateflow models automatically; and (4) an evaluation on a new dataset encompassing 19 faulty Stateflow models with real bugs.
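To make the idea of combining global and local search with mutation operators concrete, the following is a minimal sketch on a toy problem. It is not FlowRepair's algorithm, objectives, or operators, which the abstract does not specify. The two-state thermostat chart, the `REFERENCE` oracle, the two mutation operators, and every function name are hypothetical and chosen only for illustration; the fitness is simply the number of failing tests.

```python
import operator
import random

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Hypothetical chart: each state has one outgoing transition guarded by (op, constant).
REFERENCE = {"OFF": (">", 25), "ON": ("<", 20)}   # assumed-correct oracle
FAULTY = {"OFF": ("<", 25), "ON": ("<", 20)}      # wrong relational operator


def simulate(model, temps):
    """Run the two-state (OFF/ON) chart over a temperature trace."""
    state, trace = "OFF", []
    for t in temps:
        op, const = model[state]
        if OPS[op](t, const):
            state = "ON" if state == "OFF" else "OFF"
        trace.append(state)
    return trace


def failures(model, tests):
    """Fitness to minimise: number of failing test cases."""
    return sum(simulate(model, temps) != expected for temps, expected in tests)


def mutate(model, rng):
    """Apply one mutation: swap a relational operator or shift a constant."""
    patched = dict(model)
    state = rng.choice(sorted(patched))
    op, const = patched[state]
    if rng.random() < 0.5:
        op = rng.choice([o for o in OPS if o != op])
    else:
        const += rng.choice([-5, -1, 1, 5])
    patched[state] = (op, const)
    return patched


def repair(faulty, tests, rng, budget=2000, local_steps=20):
    """Alternate global restarts with local hill-climbing until tests pass."""
    best, best_fit = faulty, failures(faulty, tests)
    evals = 0
    while best_fit > 0 and evals < budget:
        # Global step: restart from the faulty model with 1-3 stacked mutations.
        cand = faulty
        for _ in range(rng.randint(1, 3)):
            cand = mutate(cand, rng)
        cand_fit = failures(cand, tests)
        evals += 1
        # Local step: single-mutation hill-climbing around the candidate.
        for _ in range(local_steps):
            if cand_fit == 0:
                break
            neighbour = mutate(cand, rng)
            neighbour_fit = failures(neighbour, tests)
            evals += 1
            if neighbour_fit <= cand_fit:
                cand, cand_fit = neighbour, neighbour_fit
        if cand_fit < best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit


TRACES = [[18, 22, 26, 27, 21, 19, 24], [30, 30, 15, 26], [25, 20, 26, 20, 19]]
TESTS = [(trace, simulate(REFERENCE, trace)) for trace in TRACES]

if __name__ == "__main__":
    patched, remaining = repair(FAULTY, TESTS, random.Random(0))
    print("patched model:", patched, "failing tests:", remaining)
```

In this sketch the global step keeps the search from getting stuck near a poor candidate, while the local step refines a promising one cheaply. A real repair tool would also need objectives beyond pass/fail counts, which is what the abstract's CPS-specific repair objectives address.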
Results:
Our results suggest that (1) FlowRepair can fix bugs in Stateflow models; and (2) FlowRepair surpasses or performs similarly to a baseline APR technique inspired by a well-known CPS program repair approach.
Conclusion:
This paper presents the first tool for APR of CPSs whose high-level control program is developed in Simulink-Stateflow. The results show that the approach is effective and scalable to such complex systems.
{"title":"FlowRepair: Search-based automated program repair of CPS controllers modeled in Simulink-Stateflow","authors":"Aitor Arrieta , Pablo Valle , Shaukat Ali","doi":"10.1016/j.infsof.2025.108010","DOIUrl":"10.1016/j.infsof.2025.108010","url":null,"abstract":"<div><h3>Context:</h3><div>Stateflow models are widely used in the industry to model the high-level control logic of Cyber–Physical Systems (CPSs) in Simulink. Many approaches exist to test Simulink models, but once a fault is detected, the process to repair it remains manual. Such a manual process increases the software development cost. Automated Program Repair (APR) techniques can significantly reduce this cost by automatically generating patches that fix bugs. However, current approaches face scalability issues to be applicable in the CPS context.</div></div><div><h3>Objectives:</h3><div>The goal of this paper is to propose an APR method which is scalable for Stateflow models.</div></div><div><h3>Method:</h3><div>We propose an automated search-based approach called <span>FlowRepair</span>, explicitly designed to repair Stateflow models. The novelty of <span>FlowRepair</span> includes, (1) a new algorithm that combines global and local search for patch generation; (2) a definition of novel repair objectives specifically tailored for repairing CPSs; (3) a set of mutation operators to repair Stateflow models automatically; and (4) an evaluation on a new dataset encompassing 19 faulty Stateflow models with real bugs.</div></div><div><h3>Results:</h3><div>Our results suggest that (1) <span>FlowRepair</span> can fix bugs in Stateflow models; (2) <span>FlowRepair</span> surpasses or performs similarly to a baseline APR technique inspired by a well-known CPS program repair approach.</div></div><div><h3>Conclusion:</h3><div>This paper presents the first tool for APR CPSs whose high-level control program is developed in Simulink-Stateflow. 
The results show that the approach is effective and scalable to such complex systems.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"192 ","pages":"Article 108010"},"PeriodicalIF":4.3,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Context:
Previous studies on Cross-Project Software Vulnerability Detection (CSVD) have shown that leveraging a small number of labeled modules from the target project can enhance the performance of CSVD. However, how to systematically select representative modules for labeling has not received sufficient attention. In addition, program modules can be measured using either expert or semantic metrics, and whether considering both kinds of metrics simultaneously helps in selecting representative modules has also received little attention.
Objective:
To address these challenges, we introduce a novel approach, CSVD-AES. This method fuses expert and semantic metrics and employs active learning to select the most representative modules for labeling.
Methods:
CSVD-AES consists of three phases: the code representation phase, the active learning phase, and the model construction phase. In the code representation phase, a self-attention mechanism is used to fuse the metrics. In the active learning phase, an uncertainty sampling strategy is employed to select the most representative modules for labeling. In the model construction phase, the weighted cross-entropy (WCE) loss function is applied to address the class imbalance issue in the labeled modules. The metric fusion helps active learning identify representative modules. Since selecting modules can exacerbate the class imbalance issue in the labeled modules, we employ a sampling balancing strategy during the active learning phase to address this problem.
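Two of the ingredients named above, uncertainty sampling and a weighted cross-entropy loss, can be sketched in a few lines. This is a generic illustration, not the CSVD-AES implementation. The entropy-based uncertainty score, the inverse-frequency class weights, and all function names are assumptions, since the abstract does not state which variants the authors use. The self-attention fusion and the sampling balancing strategy are not modelled here.

```python
import math


def uncertainty(p):
    """Binary entropy of a predicted vulnerability probability (peaks at p = 0.5)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_for_labeling(probs, k):
    """Return the indices of the k most uncertain unlabeled modules."""
    ranked = sorted(range(len(probs)), key=lambda i: uncertainty(probs[i]), reverse=True)
    return ranked[:k]


def class_weights(labels):
    """Inverse-frequency weights n / (2 * count), so the rarer class weighs more."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present to compute weights")
    return {0: n / (2 * neg), 1: n / (2 * pos)}


def weighted_cross_entropy(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy with a per-class weight on each sample."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


if __name__ == "__main__":
    unlabeled_probs = [0.95, 0.52, 0.10, 0.45, 0.80]
    print("modules to label:", select_for_labeling(unlabeled_probs, 2))

    labels = [1, 0, 0, 0]  # vulnerable modules are the minority class
    weights = class_weights(labels)
    print("class weights:", weights)
    print("loss:", weighted_cross_entropy(labels, [0.6, 0.2, 0.1, 0.3], weights))
```

With these weights, misclassifying a rare vulnerable module costs more than misclassifying a common non-vulnerable one, which is the usual motivation for a weighted loss under class imbalance.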
Results:
CSVD-AES is evaluated through a comprehensive study on four real-world projects. The results demonstrate that CSVD-AES outperforms five state-of-the-art baselines, achieving AUC improvements ranging from 4.0% to 24.4%. A series of ablation experiments verify the rationality of the CSVD-AES component settings.
Conclusion:
CSVD-AES effectively addresses the challenges in the field of CSVD by combining active learning and metric fusion, significantly advancing the development of this field.
{"title":"CSVD-AES: Cross-project software vulnerability detection based on active learning with metric fusion","authors":"Zhidan Yuan , Xiang Chen , Juan Zhang , Weiming Zeng","doi":"10.1016/j.infsof.2026.108015","DOIUrl":"10.1016/j.infsof.2026.108015","url":null,"abstract":"<div><h3>Context:</h3><div>Previous studies on Cross-Project Software Vulnerability Detection (CSVD) have shown that leveraging a small number of labeled modules from the target project can enhance the performance of CSVD. However, how to systematically select representative modules for labeling has not received sufficient attention. In addition, program modules can be measured using either expert or semantic metrics. There has been insufficient attention given to whether considering both metrics simultaneously helps in selecting representative modules.</div></div><div><h3>Objective:</h3><div>To address these challenges, we introduce a novel approach CSVD-AES. This method aims to fuse expert and semantic metrics and employs the active learning to select the most representative modules for labeling.</div></div><div><h3>Methods:</h3><div>CSVD-AES consists of three phases: the code representation phase, the active learning phase, and the model construction phase. In the code representation phase, a self-attention mechanism is used to fuse the metrics. In the active learning phase, an uncertainty sampling strategy is employed to select the most representative modules for labeling. In the model construction phase, the weighted cross-entropy (WCE) loss function is applied to address the class imbalance issue in the labeled modules. The metric fusion helps active learning identify representative modules. 
Since selecting modules can exacerbate the class imbalance issue in the labeled modules, we employ a sampling balancing strategy during the active learning phase to address this problem.</div></div><div><h3>Results:</h3><div>CSVD-AES is evaluated through a comprehensive study on four real-world projects. The results demonstrate that CSVD-AES outperforms five state-of-the-art baselines, achieving AUC improvements ranging from 4.0% to 24.4%. A series of ablation experiments verify the rationality of the CSVD-AES component settings.</div></div><div><h3>Conclusion:</h3><div>CSVD-AES effectively addresses the challenges in the field of CSVD by combining active learning and metric fusion, significantly advancing the development of this field.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"192 ","pages":"Article 108015"},"PeriodicalIF":4.3,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Context:
AI technologies are rapidly being integrated into society, offering numerous benefits but also raising significant ethical and social concerns. While some AI systems aim to improve efficiency and decision-making, they can also cause harmful impacts on individuals and society.
Objective:
This study examines both the immediate and systemic negative effects of AI systems, as well as the underlying factors that might contribute to these issues.
Method:
Using a multi-vocal literature review, we analyze 28 AI systems and their associated impacts, including discrimination, psychological and physical harm, and unfair treatment.
Results:
We identify key factors that might have led the AI systems to cause these impacts and explain why the impacts may occur. Additionally, we propose initial concrete actions to mitigate these negative effects and promote the development of AI systems that align with ethical and social sustainability principles.
Impact:
By shedding light on these issues, we aim to raise awareness among researchers and developers and to encourage the adoption of AI guidelines that are more responsible, inclusive, and concrete.
{"title":"AI systems’ negative social impact and factors","authors":"Nafen Haj Ahmad , Linnea Stigholt , Leticia Duboc , Birgit Penzenstadler","doi":"10.1016/j.infsof.2026.108038","DOIUrl":"10.1016/j.infsof.2026.108038","url":null,"abstract":"<div><h3>Context:</h3><div>AI technologies are rapidly being integrated into society, offering numerous benefits but also raising significant ethical and social concerns. While some AI systems aim to improve efficiency and decision-making, they can also cause harmful impacts on individuals and society.</div></div><div><h3>Objective:</h3><div>This study examines both the immediate and systemic negative effects of AI systems, as well as the underlying factors that might contribute to these issues.</div></div><div><h3>Method:</h3><div>Using a multi-vocal literature review, we analyze 28 AI systems and their associated impacts, including discrimination, psychological and physical harm, and unfair treatment.</div></div><div><h3>Results:</h3><div>We identify key factors that might have led AI systems to operate in that manner and explain why these impacts may occur. 
Additionally, we propose initial concrete actions to mitigate these negative effects and promote the development of AI systems that align with ethical and social sustainability principles.</div></div><div><h3>Impact:</h3><div>By shedding light on these issues, we aim to raise awareness among researchers and developers, encouraging the adoption of more responsible and inclusive as well as concrete AI guidelines.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"192 ","pages":"Article 108038"},"PeriodicalIF":4.3,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}