Information and Software Technology: Latest Articles

Empirical analysis of generative AI tool adoption in software development
IF 4.3; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-13; DOI: 10.1016/j.infsof.2026.108036
Deo Shao, Fredrick Ishengoma

Context

Software development is evolving with the emergence of Generative AI (GAI) tools that boost productivity, reduce manual errors, and accelerate workflows. However, little is known about how users perceive the usability, effectiveness, and security of these tools, especially among varied user populations.

Objectives

This study examines the determinants of GAI tool adoption: specifically, the behavioural determinants that drive GAI adoption in software development, and how students' perceptions of GAI adoption compare with those of professionals.

Methods

This study employs a cross-sectional, quantitative approach, comprising structured surveys distributed to software engineering students and senior engineers. The survey was designed based on the UTAUT framework. Data was collected from 305 participants (125 students, 133 professional developers, and 47 other tech professionals; industry total = 180). Descriptive statistics, t-tests, and regression analysis were conducted to analyse data and report trends and predictors of adoption intention.
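As an illustration of the student-versus-industry comparison step, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two groups of Likert-scale responses. The abstract does not say which t-test variant was used, so the unequal-variance (Welch) form is an assumption, and the scores are invented, not data from the study. A p-value would additionally need the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Sample variance (n - 1 denominator) divided by group size, per group.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical 5-point Likert scores for one UTAUT item (not data from the study).
students = [4, 5, 4, 5, 4]
professionals = [3, 4, 3, 4, 3]
t, df = welch_t(students, professionals)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = 2.887, df = 8.0
```

Welch's form is shown because it does not assume equal variances, which is a safer default when group sizes differ (125 students versus 180 industry respondents).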

Results

Social influence was the most important predictor of adoption intention (β = 0.945, p < 0.001), and its effect differed between groups. Compared with professionals, students were more cautious about security, although their responses were less technically specific. Professional developers employed systematic refinement strategies; a large percentage made extensive code changes to improve maintainability and ensure architectural alignment. By contrast, students focused more on getting the final product working and less on code refinement and security issues.

Conclusion

This study fills an empirical gap concerning the diffusion of generative AI into software development. The findings suggest different adoption patterns between students and professional developers, which are relevant to educators, developers, and industry leaders. Future studies should examine adoption trends across a broader range of user groups and assess the long-term effects of GAI tools on software engineering.
Volume 192, Article 108036
Citations: 0
Consensus planning boosts LLM code generation
IF 4.3; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-12; DOI: 10.1016/j.infsof.2026.108030
Chao Wen, Jie Liu, Liang Du
While large language models (LLMs) have demonstrated impressive ability in natural language processing (NLP), they still struggle with code generation tasks that involve complicated human intent. It is widely recognized that, before implementing code, human developers gain insight into the problem description, draw up detailed plans from collaborative perspectives, and deliberately organize modules. Imitating this process, we introduce consensus planning to boost a multi-agent prompting approach to code generation. An LLM agent uses insights into the consensus among distinct candidate plans to resolve discrepancies between them; these discrepancies indicate overlooked crucial details that may lead to errors. The consensus plan is then used first to construct code modules at distinct levels and then to organize them hierarchically for final code generation. We conduct extensive experiments on eight program synthesis benchmarks, three of which are challenging problem-solving benchmarks. Experimental results show that the proposed framework improves reflection during code generation and achieves new state-of-the-art pass@1 results. Moreover, our approach consistently delivers superior performance across various programming languages and problem difficulty levels. Code available at https://github.com/AISP-group/CPCG.
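The abstract does not specify how consensus among candidate plans is computed. As a rough illustration only, the sketch below assumes each candidate plan is a list of normalized step strings, treats steps supported by a majority of plans as the consensus, and reports the rest as discrepancies an agent could be prompted to reconcile. The helper name, the majority threshold, and exact-string matching are hypothetical simplifications; the repository linked in the abstract holds the authors' actual implementation.

```python
from collections import Counter


def consensus_and_discrepancies(plans: list[list[str]], threshold: float = 0.5):
    """Split plan steps into consensus steps and discrepancies (hypothetical helper).

    A step is 'consensus' if it appears in more than `threshold` of the plans;
    every other step is a discrepancy worth re-examining.
    """
    # Count each step once per plan, so repeats within one plan do not inflate support.
    support = Counter(step for plan in plans for step in set(plan))
    needed = threshold * len(plans)
    consensus, discrepancies = [], []
    seen = set()
    # Preserve first-occurrence order across plans for readability.
    for plan in plans:
        for step in plan:
            if step in seen:
                continue
            seen.add(step)
            (consensus if support[step] > needed else discrepancies).append(step)
    return consensus, discrepancies


plans = [
    ["parse input", "sort intervals", "merge overlaps", "return result"],
    ["parse input", "sort intervals", "merge overlaps", "handle empty input"],
    ["parse input", "merge overlaps", "return result"],
]
agreed, disputed = consensus_and_discrepancies(plans)
print(agreed)    # ['parse input', 'sort intervals', 'merge overlaps', 'return result']
print(disputed)  # ['handle empty input']
```

Here the step that only one plan mentions, handling empty input, is the kind of overlooked detail the abstract says discrepancies can reveal; a real system would match steps semantically rather than by exact string.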
Volume 192, Article 108030
Citations: 0
Wise recommender: LLMs refined by iterative critics
IF 4.3; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-09; DOI: 10.1016/j.infsof.2026.108021
Zhisheng Yang, Xiaofei Xu, Ke Deng, Li Li

Context:

Large Language Models (LLMs) have been applied to recommendation tasks, giving rise to the new paradigm of LLM-as-Recommendation Systems (LLM-as-RS). Existing methods fall into two categories: tuning and non-tuning. While tuning strategies offer better task alignment, they are expensive and require specialized training. Non-tuning strategies are easier to deploy but often lack task-specific knowledge, limiting their effectiveness.

Objective:

This study aims to enhance the recommendation quality of non-tuning LLM-based systems by addressing their lack of task awareness.

Method:

We propose a novel approach, Critique-based LLMs as Recommendation Systems (Critic-LLM-RS), which introduces an independent machine learning model—the Recommendation Critic—to provide feedback on LLM-generated recommendations and guide the LLM toward improved recommendation strategies.
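The sketch below shows only the general shape of a critic-in-the-loop recommender, under heavy assumptions. The LLM is replaced by a stub that ranks candidate items and skips any item the feedback rejects, and the Recommendation Critic is a fixed score table standing in for a trained model. All names, the score threshold, and the feedback format are hypothetical; the abstract does not describe the paper's critic model or prompting scheme.

```python
from typing import Callable

# Hypothetical critic: predicted relevance of each item for one user,
# standing in for a separately trained recommendation model.
CRITIC_SCORES = {"A": 0.9, "B": 0.2, "C": 0.8, "D": 0.7, "E": 0.1, "F": 0.6}


def stub_llm(candidates: list[str], rejected: set[str], k: int) -> list[str]:
    """Stand-in for an LLM call: recommend the first k candidates not rejected."""
    return [item for item in candidates if item not in rejected][:k]


def critic_feedback(recs: list[str], threshold: float) -> set[str]:
    """Return the recommended items the critic scores below the threshold."""
    return {item for item in recs if CRITIC_SCORES.get(item, 0.0) < threshold}


def refine(
    llm: Callable[[list[str], set[str], int], list[str]],
    candidates: list[str],
    k: int = 3,
    threshold: float = 0.5,
    max_rounds: int = 5,
) -> list[str]:
    """Iterate LLM proposal -> critic feedback until the critic raises no objection."""
    rejected: set[str] = set()
    recs = llm(candidates, rejected, k)
    for _ in range(max_rounds):
        objections = critic_feedback(recs, threshold)
        if not objections:
            break
        # Feed the critique back: the next proposal must avoid the flagged items.
        rejected |= objections
        recs = llm(candidates, rejected, k)
    return recs


print(refine(stub_llm, ["B", "A", "E", "C", "D", "F"]))  # ['A', 'C', 'D']
```

The first proposal is `['B', 'A', 'E']`. The critic flags B and E, and the second proposal passes. Because the LLM is never fine-tuned here, task knowledge enters only through the critic's feedback, which is the non-tuning property the abstract emphasizes.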

Results:

Experiments on multiple real-world datasets demonstrate that Critic-LLM-RS significantly outperforms existing non-tuning approaches, regardless of whether open-source or proprietary LLMs are used.

Conclusion:

Critic-LLM-RS enhances the task adaptability of non-tuning LLMs through a collaborative feedback mechanism, offering a new solution for building efficient and easily deployable recommendation systems.
Volume 192, Article 108021
Citations: 0
User-centric requirements prioritization in mHealth applications: Insights from a Discrete Choice Experiment
IF 4.3; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-08; DOI: 10.1016/j.infsof.2026.108014
Wei Wang, Hourieh Khalajzadeh, John Grundy, Anuradha Madugalla, Humphrey O. Obie

Context:

Mobile health (mHealth) applications are widely used for chronic disease management, but usability and accessibility challenges persist due to the diverse needs of users. Adaptive User Interfaces (AUIs) offer a promising approach to personalizing interactions and improving user experience. However, their adoption remains limited, partly due to a lack of understanding of how users perceive and evaluate different adaptation strategies. Addressing this gap is crucial for advancing user-centered design and requirements engineering in software systems for health contexts.

Objective:

This study identifies key factors influencing user preferences and trade-offs in mHealth adaptation design.

Method:

A Discrete Choice Experiment (DCE) was conducted with 186 participants living with chronic conditions who regularly use mHealth applications. Each participant completed a series of choice tasks, selecting their preferred adaptation designs from scenarios composed of six attributes with varying levels. A mixed logit model was applied to examine preference heterogeneity. Subgroup analyses were also conducted to explore variations in preferences across age, gender, health condition, and coping mechanism.
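In a mixed logit model, the probability that a respondent picks alternative j is the ordinary logit probability exp(β·x_j) / Σ_k exp(β·x_k), averaged over the distribution of the taste coefficients β. This averaging is what lets the model capture preference heterogeneity. The sketch below approximates that average by simulation, assuming independent normally distributed coefficients. The two-attribute coding, coefficient means, and standard deviations are invented for illustration and are not estimates from the study.

```python
import math
import random


def logit_probs(beta: list[float], alternatives: list[list[float]]) -> list[float]:
    """Multinomial logit choice probabilities for one fixed coefficient vector."""
    utils = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
    m = max(utils)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_logit_probs(means, sds, alternatives, draws=2000, seed=0):
    """Simulated mixed logit: average logit probabilities over random coefficient draws.

    Each coefficient is assumed independently normal: beta_k ~ N(means[k], sds[k]^2).
    """
    rng = random.Random(seed)
    avg = [0.0] * len(alternatives)
    for _ in range(draws):
        beta = [rng.gauss(mu, sd) for mu, sd in zip(means, sds)]
        for j, p in enumerate(logit_probs(beta, alternatives)):
            avg[j] += p / draws
    return avg


# Hypothetical coding of two attributes: [user controllability (0/1), frequent changes (0/1)].
design_a = [1.0, 0.0]  # user can control the adaptation, changes are infrequent
design_b = [0.0, 1.0]  # no user control, changes are frequent
probs = mixed_logit_probs(means=[0.8, -0.5], sds=[0.6, 0.4], alternatives=[design_a, design_b])
print([round(p, 2) for p in probs])
```

The signs of the invented means echo the reported direction of preferences (favouring controllability and infrequent change). Setting every standard deviation to zero collapses the model back to a plain logit.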

Results:

Participants preferred adaptation designs that preserved usability, offered controllability, introduced changes infrequently, and applied small-scale modifications. Conversely, adaptations affecting frequently used functions and those involving caregiver input were generally viewed less favorably. These findings highlight key trade-offs that influence user acceptance of adaptive mHealth interfaces.

Conclusion:

This study employs a data-driven approach to quantify user preferences, identify key trade-offs, and reveal variations across demographic and behavioral subgroups through preference heterogeneity modeling. These insights provide actionable guidance for designing more user-centered adaptive interfaces and contribute to advancing requirements prioritization practices in software engineering—particularly in the context of health technologies.
Volume 192, Article 108014
Citations: 0
Requirements-driven analysis of variability in configurable software
IF 4.3; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-07; DOI: 10.1016/j.infsof.2026.108017
Chin Khor, Robyn R. Lutz

Context:

It is difficult, time-consuming, and error-prone to detect misalignments between the variability requirements in configurable software and the source code intended to implement those requirements.

Objective:

The paper reports progress in checking the consistency between variability requirements and their implementation.

Method:

To automate consistency checking between variability requirements and variability source code, we create a variability model of configurable features and constraints from the requirements specification. We then evaluate the consistency of this variability model against a formal representation of the presence conditions that control variability in the source code. For the developer, we generate a traceability-rich consistency dashboard that reports any misalignments, together with a minimal set of configurations that provides full variability code coverage for variability testing. The approach is implemented in an open-source prototype tool called VarCHEK.
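The core two-way comparison can be pictured with a much simpler sketch than VarCHEK itself. The sketch rests on two assumptions. First, the variability model has been reduced to a set of feature names. Second, presence conditions in C source appear only as `#ifdef`, `#ifndef`, or `#if`/`#elif` lines using `defined(...)`. It reports features specified but never referenced in code, and vice versa. The regular expressions and feature names are hypothetical, and neither VarCHEK's formal representation of presence conditions nor its minimal coverage configurations are reproduced here.

```python
import re

# Hypothetical patterns: a preprocessor conditional, and defined(X) inside #if/#elif.
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif)\b(.*)$")
DEFINED = re.compile(r"defined\s*\(?\s*([A-Za-z_]\w*)")


def guarded_features(source: str) -> set[str]:
    """Collect macro names used in presence conditions of C preprocessor guards."""
    found: set[str] = set()
    for line in source.splitlines():
        m = DIRECTIVE.match(line)
        if not m:
            continue
        kind, rest = m.groups()
        if kind in ("ifdef", "ifndef"):
            names = rest.split()
            if names:
                found.add(names[0])
        else:
            found.update(DEFINED.findall(rest))
    return found


def check_consistency(model_features: set[str], source: str) -> dict[str, set[str]]:
    """Compare features in the variability model with features guarding code."""
    in_code = guarded_features(source)
    return {
        "specified_not_implemented": model_features - in_code,
        "implemented_not_specified": in_code - model_features,
    }


# Hypothetical feature names extracted from a requirements specification.
model = {"CONFIG_LOGGING", "CONFIG_ENCRYPTION", "CONFIG_OFFLINE_MODE"}
source = """
#include <stdio.h>
#ifdef CONFIG_LOGGING
void log_event(const char *msg) { puts(msg); }
#endif
#if defined(CONFIG_ENCRYPTION) && !defined(CONFIG_DEBUG_KEYS)
void encrypt(void);
#endif
"""
report = check_consistency(model, source)
print(sorted(report["specified_not_implemented"]))  # ['CONFIG_OFFLINE_MODE']
print(sorted(report["implemented_not_specified"]))  # ['CONFIG_DEBUG_KEYS']
```

A set difference catches only missing or extra features. Checking that the constraints between features (for example, mutual exclusion) agree with the guard logic requires the kind of formal reasoning the abstract describes.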

Results:

VarCHEK was evaluated on three diverse, configurable software projects. VarCHEK accurately identified variability requirements not implemented in the source code, found variabilities in the source code not specified in the requirements, and provided more relevant information to the user for troubleshooting and resolving inconsistencies than is currently available.

Conclusion:

This paper describes a new, practical way to automatically identify inconsistencies between the variability requirements specified for configurable software and the source code developed to implement those requirements.
Volume 192, Article 108017
Citations: 0
Android malware detection by using graph optimization of static features based on pre-trained language models
IF 4.3; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-06; DOI: 10.1016/j.infsof.2026.108012
Nghi Hoang Khoa, Doan Minh Trung, Duong The Dat, Phan The Duy, Van-Hau Pham, Nguyen Tan Cam

Context:

The Android platform is the dominant mobile operating system, making it a prime target for malware attacks. The increasing complexity of Android malware necessitates advanced detection methods that integrate modern machine learning techniques with security analysis.

Objectives:

This study aims to enhance Android malware detection and classification by leveraging pre-trained language models (PLMs) alongside graph learning techniques. The primary objective is to address challenges in transforming graph-based data into a format compatible with PLMs while preserving essential relational information.

Methods:

We propose APSDroid, a novel approach that combines permissions-intents (PIs) and API call graphs (ACGs) for Android malware analysis. APSDroid explores two distinct fusion strategies: raw-data-level fusion and feature-level fusion (including concatenation, self-attention, and cross-attention mechanisms) to evaluate their effectiveness in enriching semantic representations and improving detection robustness. The approach also incorporates forensic analysis to extract meaningful behavioral patterns and graph optimization techniques based on community detection and centrality measures to reduce complexity while retaining contextual flow.
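The sketch below illustrates centrality-based graph optimization combined with a graph-to-text step. It keeps only the most central API nodes of a call graph, then serializes the surviving edges as text a language model tokenizer can consume. Several choices are assumptions rather than details from the abstract: degree centrality (normalized by n - 1), a fixed top-k cutoff, and a "caller -> callee" text format. APSDroid also uses community detection, and its exact centrality measure and serialization format are not given. The API names are illustrative.

```python
from collections import defaultdict


def degree_centrality(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Normalized degree centrality (in-degree + out-degree) of each API node."""
    degree: dict[str, int] = defaultdict(int)
    for caller, callee in edges:
        degree[caller] += 1
        degree[callee] += 1
    scale = max(len(degree) - 1, 1)  # normalize by n - 1
    return {node: d / scale for node, d in degree.items()}


def prune_and_serialize(edges: list[tuple[str, str]], k: int) -> str:
    """Keep edges whose endpoints are both among the k most central nodes, as text."""
    centrality = degree_centrality(edges)
    # Sort by centrality (descending), breaking ties by name for determinism.
    top = set(sorted(centrality, key=lambda v: (-centrality[v], v))[:k])
    kept = [(a, b) for a, b in edges if a in top and b in top]
    return " ; ".join(f"{a} -> {b}" for a, b in kept)


call_graph = [
    ("MainActivity.onCreate", "TelephonyManager.getDeviceId"),
    ("MainActivity.onCreate", "SmsManager.sendTextMessage"),
    ("MainActivity.onCreate", "Log.d"),
    ("Uploader.run", "TelephonyManager.getDeviceId"),
    ("Uploader.run", "HttpURLConnection.connect"),
    ("Uploader.run", "SmsManager.sendTextMessage"),
]
print(prune_and_serialize(call_graph, k=4))
```

Pruning shrinks the text the PLM must read while keeping edges between the most connected calls, which is one way to trade graph size against the relational context the abstract aims to preserve.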

Results:

Experiments conducted on the CICMalDroid2020 dataset demonstrate the effectiveness of APSDroid. The model achieved an accuracy of 97.40% in malware detection and 94.23% in malware category classification. Furthermore, APSDroid with attention-based fusion mechanisms (e.g., self-attention and cross-attention) remained highly robust against obfuscation techniques, achieving 98.69% F1-score in binary classification and 83.98% in multi-class classification, outperforming several state-of-the-art (SOTA) methods.
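For readers checking F1 figures like those above, the sketch below computes per-class F1 as the harmonic mean of precision and recall, then averages it across classes (macro-F1). The abstract does not say whether its multi-class F1 is macro-, micro-, or weighted-averaged, so macro averaging is an assumption, and the labels are toy data with hypothetical category names. For this data, the result should match `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


y_true = ["benign", "adware", "banking", "banking", "benign", "adware"]
y_pred = ["benign", "adware", "banking", "adware", "benign", "benign"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```

Per-class F1 here is 0.5 (adware), 0.667 (banking), and 0.8 (benign). Macro averaging weights each category equally, so weak minority classes pull the score down more than they would under weighted averaging.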

Conclusion:

APSDroid provides a robust and scalable solution for Android malware detection by integrating PLMs with optimized graph-based representations. This approach enhances malware analysis while addressing key challenges related to computational efficiency and relational data preservation. Future research will focus on improving scalability and extending APSDroid to detect emerging malware variants.
Volume 192, Article 108012
引用次数: 0
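The APSDroid abstract mentions graph optimization based on community detection and centrality measures, used to shrink API call graphs while keeping their contextual flow. The abstract gives no algorithmic detail. The sketch below is therefore a minimal illustration of the centrality half of that idea only. It assumes an undirected call graph given as edge pairs, plain degree centrality, and a hypothetical `keep_ratio` parameter. It is not APSDroid's implementation, and it leaves out community detection.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalised degree centrality for an undirected graph given as (u, v) pairs."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n <= 1:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

def prune_call_graph(edges, keep_ratio=0.5):
    """Keep the most central nodes and only the edges among them (hypothetical rule)."""
    centrality = degree_centrality(edges)
    k = max(1, round(len(centrality) * keep_ratio))
    # Rank by centrality (descending), breaking ties by name so the result is deterministic.
    ranked = sorted(centrality, key=lambda node: (-centrality[node], node))
    kept = set(ranked[:k])
    return [(u, v) for u, v in edges if u in kept and v in kept]
```

This sketch keeps only edges whose endpoints both survive, which is one simple way to preserve local call context around hub APIs. A fuller pipeline would likely also weigh community structure before discarding nodes.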
BADS: A backdoor attack against code intent summarization engines
IF 4.3 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-06 DOI: 10.1016/j.infsof.2026.108018
Yubin Qu, Binyong Li, Song Huang, Peng Nie, Long Li, Yongming Yao

Context:

Code Large Language Models (LLMs) have transformed software engineering practices by automating code understanding and documentation generation. However, relying on third-party training datasets and pre-trained models introduces significant security vulnerabilities. This is particularly critical in code intent summarization, where security implications must be accurately identified and communicated.

Objective:

This study investigates backdoor attacks targeting code intent summarization in LLMs, with particular focus on security-related code documentation. Our work addresses a critical gap in existing literature by proposing a novel structure-based backdoor attack method that exploits abstract syntax tree features as triggers, effectively manipulating model outputs while maintaining normal behavior for benign inputs.

Methods:

We designed a two-stage approach comprising trigger design and model fine-tuning. Our method first conducts a statistical analysis of abstract syntax tree depths in source code, then selects samples for poisoning based on their structural characteristics. We employed a parameter-efficient fine-tuning technique (LoRA) to embed these backdoors into code LLMs, and developed evaluation metrics based on the similarity between security and functional intent summaries to assess attack effectiveness.
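BADS starts from a statistical analysis of abstract syntax tree (AST) depths and then picks poisoning samples by their structure. The abstract defines neither the depth measure nor the selection rule. The sketch below makes three assumptions: the source is Python, depth is measured with the standard `ast` module, and a hypothetical percentile cut-off marks the structurally deepest snippets as poisoning candidates.

```python
import ast
from typing import List

def ast_depth(node: ast.AST) -> int:
    """Depth of the AST rooted at `node`; a node without children has depth 1."""
    return 1 + max((ast_depth(child) for child in ast.iter_child_nodes(node)), default=0)

def select_poison_candidates(sources: List[str], percentile: float = 0.9) -> List[int]:
    """Indices of snippets whose AST depth reaches the given percentile of the corpus.

    The percentile rule is a hypothetical stand-in for BADS's structural selection.
    """
    if not sources:
        return []
    depths = [ast_depth(ast.parse(src)) for src in sources]
    ordered = sorted(depths)
    cutoff = ordered[min(len(ordered) - 1, int(percentile * len(ordered)))]
    return [i for i, depth in enumerate(depths) if depth >= cutoff]
```

Replacing the percentile rule with a depth band or an exact target depth would be a one-line change. The abstract does not say which rule BADS actually uses.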

Results:

Our findings demonstrate that the proposed BADS method achieves higher attack success rates than baseline approaches and remains highly effective (92.41% ASR) even against state-of-the-art defense strategies. While traditional grammar-based triggers (PCFG) have an advantage at lower poisoning ratios, our structure-based approach performs markedly better at higher ratios, highlighting a critical vulnerability in current code intent summarization systems.
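The abstract reports attack success rate (ASR) and states that effectiveness is assessed through the similarity between security and functional intent summaries, but it gives no formula. The sketch below follows one plausible reading under that assumption. A triggered sample counts as a successful attack when the model's summary is closer to the benign-looking functional summary than to the security summary it should have produced. `difflib.SequenceMatcher` is only a stand-in for whatever similarity measure the authors actually use.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in metric, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def attack_succeeded(output: str, security_summary: str, functional_summary: str) -> bool:
    # Assumed criterion: the backdoored output hides the security intent,
    # i.e. it resembles the functional summary more than the security one.
    return similarity(output, functional_summary) > similarity(output, security_summary)

def attack_success_rate(samples) -> float:
    """samples: iterable of (output, security_summary, functional_summary) triples."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(attack_succeeded(*s) for s in samples) / len(samples)
```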

Conclusions:

By focusing on structure-based backdoor attacks in code intent summarization, this study provides unique insights into securing code LLMs against supply chain attacks. Our work reveals how seemingly benign structural patterns can be exploited to manipulate security-critical documentation, laying the foundation for future research on developing more robust defense mechanisms against stealthy backdoor attacks in software engineering applications.
Information and Software Technology, Vol. 192, Article 108018
Citations: 0
FlowRepair: Search-based automated program repair of CPS controllers modeled in Simulink-Stateflow
IF 4.3 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-06 DOI: 10.1016/j.infsof.2025.108010
Aitor Arrieta, Pablo Valle, Shaukat Ali

Context:

Stateflow models are widely used in industry to model the high-level control logic of Cyber–Physical Systems (CPSs) in Simulink. Many approaches exist for testing Simulink models, but once a fault is detected, repairing it remains a manual process, which increases software development cost. Automated Program Repair (APR) techniques can significantly reduce this cost by automatically generating patches that fix bugs. However, current approaches do not scale well enough to be applicable in the CPS context.

Objectives:

The goal of this paper is to propose an APR method that scales to Stateflow models.

Method:

We propose an automated search-based approach called FlowRepair, explicitly designed to repair Stateflow models. FlowRepair's novel contributions include: (1) a new algorithm that combines global and local search for patch generation; (2) novel repair objectives specifically tailored to repairing CPSs; (3) a set of mutation operators for repairing Stateflow models automatically; and (4) an evaluation on a new dataset of 19 faulty Stateflow models containing real bugs.
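The abstract describes a search combining global and local moves, CPS-specific repair objectives, and mutation operators, but it does not specify any of them. The toy sketch below shows how such a loop can fit together under explicitly hypothetical choices. The "chart" is a transition table with numeric guards. The objective is the number of failing test traces. The mutation operators shift a guard threshold or flip its comparison. Local search is best-improvement hill climbing, and global search restarts it from randomly multi-mutated copies. None of this is FlowRepair's actual algorithm.

```python
import random
from typing import List, Tuple

# Hypothetical toy chart: each transition is (source, comparison, threshold, target).
Transition = Tuple[str, str, int, str]

def step(chart: List[Transition], state: str, temp: int) -> str:
    """Fire the first transition out of `state` whose guard holds; otherwise stay."""
    for src, op, threshold, dst in chart:
        if src == state and ((op == ">" and temp > threshold) or (op == "<" and temp < threshold)):
            return dst
    return state

def run(chart, temps, initial="OFF"):
    state, trace = initial, []
    for temp in temps:
        state = step(chart, state, temp)
        trace.append(state)
    return trace

def failures(chart, tests) -> int:
    """Repair objective (assumed): number of test cases whose state trace is wrong."""
    return sum(run(chart, temps) != expected for temps, expected in tests)

def neighbours(chart):
    """Mutation operators: shift one guard threshold by 1 or 2 either way, or flip its comparison."""
    for i, (src, op, threshold, dst) in enumerate(chart):
        for delta in (-2, -1, 1, 2):
            yield chart[:i] + [(src, op, threshold + delta, dst)] + chart[i + 1:]
        flipped = "<" if op == ">" else ">"
        yield chart[:i] + [(src, flipped, threshold, dst)] + chart[i + 1:]

def local_search(chart, tests):
    """Best-improvement hill climbing over single mutations."""
    current, fit = chart, failures(chart, tests)
    while fit > 0:
        candidate = min(neighbours(current), key=lambda c: failures(c, tests))
        cand_fit = failures(candidate, tests)
        if cand_fit >= fit:
            break  # local optimum reached
        current, fit = candidate, cand_fit
    return current, fit

def repair(faulty, tests, restarts=10, seed=0):
    """Global search: restart local search from randomly multi-mutated copies of the chart."""
    rng = random.Random(seed)
    best, best_fit = local_search(faulty, tests)
    for _ in range(restarts):
        if best_fit == 0:
            break
        start = faulty
        for _ in range(rng.randint(1, 3)):
            start = rng.choice(list(neighbours(start)))
        candidate, fit = local_search(start, tests)
        if fit < best_fit:
            best, best_fit = candidate, fit
    return best, best_fit
```

Consider a toy heater chart whose switch-off guard reads `> 26` where `> 22` was intended. The first local search walks the threshold down in two improving moves and reaches a zero-failure patch. The random restarts only come into play when hill climbing stalls on a plateau.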

Results:

Our results suggest that (1) FlowRepair can fix bugs in Stateflow models, and (2) FlowRepair outperforms or performs comparably to a baseline APR technique inspired by a well-known CPS program repair approach.

Conclusion:

This paper presents the first APR tool for CPSs whose high-level control programs are developed in Simulink-Stateflow. The results show that the approach is effective and scales to such complex systems.
Information and Software Technology, Vol. 192, Article 108010
Citations: 0
Artifact validity under varying agent configurations in LLM-assisted software development: A comparative analysis
IF 4.3 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-06 DOI: 10.1016/j.infsof.2026.108022
Dae-Kyoo Kim

Context:

The integration of large language models (LLMs) into software engineering has advanced toward agent-based automation across the development lifecycle. However, the comparative effectiveness of different multi-agent orchestration strategies remains underexplored.

Objective:

This study examines how three agent configuration strategies – Task-Specialized (TS), Phase-Specialized (PS), and Process-Generalist (PG) – impact the validity of software artifacts generated across key development tasks.

Methods:

Using a unified LLM backend within a structured orchestration framework, we evaluate the three configurations across nine core software engineering tasks – covering requirements analysis, design modeling, implementation, and testing – within three application domains: Tour Reservation System (TORS), Smart Wallet System (SWS), and Food Order and Delivery System (FODS). Artifact validity is measured using structural and semantic criteria.
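The three configurations can be made concrete with a minimal sketch built on stated assumptions. Every agent calls the same backend, mirroring the unified LLM backend, and each agent keeps only its own history. TS assigns one agent per task, PS one per phase, and PG one for the whole pipeline. The task and phase names and the `backend` callable are hypothetical placeholders. They are not the study's nine tasks or its orchestration framework.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]  # (task name, lifecycle phase)

def assign_agents(tasks: List[Task], strategy: str) -> Dict[str, str]:
    """Map each task to an agent label under a TS, PS, or PG configuration."""
    if strategy == "TS":  # Task-Specialized: one agent per task
        return {task: f"agent:{task}" for task, _ in tasks}
    if strategy == "PS":  # Phase-Specialized: one agent per lifecycle phase
        return {task: f"agent:{phase}" for task, phase in tasks}
    if strategy == "PG":  # Process-Generalist: one agent for the whole pipeline
        return {task: "agent:generalist" for task, _ in tasks}
    raise ValueError(f"unknown strategy: {strategy}")

def run_pipeline(tasks: List[Task], strategy: str,
                 backend: Callable[[str, List[str]], object]) -> Dict[str, object]:
    """Run tasks in order; an agent only sees the earlier tasks it handled itself."""
    assignment = assign_agents(tasks, strategy)
    histories: Dict[str, List[str]] = {}
    artifacts: Dict[str, object] = {}
    for task, _ in tasks:
        history = histories.setdefault(assignment[task], [])
        # Every agent calls the same backend; only the context it passes differs.
        artifacts[task] = backend(task, list(history))
        history.append(task)
    return artifacts
```

Suppose the backend is a stub that returns the length of the history it receives. Then TS agents always start from empty context. PS agents carry context only within a phase, which is the kind of continuity the results credit for operation identification and test design. The PG agent accumulates the whole pipeline's history.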

Result:

No configuration consistently outperforms the others across all tasks. The overall average validity score is 0.56, with zero standard deviation, indicating uniformly constrained performance. Validity is highest in early requirements tasks (0.63–0.85), moderate in implementation and testing (0.61), and lowest in modeling tasks (0.25–0.42). TS agents perform well in modeling tasks due to focused specialization; PS agents benefit from contextual continuity in tasks like operation identification and test design, though performance varies; PG agents offer stable but less tailored performance across the pipeline. All configurations perform best in the TORS domain, which features simple and modular requirements.

Conclusions:

Artifact quality appears to be shaped more by the LLM's capabilities than by the orchestration strategy alone. However, task- and domain-specific variations suggest that adaptive or hybrid orchestration strategies, tailored to both task type and domain context, can enhance the effectiveness of agent-assisted software development. These findings point to the need for more targeted specialization strategies and possibly domain-adapted LLMs.
Information and Software Technology, Vol. 192, Article 108022
Citations: 0
Beyond platforms — Growing distributed transaction networks for digital commerce
IF 4.3 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-06 DOI: 10.1016/j.infsof.2026.108016
Yvonne Dittrich, Kim Peiter Jørgensen, Ravi Prakash, Willard Rafnsson, Jonas Kastberg Hinrichsen

Context

We talk of the internet as digital infrastructure, yet we leave the building of digital 'rails' and 'roads' to quasi-monopolistic platform providers that benefit from both vendor and customer log-in. Decentralised architectures offer several advantages: they are potentially more inclusive of small players, more resilient against adversarial events, and appear to generate more innovation. However, how to evolve, adapt, and govern decentralised infrastructures is not well understood.

Objective

This article reports empirical qualitative research on the Beckn Protocol, an open-source protocol for decentralised transactions, covering its development and governance, the successful development of domain-specific adaptations, and the implementation and scaling of commercial infrastructures based on it. It explores how the architecture and governance support local innovation in specific business domains, and how domain-specific innovations feed back into the development of the core concept.

Method

The Beckn Protocol is studied as a defining element of a software ecosystem underpinning infrastructures for digital commerce. The research applied a case study approach, triangulating interviews with core members of the Beckn community, interviews with community leaders of domain-specific adaptations, and an analysis of online documents and of the protocol itself.

Results

The article shows that a decentralised approach to IT infrastructures is possible. It analyses the Beckn Protocol, its domain-specific adaptations, and the networks built on them with respect to architecture and evolution, community and governance, outcomes, and communication and collaboration. Based on this analysis, it highlights a number of generative mechanisms: socio-technical arrangements that support the adoption, innovation, and scaling of infrastructures.

Conclusion

The article discusses the importance of governance, including for the security of decentralised networks. It emphasises the importance of feedback loops, both to provide input for technical evolution and to recognise misconduct and develop means to address it. Implications for practice and research are highlighted.
Information and Software Technology, Vol. 193, Article 108016
Citations: 0