
Information and Software Technology: Latest Publications

Navigating ASD barriers: a role-contextual framework for enhancing ASD implementation in self-organizing teams
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-14; DOI: 10.1016/j.infsof.2025.107997
Soumya Prakash Rath, Gunjan Tomer

Context

Agile Software Development (ASD) emphasizes adaptability, collaboration, and rapid, iterative delivery to address evolving client needs and manage interdependent roles effectively. Although the extant literature has explored ASD and its various dimensions, the intersections and dependencies among well-defined ASD job roles remain underexplored.

Objective

This study examines the implementation barriers faced by ASD teams and identifies the best practices adopted across key ASD roles—Scrum Masters, Product Owners, and Developers—as well as senior leadership, to overcome them.

Method

This study employs a Grounded Theory (GT) methodology to explore the complex, role-based challenges in ASD inductively. Drawing on qualitative insights from 23 industry practitioners representing multiple ASD roles, the research delves into the ground realities of ASD projects to surface emergent patterns and contextual nuances. Through iterative coding and constant comparison, the study develops a role-contextual understanding of barriers to ASD implementation and synthesizes practitioner-validated best practices for overcoming them.

Results

The study identifies a spectrum of barriers—role-specific, inter-role, and leadership-related—spanning strategic, operational, and interpersonal dimensions. It proposes a Role-Contextual Framework that categorizes these barriers at triadic, dyadic, and intra-role levels and links them to practitioner-validated best practices.

Conclusion

The findings contribute to ASD research by highlighting the contextual and systemic nature of ASD barriers and underscoring the need for adaptive, role-based strategies, such as strategic alignment and leadership involvement, to facilitate the successful implementation of ASD.
Balancing organizational alignment, team autonomy, and control in large-scale agile organizations
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-16; DOI: 10.1016/j.infsof.2025.107998
Bart Daemen, Konstantinos Tsilionis, Oktay Turetken

Context

Agile software development has become widespread, particularly in small team settings, because of its emphasis on user value and adaptability. Scaling agile practices in large organizations, however, presents challenges. As organizations grow, projects become complex, requiring control mechanisms to ensure quality deliverables, align outputs across teams, and connect them to organizational goals. Scaling agile frameworks provide structured approaches to coordinate teams and maintain alignment while preserving team autonomy. Yet, applying these frameworks rigidly can reduce flexibility, overlook organizational context, and disrupt the balance between autonomy and alignment.

Objective

This study examines how different control mechanisms, both formal controls such as structured processes, rules, and performance metrics, and informal controls such as trust, social norms, and mutual adjustment, can foster organizational alignment while preserving team autonomy in large organizations that implement scaling agile frameworks.

Methods

A multi-method approach was adopted, combining a systematic literature review with an interpretive case study. The literature review synthesized prior research and produced eight theoretical propositions on the dynamic interaction of control, alignment, and autonomy. The case study was conducted in the IT department of a multinational organization transitioning from waterfall to agile, offering an empirical setting to explore how these propositions unfold in practice.

Results

Our findings show that formal controls, such as adherence to architectural standards, provide structure and support alignment. Informal mechanisms, including communities of practice, peer coaching, and visibility-based incentives, enhance accountability and autonomy. However, rigid separation of formal and informal controls is restrictive. More integrated hybrid approaches that combine agile methods with structured processes, supported by parallel decision-making bodies, address inter-team dependencies while maintaining flexibility.

Conclusion

Large organizations should adopt context-sensitive strategies when applying scaling agile frameworks. Control mechanisms need to be tailored to organizational needs, maturity, and conditions, supported by clear communication, cultural commitment, and adaptive governance.
Compositional security analysis of dynamic component-based systems
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-26; DOI: 10.1016/j.infsof.2025.108002
Charilaos Skandylas, Narges Khakpour

Context:

To reason about and enforce security in dynamic software systems, automated analysis and verification approaches are required. However, such approaches often encounter scalability issues, particularly when employed for runtime analysis, which is necessary in software systems with dynamically changing architectures, such as self-adaptive systems.

Objective:

In this work, we propose an automated formal approach for security analysis of component-based systems with dynamic architectures.

Methods:

This approach leverages formal abstraction and incremental analysis techniques to reduce the complexity of runtime analysis. We have implemented and evaluated our approach against ZNN, a widely known self-adaptive system exemplar.
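
The abstract does not detail the incremental-analysis mechanism, so the following is only a minimal sketch of the general idea it names, assuming each component carries a cached verification result and a dependency graph is available. The component names, the `analyze` callback, and the caching rule are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

def affected_components(dependents, changed):
    """Return the changed components plus everything that transitively depends on them."""
    affected = set(changed)
    queue = deque(changed)
    while queue:
        comp = queue.popleft()
        for dep in dependents.get(comp, set()):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected

def incremental_analysis(components, dependents, changed, analyze, cache):
    """Re-run the expensive security analysis only for components affected by a runtime change."""
    for comp in affected_components(dependents, changed):
        cache[comp] = analyze(comp)            # full analysis only where needed
    return all(cache[c] for c in components)   # system check holds iff every component check holds

# Toy example: C depends on B, B depends on A; only A changed at runtime,
# so A, B, and C are re-analyzed while the unrelated component D keeps its cached result.
dependents = {"A": {"B"}, "B": {"C"}, "C": set()}
cache = {"A": True, "B": True, "C": True, "D": True}
print(incremental_analysis(["A", "B", "C", "D"], dependents, {"A"},
                           analyze=lambda c: True, cache=cache))
```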

Results:

Compared to the state of the art, our results demonstrate an improvement both in the size of systems that can be analyzed and in the time required to complete the analysis. In particular, our incremental analysis is well suited for systems that alter their architectures at runtime.

Conclusion:

Therefore, this approach is suitable for analyzing the security of dynamic component-based systems both statically and at runtime.
Towards green game software engineering: A comparative analysis of energy consumption between the widespread unity and unreal video game engines
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-03; DOI: 10.1016/j.infsof.2025.107991
Carlos Pérez, Javier Verón, Francisca Pérez, Mª Ángeles Moraga, Coral Calero, Carlos Cetina

Context:

The total energy cost of computing activities is steadily increasing, and projections indicate that it will be one of the dominant global energy consumers in the coming decades. However, the video game sector has not yet developed the same level of environmental awareness as other computing technologies despite the estimated three billion regular video game players in the world.

Objective:

This work evaluates the energy consumption of the most widely used industry-scale video game engines: Unity and Unreal Engine.

Method:

Specifically, our work uses three scenarios representing relevant aspects of video games (Physics, Static Meshes, and Dynamic Meshes) to compare the energy consumption of the engines. The aim is to determine the influence of using each engine on energy consumption.

Results:

Our research has confirmed notable differences in energy consumption: 351% in Physics in favor of Unity, 17% in Static Meshes in favor of Unity, and 26% in Dynamic Meshes in favor of Unreal Engine.
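
As a reading aid for the percentages above, here is a minimal sketch of how such relative differences are typically derived from per-scenario energy measurements in joules. The joule values below are hypothetical placeholders chosen only to reproduce the reported ratios; they are not the paper's measurements.

```python
def relative_difference(energy_a: float, energy_b: float) -> float:
    """Percentage by which the higher-consuming engine exceeds the lower-consuming one."""
    worse, better = max(energy_a, energy_b), min(energy_a, energy_b)
    return (worse - better) / better * 100.0

# scenario -> (Unity joules, Unreal Engine joules); hypothetical values
measurements = {
    "Physics":        (100.0, 451.0),
    "Static Meshes":  (100.0, 117.0),
    "Dynamic Meshes": (126.0, 100.0),
}
for scenario, (unity_j, unreal_j) in measurements.items():
    winner = "Unity" if unity_j < unreal_j else "Unreal Engine"
    print(f"{scenario}: {relative_difference(unity_j, unreal_j):.0f}% in favor of {winner}")
```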

Conclusion:

Considering the estimated three billion regular video game players worldwide and the high computational requirements of the sector, the magnitude of potential savings is a relevant issue for the research community. This might encourage a new branch of research on energy efficient video game engines.
Quality assessment of software requirements using artificial intelligence methods: A systematic literature review
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-11-25; DOI: 10.1016/j.infsof.2025.107979
Elise Wolf, Adam Trendowicz, Julien Siebert

Context:

The quality of requirements specifications is a critical success factor in software development. Assuring high-quality requirements, specifically in an automated way, poses a significant challenge due to their unstructured and multi-modal character. With the rise of deep learning and large language models (LLMs), new opportunities have developed to assess the quality of requirements automatically, particularly user stories in the context of agile software engineering, where short development cycles require efficient tool support.

Objective:

This study aims to systematically review and investigate the current landscape of approaches based on artificial intelligence techniques such as natural language processing and deep learning for assessing the quality of software requirements. The investigation focuses on the artificial intelligence techniques adopted, quality aspects considered, datasets used to tune and evaluate the proposed approaches, and their performance.

Method:

We conducted a systematic literature review of 26 peer-reviewed papers published between 2019 and 2025. We selected the papers after a title and abstract review of 353 papers identified through a literature database query and forward–backward snowballing.

Results:

The results reveal significant overlap among considered quality aspects, which can be mapped onto the higher-order requirements quality model INVEST. Most studies focus on assessing requirement quality rather than improving requirements and rely heavily on synthetic and public datasets. LLMs have rapidly gained popularity since 2023, though model evaluation strategies remain inconsistent. Metrics such as accuracy, precision, recall, and F1-Score are common, yet a few studies use semantic or expert-based evaluations.
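
For readers unfamiliar with the evaluation metrics named above, a short sketch of how accuracy, precision, recall, and F1-score are computed from the confusion counts of a hypothetical binary requirements-quality classifier; the counts are invented for illustration and do not come from any reviewed study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard metrics from the confusion counts of a binary classifier."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented confusion counts for a classifier flagging defective user stories.
print(classification_metrics(tp=42, fp=8, fn=6, tn=44))
```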

Conclusion:

The field is evolving toward LLM-driven, semantically rich models, yet lacks methodological standardization, reproducible datasets for evaluating the models, and integration of the approaches with real-world requirements engineering processes. Future work should address these limitations by developing benchmark datasets, standardizing evaluation metrics, and exploring hybrid systems that combine AI-based and traditional requirements quality assurance approaches.
ReEPM: A Reliability Estimation Framework for CNNs based on Error Probability Matrix modeling
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.infsof.2025.107981
Jie Xiao, Aizhu Liu, Yujian Yang, Yuhao Huang, Zhezhao Yang, Jungang Lou

Context:

The deployment of Convolutional Neural Networks (CNNs) in safety-critical applications faces significant challenges from soft errors. While accurate reliability assessment is vital, existing methods typically suffer from prohibitively high computational overheads, creating a critical trade-off between precision and efficiency that severely limits their practical applicability.

Objective:

To overcome this critical precision-efficiency dilemma, we propose a novel framework for CNN reliability assessment that enables accurate and highly efficient reliability evaluation across diverse error conditions.

Method:

We propose ReEPM (Reliability Estimation Framework based on Error Probability Matrix). ReEPM constructs an Error Probability Matrix (EPM) that precisely models bit-flip error impact on CNN weights, fundamentally enabling parallel, accurate error injection without brute-force simulation. Moreover, we integrate an adaptive iterative process driven by Kalman filtering, which intelligently converges on reliability estimates with a drastically reduced number of input samples. This combination offers superior analytical rigor and computational efficiency.
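
The abstract does not spell out how the EPM is constructed, so the sketch below illustrates only the underlying fault model such analyses share: flipping a single bit of a float32 CNN weight and observing the perturbed value. The Error Probability Matrix itself and the Kalman-filter-driven iteration are not reproduced here.

```python
import struct

def flip_bit(weight: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) of the IEEE-754 float32 encoding of a weight."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", weight))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

w = 0.125
for bit in (0, 23, 30, 31):   # mantissa LSB, exponent LSB, exponent MSB, sign bit
    print(f"bit {bit:2d}: {w} -> {flip_bit(w, bit)}")
```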

Result:

Experimental results show that ReEPM achieves an average accuracy of 0.9017 (single-error) and 0.9984 (multiple-error), while being 69.53× and 1989.27× faster, respectively, than widely adopted Monte Carlo fault injection. Furthermore, ReEPM significantly outperforms probability-based methods like SERN in accuracy (0.9017 vs. 0.7192 in single-error) and boasts broader applicability, covering entire networks and complex multiple-error scenarios.

Conclusion:

ReEPM establishes a new paradigm for CNN reliability assessment by effectively overcoming the critical accuracy-overhead trade-off. It offers an accurate and rapid evaluation tool for designing resilient CNNs in next-generation safety-critical intelligent systems.
Representation learning for coincidental correctness in fault localization
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-11-24; DOI: 10.1016/j.infsof.2025.107978
Jian Hu

Context:

Fault localization (FL) is a critical phase in the software debugging process, which employs the execution coverage matrix to identify the exact locations of faults or bugs in a program's source code. However, researchers have shown that coincidental correctness test cases (CCTC), which execute the faulty statements but produce the correct output, are prevalent in test suites and can negatively affect the effectiveness of fault localization.

Objective:

To address this problem, we propose ER4FL: a representation learning based CCTC detection method for fault localization. Our method first detects the CCTCs in the coverage matrix, then relabels them, and finally uses the optimized coverage matrix for fault localization.

Method:

ER4FL leverages autoencoder-based representation learning to refine the coverage matrix, capturing its most important features in a compressed form. Based on the enhanced representation (i.e., the compact coverage matrix), ER4FL adopts a Gaussian Mixture Model (GMM) as a probabilistic model to identify and manipulate CCTCs. Finally, ER4FL feeds the coverage matrix without CCTCs into the FL pipeline.
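
A simplified end-to-end sketch of the pipeline described above, run on synthetic data: a small autoencoder compresses the coverage matrix, a two-component GMM is fit on the encodings, and passing tests that fall into the component dominated by failing tests are flagged as CCTC candidates. The network sizes, number of mixture components, training budget, and flagging rule are illustrative assumptions rather than ER4FL's exact configuration, which would also relabel the flagged tests before fault localization.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
coverage = rng.integers(0, 2, size=(200, 50)).astype(np.float32)   # rows: test cases, cols: statements
passing = np.ones(200, dtype=bool)
passing[:30] = False                                               # first 30 tests are failing tests

class AutoEncoder(nn.Module):
    def __init__(self, n_statements: int, n_latent: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_statements, n_latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(n_latent, n_statements), nn.Sigmoid())
    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.from_numpy(coverage)
model = AutoEncoder(coverage.shape[1])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                                               # learn to reconstruct the coverage matrix
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy(model(x), x)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    latent = model.encoder(x).numpy()                              # compact coverage representation

gmm = GaussianMixture(n_components=2, random_state=0).fit(latent)
labels = gmm.predict(latent)
failing_component = np.bincount(labels[~passing], minlength=2).argmax()  # component dominated by failing tests
cctc_candidates = passing & (labels == failing_component)          # passing tests that behave like failing ones
print(f"Flagged {int(cctc_candidates.sum())} candidate coincidentally correct tests")
```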

Results:

Our experimental results demonstrate that ER4FL reduces the Mean First Rank (MFR) of Ochiai from 333.18 to 258.26, achieving a relative improvement of 22.49%. In addition, ER4FL decreases the number of checked statements in Convolutional Neural Network (CNN) FL from 859.20 to 579.65, corresponding to a relative reduction of 48.23%.

Conclusion:

The experimental results demonstrate that our method is statistically more effective than the six FL baselines, as well as the two CCTC detection methods.
Maximizing quantum hardware utilization via multiprogramming circuits and shot-wise distribution
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-20; DOI: 10.1016/j.infsof.2025.108005
Giuseppe Bisicchia, Jaime Alvarado-Valiente, Javier Romero-Álvarez, Jose Garcia-Alonso, Juan M. Murillo, Antonio Brogi

Context:

Quantum computing is rapidly evolving, offering new opportunities for solving problems in optimization, cryptography, and simulation. However, the limited availability of quantum resources makes efficient utilization of quantum hardware a current challenge. Today’s paradigms often lead to under-utilization of qubits, increased costs, and execution delays, especially in the NISQ era.

Objective:

This work aims to improve the utilization of quantum hardware by introducing an execution model that integrates multiprogramming at circuit level with quantum shot-wise distribution in a single policy-driven pipeline.

Methods:

An architecture has been implemented that combines circuit scheduling and shot distribution techniques to aggregate multiple circuits and distribute their shots across heterogeneous QPUs. The approach was empirically validated on actual IBM Quantum devices using a diverse set of reference circuits.
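
A minimal sketch of the shot-wise distribution idea: split a circuit's total shot budget across several QPUs according to a policy weight, then merge the per-QPU measurement histograms back into a single result. The QPU names, weights, and counts are hypothetical, and the scheduling/multiprogramming of aggregated circuits is not shown.

```python
from collections import Counter

def split_shots(total_shots: int, policy: dict) -> dict:
    """Allocate shots proportionally to the policy weights (any remainder goes to the first QPU)."""
    weight_sum = sum(policy.values())
    allocation = {qpu: int(total_shots * w / weight_sum) for qpu, w in policy.items()}
    first = next(iter(allocation))
    allocation[first] += total_shots - sum(allocation.values())
    return allocation

def merge_counts(per_qpu_counts: list) -> dict:
    """Aggregate the bitstring counts returned by each QPU into one histogram."""
    merged = Counter()
    for counts in per_qpu_counts:
        merged.update(counts)
    return dict(merged)

policy = {"qpu_a": 0.5, "qpu_b": 0.3, "qpu_c": 0.2}     # hypothetical dispatch policy
print(split_shots(1000, policy))                         # {'qpu_a': 500, 'qpu_b': 300, 'qpu_c': 200}
print(merge_counts([{"00": 260, "11": 240}, {"00": 150, "11": 150}, {"00": 95, "11": 105}]))
```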

Results:

The proposal achieved a 95% reduction in cost and a 92% reduction in tasks. Moreover, the fidelity analysis of the results showed an increase in noise, with an average increase of approximately 20% across different statistical distances.

Conclusions:

This research provides a usable and extensible solution to increase the efficiency, cost effectiveness, and resilience of quantum workload execution in heterogeneous and dynamic cloud environments. These results suggest that users should weigh the implications of fidelity versus cost (and time) savings based on their application requirements and goals.
DeepRegion: Black-box efficient testing for DNNs based on region analysis
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-08; DOI: 10.1016/j.infsof.2025.107995
Qing Sheng, Zhiyi Zhang, Shuxian Chen, Ziyuan Wang, Zhiqiu Huang

Context:

Deep Neural Networks (DNNs) have achieved remarkable success in many safety-critical domains. However, their vulnerability to small input perturbations poses serious security risks. White-box testing methods are often impractical in sensitive domains where access to model internals is restricted. Moreover, some recent studies have shown that neuron coverage does not have a strong correlation with the ability to detect faults, making these methods relatively inefficient. Black-box testing offers a viable alternative by evaluating models solely through input–output behavior. But current approaches typically apply uniform perturbations across the input space, overlooking the fact that DNN decisions often rely heavily on a small set of critical input regions, making them suffer from low efficiency and high query costs.

Objective:

To improve query efficiency in black-box testing of DNNs and reduce computational costs while enhancing defect detection within limited query budgets, this paper introduces DeepRegion, a novel black-box testing method based on region analysis.

Method:

First, we design a partition policy to initialize the partitioning of the original image. In each iteration, we apply a well-designed discriminant function to guide the localization process of the region, and apply the dynamic adjustment process in the partition policy to refine the partition for further analysis. After the regions containing important features are successfully located, we then implement a feedback-based approach to strategically perturb these regions, generating error-inducing inputs.
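
A simplified sketch of the two-stage idea: an occlusion-based discriminant (black-box queries only) localizes the most decision-relevant grid cell, and a feedback loop then perturbs only that cell until the prediction flips or the query budget is exhausted. The toy model, grid size, noise magnitude, and acceptance rule are stand-ins for illustration; they are not DeepRegion's partition policy or discriminant function.

```python
import numpy as np

def region_guided_attack(image, predict, grid=4, budget=200, eps=0.15, rng=None):
    """Black-box sketch: locate the most decision-relevant grid cell, then perturb only that cell."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    base_probs = predict(image)
    label = int(base_probs.argmax())

    # Discriminant step: score each cell by the confidence drop when it is occluded (one query per cell).
    scores, cells = [], []
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * h // grid, (i + 1) * h // grid)
            xs = slice(j * w // grid, (j + 1) * w // grid)
            occluded = image.copy()
            occluded[ys, xs] = 0.0
            scores.append(base_probs[label] - predict(occluded)[label])
            cells.append((ys, xs))
    ys, xs = cells[int(np.argmax(scores))]

    # Feedback step: keep only perturbations that reduce confidence in the original label.
    perturbed, confidence = image.copy(), base_probs[label]
    for _ in range(budget):
        candidate = perturbed.copy()
        noise = rng.uniform(-eps, eps, candidate[ys, xs].shape)
        candidate[ys, xs] = np.clip(candidate[ys, xs] + noise, 0.0, 1.0)
        probs = predict(candidate)
        if int(probs.argmax()) != label:
            return candidate                      # error-inducing input found
        if probs[label] < confidence:
            perturbed, confidence = candidate, probs[label]
    return None

# Toy black-box "model": the predicted class depends only on the mean of the top-left quadrant.
def toy_predict(img):
    s = img[:16, :16].mean()
    return np.array([1.0 - s, s])

image = np.full((32, 32), 0.55)
print(region_guided_attack(image, toy_predict) is not None)
```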

Results:

Our experimental evaluation on four widely used datasets and seven well-established DNN architectures shows that DeepRegion can not only improve testing efficiency but also enhance defect detection within limited query budgets.

Conclusion:

DeepRegion can automatically generate high-quality test inputs that expose inconsistencies in target systems under limited resources, and the error-inducing inputs it finds can be used to fine-tune target models to improve accuracy.
Toward an automated cross-multimodal verification of mobile app bug fixes integrating user feedback, developer responses, changelogs, and UI visual analysis
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-08; DOI: 10.1016/j.infsof.2025.107996
Rhodes Massenon, Ishaya Gambo, Javed Ali Khan

Context

Verifying claimed bug fixes in mobile applications is crucial, yet the "fixed but not resolved" phenomenon remains a persistent challenge. Existing bug analysis tools focus on pre-fix tasks like detection and reproduction, but lack mechanisms to holistically verify a fix post-deployment by cross-referencing developer claims, visual UI changes, and subsequent user feedback. This gap leads to persistent bugs, wasted developer effort, and user dissatisfaction.

Objective

This paper introduces BUGFixChecker, the first framework for automated, multimodal cross-verification of mobile app bug fixes. Our primary goal is to determine if a claimed fix has truly resolved a user-reported issue.

Methods

BUGFixChecker integrates five data sources: the original user bug report, the developer's fix claim, "before" and "after" UI screenshots, and post-fix user reviews. The core methodology employs a Multimodal Large Language Model (MLLM) guided by a Chain-of-Thought prompt to perform a comparative reasoning task. We evaluated the framework on a curated dataset of 53 real-world bug fix cases from Android applications.
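
As an illustration of how the five evidence sources might be bundled into a single chain-of-thought verification request, here is a minimal sketch. The field names, prompt wording, verdict labels, and file paths are assumptions made for illustration; the paper's exact prompt and the provider-specific MLLM call are not reproduced.

```python
import json

COT_INSTRUCTION = (
    "Reason step by step: (1) restate the user-reported bug, (2) check whether the "
    "developer's changelog claims to fix it, (3) compare the 'before' and 'after' "
    "screenshots for the relevant UI change, (4) check post-fix reviews for recurrence, "
    "then output one verdict: RESOLVED, UNRESOLVED_VISUAL_MISMATCH, or UNRESOLVED_USER_REPORTS."
)

def build_verification_prompt(bug_report, fix_claim, post_fix_reviews,
                              before_screenshot_path, after_screenshot_path):
    """Bundle the text evidence and screenshot references into one multimodal request payload."""
    return {
        "instruction": COT_INSTRUCTION,
        "text_evidence": {
            "user_bug_report": bug_report,
            "developer_fix_claim": fix_claim,
            "post_fix_reviews": post_fix_reviews,
        },
        "images": [before_screenshot_path, after_screenshot_path],
    }

prompt = build_verification_prompt(
    bug_report="App crashes when rotating the screen on the checkout page.",
    fix_claim="v2.3.1 changelog: fixed crash on orientation change.",
    post_fix_reviews=["Still crashes when I rotate my phone at checkout (v2.3.1)."],
    before_screenshot_path="screens/checkout_before.png",   # hypothetical paths
    after_screenshot_path="screens/checkout_after.png",
)
print(json.dumps(prompt, indent=2))   # the actual MLLM invocation is left as a provider-specific step
```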

Results

BUGFixChecker achieved a high overall accuracy of 83.0 % and a macro F1-score of 0.805 in correctly verifying the status of bug fixes. It proved particularly effective at identifying discrepancies with strong evidentiary signals, such as "Unresolved Visual Mismatch" (F1-score = 0.865). Most significantly, a rigorous ablation study demonstrated the critical contribution of the visual modality: the full multimodal framework outperformed a text-only baseline by over 19 percentage points in F1-score (0.805 vs. 0.610), proving that visual evidence is indispensable for this task.

Conclusion

BUGFixChecker offers a novel and pragmatic approach to automated bug fix verification. By moving beyond pre-fix analysis to the critical post-fix verification stage, our multimodal framework provides a scalable solution to enhance the integrity of bug tracking systems, reduce developer workload, and ensure higher software quality in rapidly evolving mobile ecosystems.