
Journal of Systems and Software: Latest Articles

A comparative analysis of industrial involvement and licensing in the open source software ecosystems of four IoT standards
IF 4.1 | Zone 2, Computer Science | Q1 Computer Science, Software Engineering | Pub Date: 2025-11-22 | DOI: 10.1016/j.jss.2025.112708
Gregorio Robles, Jonas Gamalielsson, Björn Lundell, Christoffer Brax, Tomas Persson, Anders Mattsson, Tomas Gustavsson, Jonas Feist, Jonas Öberg
Context: IoT standards are vital for interoperability and longevity, with Open Source Software (OSS) implementations preventing vendor lock-in. These implementations form vast software ecosystems on platforms like GitHub, where industrial participation is crucial. Goal: This study characterizes industrial involvement (participation, leadership, collaboration) across the software ecosystems of four IoT standards (LwM2M, NB-IoT, CoAP, Zigbee) from different standards-setting organizations. It also investigates how software licensing, particularly OSS licenses, reflects and shapes this involvement. Method: We analyzed software projects related to these standards that are publicly available on the GitHub platform, examining authorship of commits, bug reports, pull requests, and metadata like licenses. We identified organizational affiliations (corporate or academic) of contributors to assess their presence and leadership. We performed a licensing analysis to understand the legal frameworks governing these projects. Results: Our research shows significant diversity in ecosystem scale and activity, with a consistent pattern of major corporate and organizational leadership in highly active projects. Despite robust institutional involvement, a pervasive issue is the widespread absence of explicit software licenses, even in collaborative and active repositories. When licenses are present, permissive OSS licenses (e.g., Apache-2.0, MIT) dominate. This indicates a complex and often ambiguous legal landscape. Conclusion: IoT standard ecosystem growth is driven by established organizations. Addressing the prevalent lack of licensing is crucial for fostering clearer collaboration, mitigating legal risks, and ensuring long-term sustainability and adoption of these foundational technologies.
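The licensing analysis described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the repository records, the `classify_license` helper, and the grouping of SPDX identifiers into permissive and copyleft sets are all assumptions made for illustration.

```python
from collections import Counter

# Illustrative grouping of SPDX identifiers (an assumption, not the paper's mapping).
PERMISSIVE = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause"}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "LGPL-3.0-only", "AGPL-3.0-only", "MPL-2.0"}


def classify_license(spdx_id):
    """Map an SPDX identifier (or None for a missing license) to a coarse category."""
    if spdx_id is None:
        return "no license"
    if spdx_id in PERMISSIVE:
        return "permissive"
    if spdx_id in COPYLEFT:
        return "copyleft"
    return "other"


def license_profile(repos):
    """Count repositories per license category."""
    return Counter(classify_license(repo.get("license")) for repo in repos)


# Hypothetical repository metadata, standing in for data collected from GitHub.
repos = [
    {"name": "lwm2m-client", "license": "Apache-2.0"},
    {"name": "coap-lib", "license": "MIT"},
    {"name": "zigbee-demo", "license": None},
    {"name": "nbiot-tools", "license": None},
    {"name": "coap-server", "license": "GPL-3.0-only"},
]
profile = license_profile(repos)
print(dict(profile))
```

Treating a missing license as its own category, rather than folding it into "other", keeps the unlicensed share of repositories visible, which is the issue the abstract highlights.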
Journal of Systems and Software, Volume 234, Article 112708
Citations: 0
Machine learning for software aging detection: A systematic mapping study
IF 4.1 | Zone 2, Computer Science | Q1 Computer Science, Software Engineering | Pub Date: 2025-11-22 | DOI: 10.1016/j.jss.2025.112715
Rafael José Moura, Maria Gizele Nascimento, Fumio Machida, Domenico Cotroneo, Ermeson Andrade
Software aging is characterized by the gradual degradation of system reliability and performance due to software issues such as memory leaks, resource exhaustion, or the accumulation of numerical errors over time. This phenomenon can lead to critical failures in production environments, making efficient aging detection essential to ensure system reliability. Several existing studies have explored methods for identifying software aging, with a particular focus on the use of Machine Learning (ML) algorithms. As a variety of ML algorithms has been used recently, it is necessary to understand the state of the art and the main trends in this domain. This study aims to classify software aging detection approaches and techniques that use ML through a Systematic Mapping Study (SMS). As key outcomes, we identify the most commonly used algorithms, the most popular aging indicators, the open datasets available for software aging detection research, the challenges faced by the field, and new directions for future investigations. We expect this work to contribute meaningfully to the software aging field by providing new research perspectives, practical insights, and guidance applicable to real-world scenarios, supporting both researchers and software practitioners.
Journal of Systems and Software, Volume 234, Article 112715
Citations: 0
Automotive software product lines for ECU software configuration: A systematic literature review
IF 4.1 | Zone 2, Computer Science | Q1 Computer Science, Software Engineering | Pub Date: 2025-11-22 | DOI: 10.1016/j.jss.2025.112716
Yannick Lindebauer, Richard von Esebeck, Thomas Vietor
In the automotive industry, vehicles are increasingly configured by software to accommodate individual customer preferences. This leads to a growing number of software variants and calibration parameters that must be managed consistently. Efficient handling of this variability benefits from the application of (systems and) software product line engineering (SPLE), as it offers structured mechanisms for reuse, traceability, and systematic variability management. These benefits are particularly relevant for electronic control unit (ECU) configuration, where maintaining consistency across features, parameters, and vehicle variants is crucial. However, existing approaches in industry, e.g. those relying on configurable bills of materials, indicate that SPLE has yet to see widespread adoption in the configuration of ECUs. Instead, proprietary approaches dominate, often lacking integration, transparency, and scalability. This study investigates the reasons why SPLE is not widely adopted for this use case. The underlying hypothesis suggests that industrial requirements are not accurately captured in academic research, making the implementation of scientific SPLE concepts difficult. To examine this assumption, a systematic literature review was conducted, analyzing relevant publications. The findings indicate a pressing need for closer collaboration between industry and academia to better identify challenges and requirements. Furthermore, current and emerging developments, such as software-defined vehicles (SDVs), require greater consideration in ECU configuration research. Our hypothesis was largely confirmed, indicating that SPLE research must be further extended and refined to meet practical ECU configuration needs. Accordingly, a concise, end-to-end methodology is needed to support SPLE-based calibration processes in SDV environments with increasingly decoupled hardware and software.
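The kind of consistency checking that systematic variability management provides can be sketched as follows. This is a minimal illustration, not a method from the reviewed literature: the feature names and the `check_configuration` helper are hypothetical, and only simple requires/excludes constraints are modelled.

```python
def check_configuration(selected, requires, excludes):
    """Return the list of constraints violated by a set of selected features."""
    violations = []
    for feature, needed in requires.items():
        if feature in selected and needed not in selected:
            violations.append(f"{feature} requires {needed}")
    for first, second in excludes:
        if first in selected and second in selected:
            violations.append(f"{first} excludes {second}")
    return violations


# Hypothetical ECU feature constraints.
requires = {"adaptive_cruise": "radar_sensor"}
excludes = [("manual_gearbox", "start_stop_auto")]

variant = {"adaptive_cruise", "manual_gearbox"}
problems = check_configuration(variant, requires, excludes)
print(problems)
```

A real product line would attach calibration parameters to features and delegate constraint solving to a dedicated tool; the sketch only shows why explicit constraints make inconsistent variants detectable.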
Journal of Systems and Software, Volume 234, Article 112716
Citations: 0
GraMuS: Boosting statement-level fault localization via graph representation and multimodal information
IF 4.1 | Zone 2, Computer Science | Q1 Computer Science, Software Engineering | Pub Date: 2025-11-20 | DOI: 10.1016/j.jss.2025.112700
Ruishi Huang, Binbin Yang, Shumei Wu, Zheng Li, Doyle Paul, Xiao-Yi Zhang, Xiang Chen, Yong Liu
Fault Localization (FL) aims to reduce the cost of manual debugging by highlighting the statements most likely to be responsible for observed failures. However, existing techniques have limited effectiveness in practice due to inflexible suspiciousness evaluations and oversimplified representations of execution information. In this paper, we propose GraMuS, a novel Graph representation learning and Multimodal information based technique for Statement-level FL. GraMuS comprises two key components: a fine-grained fault diagnosis graph and a multi-level collaborative suspiciousness evaluation. The former integrates enriched multimodal information from various levels of granularity (methods, statements, and mutants) into a graph structure. The latter uses the interactions between FL tasks at these levels of granularity to extract existing and latent useful features from the multimodal information, improving FL precision. Empirical studies on the widely used Defects4J (v2.0.0) dataset show that GraMuS can outperform state-of-the-art baselines on both single-fault and multiple-fault programs; the baselines include one large language model, four learning-based FL techniques, three variable-based FL techniques, 36 spectrum-based FL techniques, and 36 mutation-based FL techniques. In particular, GraMuS can localize 26, 29, and 31 more faulty statements than the state-of-the-art baselines ChatGPT-4, DepGraph, and VarDT, respectively, in terms of the Top-1 metric. Further investigation shows that the method-level FL task can help GraMuS localize 27 more faulty statements, a 50.94% improvement. Finally, we evaluate GraMuS on 374 Python programs from ConDefects and find that it consistently outperforms state-of-the-art FL techniques, showing its generality.
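For readers unfamiliar with the spectrum-based baselines and the Top-1 metric mentioned above, the following sketch computes Ochiai suspiciousness scores, one well-known spectrum-based formula, and checks whether the faulty statement is ranked first. It is not GraMuS itself; the coverage data, the `ochiai` and `top_n_hit` helpers, and the alphabetical tie-breaking rule are assumptions for illustration.

```python
import math


def ochiai(coverage, outcomes):
    """Ochiai suspiciousness per statement.

    coverage: test name -> set of executed statements
    outcomes: test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        # ef / ep: failing / passing tests that execute this statement
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores


def top_n_hit(scores, faulty, n=1):
    """True if the faulty statement is among the n most suspicious ones."""
    # Ties are broken alphabetically here; real evaluations define their own rule.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return faulty in ranked[:n]


# Hypothetical execution spectrum: s3 is the seeded fault.
coverage = {
    "t1": {"s1", "s2"},
    "t2": {"s1", "s3"},
    "t3": {"s1", "s2", "s3"},
}
outcomes = {"t1": True, "t2": False, "t3": False}
scores = ochiai(coverage, outcomes)
print(sorted(scores.items()))
print(top_n_hit(scores, "s3"))
```

Here `s3` is executed by both failing tests and by no passing test, so it receives the highest score and counts as a Top-1 hit.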
Journal of Systems and Software, Volume 233, Article 112700
Citations: 0
Automated end-to-end testing for conversational agents
IF 4.1 | Zone 2, Computer Science | Q1 Computer Science, Software Engineering | Pub Date: 2025-11-19 | DOI: 10.1016/j.jss.2025.112685
Juan de Lara, Alejandro del Pozzo, Esther Guerra, Jesús Sánchez Cuadrado
The advances in generative artificial intelligence, especially Large Language Models (LLMs), have prompted the proliferation of conversational agents (or chatbots). These can be general-purpose – like ChatGPT – or tailored to specific tasks – like buying tickets or obtaining customer support. Although chatbots play a significant role in today’s software ecosystem, they are hard to test: defining meaningful, thorough tests is time-consuming, and setting an oracle flexible to conversational variations is challenging. This is aggravated when testing LLM-based chatbots, as their conversation is natural but unpredictable.
To alleviate this problem, we present an end-to-end testing approach for conversational agents, comprising two components. The first is a highly customisable user simulator that generates meaningful conversations with a chatbot under test for given goals (e.g., setting an appointment) and communication styles (e.g., long/short phrases, spelling mistakes). The second is a domain-specific language for specifying and checking correctness conditions (assertions and metamorphic relations) on the generated conversations. The conditions can assess functional correctness (e.g., booking more tickets costs more) and interaction styles (e.g., the chatbot responds in English and does not deviate from certain topics). This paper describes the approach, an implementation that enables testing chatbots independently of their technology, and an evaluation of its effectiveness in finding defects. We tested our tool on chatbots with artificially injected errors, and on third-party, real-world chatbots. Our tool detected between 81.25% and 100% of the injected errors, and identified actual functional issues in the real-world chatbots by applying manually defined correctness rules.
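The metamorphic relation given as an example above ("booking more tickets costs more") can be sketched in plain Python. This does not use the authors' domain-specific language or user simulator: `toy_ticket_bot`, `extract_price`, and `more_tickets_cost_more` are hypothetical names, and the chatbot is a deterministic stub rather than an LLM-based agent.

```python
import re


def toy_ticket_bot(message):
    """Hypothetical stand-in for a chatbot under test: charges 12 EUR per ticket."""
    match = re.search(r"(\d+)\s+tickets?", message)
    count = int(match.group(1)) if match else 1
    return f"The total for {count} ticket(s) is {count * 12} EUR."


def extract_price(reply):
    """Pull the quoted total out of a reply; returns None if no price is found."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*EUR", reply)
    return float(match.group(1)) if match else None


def more_tickets_cost_more(bot, small, large):
    """Metamorphic relation: the larger order must be strictly more expensive."""
    price_small = extract_price(bot(f"I want to book {small} tickets"))
    price_large = extract_price(bot(f"I want to book {large} tickets"))
    if price_small is None or price_large is None:
        return False
    return price_large > price_small


print(more_tickets_cost_more(toy_ticket_bot, 2, 5))

# A faulty bot that ignores the quantity violates the relation.
flat_price_bot = lambda message: "The total is 12 EUR."
print(more_tickets_cost_more(flat_price_bot, 2, 5))
```

The relation compares two conversations instead of checking an exact expected reply, which is what makes it usable as an oracle when the wording of responses varies.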
Journal of Systems and Software, Volume 233, Article 112685
Citations: 0
Evaluating correctness, performance and energy footprint of semantic reasoners in mobile edge computing
IF 4.1 | Zone 2, Computer Science | Q1 Computer Science, Software Engineering | Pub Date: 2025-11-16 | DOI: 10.1016/j.jss.2025.112696
Ivano Bilenchi, Davide Loconte, Floriano Scioscia, Michele Ruta
The integration of Semantic Web technologies into Mobile Edge Computing (MEC) platforms is enhancing the capabilities of real-time, context-aware applications across diverse domains. MEC brings processing closer to the network edge, reducing latency and allowing for the improvement of data privacy, while Semantic Web technologies provide machine-interpretable knowledge representation and reasoning capabilities. Despite their potential, deploying semantic reasoners on edge devices is challenging due to their resource-intensive nature, which requires significant memory availability, computational power, and energy. Furthermore, correctness, performance and energy consumption are simultaneously important, as MEC semantics-based applications often call for real-time queries for autonomous agent decision or user-oriented decision support. This paper presents an extensive experimental evaluation of Web Ontology Language (OWL) reasoners deployed in MEC environments, assessing correctness, processing time, memory usage, and energy consumption across both a reference tablet and a single-board computer. For energy measurement, both software profiling and hardware monitoring have been exploited and compared. The study is supported by a modular, cross-platform benchmarking framework that automates data collection and ensures reproducibility. The findings highlight the trade-offs between reasoning capabilities and resource consumption, offering valuable insights for refining testing methodologies as well as optimizing semantic reasoners in MEC settings.
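A benchmarking harness of the kind described can be sketched with the Python standard library. This is not the authors' framework: `benchmark` and `toy_subsumption_closure` are hypothetical, the reasoning task is a toy transitive closure rather than an OWL reasoner, and energy consumption is not measured because it requires platform-specific profilers or external hardware.

```python
import time
import tracemalloc


def benchmark(task, *args, repeats=3):
    """Run task several times; report best wall-clock time, peak traced memory, and the result."""
    best_time = float("inf")
    peak_bytes = 0
    result = None
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        result = task(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best_time = min(best_time, elapsed)
        peak_bytes = max(peak_bytes, peak)
    return {"seconds": best_time, "peak_bytes": peak_bytes, "result": result}


def toy_subsumption_closure(axioms):
    """Hypothetical reasoning task: transitive closure of (subclass, superclass) pairs."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


axioms = [("Cat", "Mammal"), ("Mammal", "Animal")]
report = benchmark(toy_subsumption_closure, axioms)
# Correctness is checked against an expected inference, alongside time and memory.
print(("Cat", "Animal") in report["result"], report["peak_bytes"] > 0)
```

Note that `tracemalloc` only tracks allocations made through Python's allocator and slows execution, so a real harness would time and profile memory in separate runs.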
Journal of Systems and Software, Volume 233, Article 112696
Citations: 0
A knowledge graph enabled recommendation system for implicitly associated items: Application to vertical e-commerce of parts
IF 4.1 | Zone 2, Computer Science | Q1 Computer Science, Software Engineering | Pub Date: 2025-11-15 | DOI: 10.1016/j.jss.2025.112698
Xinjun Lai, Guitao Huang, Yirun Chen, Dejun Wang, Martin Lai, Ming Cai
Although recommendation algorithms have proven efficient for e-commerce platforms over the last twenty years, adopting them on vertical platforms is not trivial. Compared with general platforms, vertical ones mostly share the following characteristics: (1) data volume is small but data types are varied, owing to the online ecosystem and community; (2) items may be implicitly associated, as parts are; (3) they are operated by small and medium-sized enterprises (SMEs), where IT and AI resources are limited. Targeting these features, we propose a knowledge graph (KG)-based recommender for the Lego parts company we studied. First, a KG is developed to mine the various implicit associations between users and items; information on users' and designers' online works, posts, interactions, buying behaviours, etc., together with part-set relations, is modelled in the heterogeneous KG. Second, a modified RippleNet algorithm is proposed, in which users' interests are modelled as ripples in the KG. In addition, information on important neighbouring nodes is embedded to model the semi-social influence on a user in the KG. Third, the best timing for updating the algorithm is studied by monitoring and predicting the topology of the KG, to achieve the best cost-performance in algorithm operation and maintenance. The recommender system is implemented in the studied company, where offline and online evaluations suggest that our method is practical, efficient and SME-friendly.
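The idea of modelling user interests as ripples in a KG can be illustrated by the ripple-set construction from the original RippleNet: starting from the items a user has interacted with, each hop collects the triples whose head entity was reached in the previous hop. The sketch below covers only that step, not the authors' modified algorithm or any embedding or training; the entity names and the `ripple_sets` helper are hypothetical.

```python
def ripple_sets(triples, seed_items, hops=2):
    """Return one list of (head, relation, tail) triples per hop, starting from the seed items."""
    frontier = set(seed_items)
    result = []
    for _ in range(hops):
        # Triples whose head was reached in the previous hop form this hop's ripple set.
        hop_triples = [(h, r, t) for (h, r, t) in triples if h in frontier]
        result.append(hop_triples)
        frontier = {t for (_, _, t) in hop_triples}
    return result


# Hypothetical heterogeneous KG linking parts, sets, designers, and posted works.
triples = [
    ("brick_2x4", "part_of", "castle_set"),
    ("castle_set", "designed_by", "designer_a"),
    ("designer_a", "posted", "tower_model"),
    ("wheel", "part_of", "car_set"),
]
sets_per_hop = ripple_sets(triples, seed_items={"brick_2x4"}, hops=2)
print(sets_per_hop)
```

Entities reached in later hops, such as the designer of a set containing a purchased part, are how implicit associations between parts can surface as recommendation candidates.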
Citations: 0
User stories as boundary objects in agile requirements engineering: A theoretical literature review
IF 4.1 Zone 2, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-11-13 DOI: 10.1016/j.jss.2025.112693
Tor Sporsem, Torgeir Dingsøyr, Klaas-Jan Stol
User stories have become the predominant method for managing requirements in software development, used by approximately half of all software developers. Despite this widespread adoption, there is limited theoretical understanding of how user stories are used in practice. Through a theoretical literature review of 14 industry studies, we develop five theoretical propositions: 1) user stories facilitate shared understanding between developers and users; 2) small user stories help developers cope with change; 3) clarifying the ‘why’ in user stories reinforces focus on user needs but adds complexity to the development process; 4) conversations triggered by user stories can hamper the sense of productivity; and 5) user stories as recorded in writing degrade over time. Using boundary object theory as an analytical lens, we explain how user stories facilitate knowledge transfer across syntactic, semantic, and pragmatic boundaries between developers and users. This lens offers new insights into why some user stories succeed in bridging boundaries between users and developers while others fail. The review highlights the sharp contrast between the widespread use of user stories among practitioners and the limited academic research on their practical application. We end by identifying opportunities for future research, particularly on how user stories can be used in the era of generative AI.
Citations: 0
MENTOR: Fixing introductory programming assignments with formula-based fault localization and LLM-driven program repair
IF 4.1 Zone 2, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-11-10 DOI: 10.1016/j.jss.2025.112690
Pedro Orvalho, Mikoláš Janota, Vasco Manquinho
The increasing demand for programming education has led to online evaluations like MOOCs, which rely on introductory programming assignments (IPAs). A major challenge in these courses is providing personalized feedback at scale. This paper introduces MENTOR, a semantic automated program repair (APR) framework designed to fix faulty student programs. MENTOR validates repairs by executing them on a test suite, and returns either the repaired program or the highlighted faulty statements.
Unlike symbolic repair tools such as Clara and Verifix, which require correct implementations with identical control flow graphs (CFGs), MENTOR’s LLM-based approach enables flexible repairs without strict structural alignment. MENTOR clusters successful submissions regardless of their CFGs, and employs a Graph Neural Network (GNN)-based variable alignment module for enhanced accuracy. Next, MENTOR’s fault localization module leverages MaxSAT techniques to pinpoint buggy code segments precisely. Finally, MENTOR’s program fixer integrates Formal Methods (FM) and Large Language Models (LLMs) through a Counterexample Guided Inductive Synthesis (CEGIS) loop, iteratively refining repairs. Experimental results show that MENTOR significantly improves repair success rates, achieving 64.4%, far surpassing Verifix (6.3%) and Clara (34.6%). By merging formula-based fault localization and LLM-driven repair, MENTOR provides an innovative, scalable framework for programming education.
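The CEGIS loop mentioned above can be sketched generically in Python. This is an illustration of the technique only, not MENTOR's implementation: a fixed list of candidate functions stands in for the LLM's repair proposals, exhaustive checking over a small finite domain stands in for the verifier, and the names `cegis`, `candidates`, `spec`, and `domain` are hypothetical.

```python
def cegis(candidates, spec, domain, max_iters=100):
    """Counterexample-guided inductive synthesis over a finite candidate list.

    candidates -- list of candidate functions (stand-in for proposed repairs)
    spec       -- reference function giving the expected output for an input
    domain     -- finite iterable of inputs the verifier checks exhaustively
    Returns a candidate that agrees with spec on the whole domain, or None.
    """
    counterexamples = []
    for _ in range(max_iters):
        # Synthesis step: pick the first candidate consistent with
        # every counterexample collected so far.
        candidate = next(
            (c for c in candidates
             if all(c(x) == spec(x) for x in counterexamples)),
            None,
        )
        if candidate is None:
            return None  # no remaining candidate fits the counterexamples
        # Verification step: search the domain for an input that breaks it.
        failing = next((x for x in domain if candidate(x) != spec(x)), None)
        if failing is None:
            return candidate  # verified on the whole domain
        counterexamples.append(failing)
    return None


# Toy usage: "repair" an absolute-value function.
proposals = [
    lambda x: x,                    # wrong for negative inputs
    lambda x: -x,                   # wrong for positive inputs
    lambda x: x if x >= 0 else -x,  # correct
]
repaired = cegis(proposals, spec=abs, domain=range(-3, 4))
```

Each failed verification adds an input that rules out the current candidate, so the loop never proposes the same wrong candidate twice.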
Citations: 0
SARG: Software application resiliency prediction using graph neural networks
IF 4.1 Zone 2, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-11-10 DOI: 10.1016/j.jss.2025.112697
Mohammad Reza Pirayesh, Mohsen Raji, Mohammad Reza Moosavi
Accurate analysis of software applications' resiliency to soft errors is crucial for ensuring reliable operation in computing systems, particularly in safety-critical applications where failures can have significant consequences. Machine Learning (ML) methods have been investigated as effective tools for evaluating software application resiliency. The core idea is to leverage patterns in historical data, system behavior, and dynamic runtime features to build predictive models. However, current approaches often struggle to capture the intricate structure of source code, leading to models that lack the accuracy needed for resiliency prediction in software applications. This paper introduces a novel ML-based approach, named SARG, that leverages graph neural networks to predict software resiliency in the presence of soft errors. By incorporating control flow and data flow graphs, dynamic runtime information, and code sequence signatures, SARG captures the syntactic and semantic features of software applications that are vital for accurate resiliency prediction. Experimental results demonstrate that the proposed approach outperforms existing ML-based approaches, achieving 43% higher prediction accuracy while being 3x faster. The results show the potential of SARG as an efficient, generalizable, and accurate solution for evaluating software resiliency in the presence of soft errors.
Citations: 0