Pub Date: 2025-11-25 | DOI: 10.1016/j.jss.2025.112713
Chunyong Zhang , Liangwei Yao
Software vulnerability detection is an essential part of cybersecurity, and its basis is source code analysis. Under the assumption that sufficient labeled data is available for training, machine learning and deep learning can find vulnerabilities automatically and effectively. However, due to the lack of high-quality labeled data, the vulnerability detection performance of some deep learning-based methods is poor. We therefore propose a way to learn feature representations from multi-domain heterogeneous vulnerability data, which forms a basis for transferring knowledge and improves vulnerability detection performance. First, we use the real-world data Ffmqem and the synthetic data SARD as data sources to learn feature extractors for representing the target function. Next, we obtain the combined vector representation of the target code function for training the vulnerability classifier. Finally, we conducted extensive experiments on two open-source projects. The results show that OLL achieves better vulnerability detection performance in scenarios with little labeled data, improving the F1-Score by 4.1% and 3.1% compared with SDV. This demonstrates that learning combined vector representations from two heterogeneous data sources compensates for the small amount of labeled data and facilitates learning representations of vulnerable functions.
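The pipeline the abstract describes (two domain-specific feature extractors whose outputs are combined into one vector for a downstream classifier) can be sketched as follows. The weights, dimensions, and function names below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two feature extractors learned from real-world (Ffmqem)
# and synthetic (SARD) data; random weights here, purely for illustration.
W_real = rng.standard_normal((16, 8))
W_syn = rng.standard_normal((16, 8))

def combined_representation(code_vec):
    # Embed the target code function with each domain extractor, then
    # concatenate the two embeddings into one joint vector that the
    # vulnerability classifier consumes.
    real_part = np.tanh(code_vec @ W_real)
    syn_part = np.tanh(code_vec @ W_syn)
    return np.concatenate([real_part, syn_part])

code_vec = rng.standard_normal(16)  # toy encoding of a target function
rep = combined_representation(code_vec)
print(rep.shape)  # (16,)
```

Any standard classifier (e.g., logistic regression) could then be trained on `rep` vectors with the available labels.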
Title: "Only less labeled: How to learn representations from multi-domain" (Journal of Systems and Software, Volume 233, Article 112713).
Pub Date: 2025-11-22 | DOI: 10.1016/j.jss.2025.112708
Gregorio Robles , Jonas Gamalielsson , Björn Lundell , Christoffer Brax , Tomas Persson , Anders Mattsson , Tomas Gustavsson , Jonas Feist , Jonas Öberg
Context: IoT standards are vital for interoperability and longevity, with Open Source Software (OSS) implementations preventing vendor lock-in. These implementations form vast software ecosystems on platforms like GitHub, where industrial participation is crucial. Goal: This study characterizes industrial involvement (participation, leadership, collaboration) across the software ecosystems of four IoT standards (LwM2M, NB-IoT, CoAP, Zigbee) from different standards-setting organizations. It also investigates how software licensing, particularly OSS licenses, reflects and shapes this involvement. Method: We analyzed software projects related to these standards that are publicly available on the GitHub platform, examining authorship of commits, bug reports, pull requests, and metadata like licenses. We identified organizational affiliations (corporate or academic) of contributors to assess their presence and leadership. We performed a licensing analysis to understand the legal frameworks governing these projects. Results: Our research shows significant diversity in ecosystem scale and activity, with a consistent pattern of major corporate and organizational leadership in highly active projects. Despite robust institutional involvement, a pervasive issue is the widespread absence of explicit software licenses, even in collaborative and active repositories. When licenses are present, permissive OSS licenses (e.g., Apache-2.0, MIT) dominate. This indicates a complex and often ambiguous legal landscape. Conclusion: IoT standard ecosystem growth is driven by established organizations. Addressing the prevalent lack of licensing is crucial for fostering clearer collaboration, mitigating legal risks, and ensuring long-term sustainability and adoption of these foundational technologies.
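The licensing part of the method can be illustrated with a toy tally of explicit licenses over repository metadata. The repository records below are invented examples, not data from the study, which gathered such metadata from the GitHub platform:

```python
from collections import Counter

# Invented repository metadata; in the study this comes from GitHub.
repos = [
    {"name": "coap-impl-a", "license": "Apache-2.0"},
    {"name": "lwm2m-client", "license": "MIT"},
    {"name": "zigbee-stack", "license": None},   # no explicit license
    {"name": "nbiot-tools", "license": None},    # no explicit license
]

# Count each explicit license, folding missing ones into a single bucket.
by_license = Counter(r["license"] or "NO LICENSE" for r in repos)
missing_share = by_license["NO LICENSE"] / len(repos)

print(by_license.most_common())
print(f"{missing_share:.0%} of repos lack an explicit license")  # 50%
```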
Title: "A comparative analysis of industrial involvement and licensing in the open source software ecosystems of four IoT standards" (Journal of Systems and Software, Volume 234, Article 112708).
Pub Date: 2025-11-22 | DOI: 10.1016/j.jss.2025.112715
Rafael José Moura , Maria Gizele Nascimento , Fumio Machida , Domenico Cotroneo , Ermeson Andrade
Software aging is characterized by the gradual degradation of system reliability and performance due to software issues such as memory leaks, resource exhaustion, or the accumulation of numerical errors over time. This phenomenon can lead to critical failures in production environments, making efficient aging detection essential to ensure system reliability. Several existing studies have explored methods for identifying software aging, with a particular focus on the use of Machine Learning (ML) algorithms. As a variety of ML algorithms has been used recently, it is necessary to understand the state of the art and the main trends in this domain. This study aims to classify software aging detection approaches and techniques that use ML through a Systematic Mapping Study (SMS). As key outcomes, we identify the most commonly used algorithms, the most popular aging indicators, the open datasets available for software aging detection research, the challenges faced by the field, and new directions for future investigations. We expect this work to contribute meaningfully to the software aging field by providing new research perspectives, practical insights, and guidance applicable to real-world scenarios, supporting both researchers and software practitioners.
Title: "Machine learning for software aging detection: A systematic mapping study" (Journal of Systems and Software, Volume 234, Article 112715).
Pub Date: 2025-11-22 | DOI: 10.1016/j.jss.2025.112716
Yannick Lindebauer , Richard von Esebeck , Thomas Vietor
In the automotive industry, vehicles are increasingly configured by software to accommodate individual customer preferences. This leads to a growing number of software variants and calibration parameters that must be managed consistently. Efficient handling of this variability benefits from the application of (systems and) software product line engineering (SPLE), as it offers structured mechanisms for reuse, traceability, and systematic variability management. These benefits are particularly relevant for electronic control unit (ECU) configuration, where maintaining consistency across features, parameters, and vehicle variants is crucial. However, existing approaches in industry, e.g. those relying on configurable bills of materials, indicate that SPLE has yet to see widespread adoption in the configuration of ECUs. Instead, proprietary approaches dominate, often lacking integration, transparency, and scalability. This study investigates the reasons why SPLE is not widely adopted for this use case. The underlying hypothesis suggests that industrial requirements are not accurately captured in academic research, making the implementation of scientific SPLE concepts difficult. To examine this assumption, a systematic literature review was conducted, analyzing relevant publications. The findings indicate a pressing need for closer collaboration between industry and academia to better identify challenges and requirements. Furthermore, current and emerging developments, such as software-defined vehicles (SDVs), require greater consideration in ECU configuration research. Our hypothesis was largely confirmed, indicating that SPLE research must be further extended and refined to meet practical ECU configuration needs. Accordingly, a concise, end-to-end methodology is needed to support SPLE-based calibration processes in SDV environments with increasingly decoupled hardware and software.
Title: "Automotive software product lines for ECU software configuration: A systematic literature review" (Journal of Systems and Software, Volume 234, Article 112716).
Pub Date: 2025-11-20 | DOI: 10.1016/j.jss.2025.112700
Ruishi Huang , Binbin Yang , Shumei Wu , Zheng Li , Doyle Paul , Xiao-Yi Zhang , Xiang Chen , Yong Liu
Fault Localization (FL) aims to reduce the cost of manual debugging by highlighting the statements that are most likely responsible for observed failures. However, existing techniques have limited effectiveness in practice due to inflexible suspiciousness evaluations and oversimplified representations of execution information. In this paper, we propose GraMuS, a novel Graph representation learning and Multimodal information based technique for Statement-level FL. GraMuS comprises two key components: a fine-grained fault diagnosis graph and a multi-level collaborative suspiciousness evaluation. The former integrally records enriched multimodal information from various levels of granularity (including methods, statements, and mutants) in a graph structure. The latter utilizes the interactions between FL tasks at various levels of granularity to extract existing and latent useful features from the multimodal information, improving FL precision. Empirical studies on the widely used Defects4J (v2.0.0) dataset show that GraMuS outperforms state-of-the-art baselines on both single-fault and multiple-fault programs, including one large language model, four learning-based FL techniques, three variable-based FL techniques, 36 spectrum-based FL techniques, and 36 mutation-based FL techniques. In particular, GraMuS localizes 26/29/31 more faulty statements than the state-of-the-art baselines ChatGPT-4/DepGraph/VarDT, respectively, in terms of the TOP-1 metric. Further investigation shows that the method-level FL task helps GraMuS localize 27 more faulty statements, a 50.94% improvement. Finally, we further evaluate GraMuS on 374 Python programs from ConDefects and find that GraMuS consistently outperforms state-of-the-art FL techniques, showing its generality.
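GraMuS itself is learning-based, but since it is compared against 36 spectrum-based FL techniques, a classic spectrum-based suspiciousness score such as Ochiai makes the statement-level setting concrete. This is background only, not part of GraMuS:

```python
import math

def ochiai(failed_cover, passed_cover, total_failed):
    # Ochiai suspiciousness for one statement:
    #   failed_cover: failing tests that execute the statement
    #   passed_cover: passing tests that execute the statement
    #   total_failed: failing tests in the whole suite
    if total_failed == 0 or (failed_cover + passed_cover) == 0:
        return 0.0
    return failed_cover / math.sqrt(total_failed * (failed_cover + passed_cover))

# A statement covered by 3 of 4 failing tests and 1 passing test
# scores 3 / sqrt(4 * 4) = 0.75.
print(ochiai(3, 1, 4))  # 0.75
```

Statements are then ranked by score, and a metric such as TOP-1 counts the faults whose buggy statement appears at rank one.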
Title: "GraMuS: Boosting statement-level fault localization via graph representation and multimodal information" (Journal of Systems and Software, Volume 233, Article 112700).
Pub Date: 2025-11-19 | DOI: 10.1016/j.jss.2025.112685
Juan de Lara , Alejandro del Pozzo , Esther Guerra , Jesús Sánchez Cuadrado
The advances in generative artificial intelligence, especially Large Language Models (LLMs), have prompted the proliferation of conversational agents (or chatbots). These can be general-purpose – like ChatGPT – or tailored to specific tasks – like buying tickets or obtaining customer support. Although chatbots play a significant role in today’s software ecosystem, they are hard to test: defining meaningful, thorough tests is time-consuming, and setting an oracle flexible to conversational variations is challenging. This is aggravated when testing LLM-based chatbots, as their conversation is natural but unpredictable.
To alleviate this problem, we present an end-to-end testing approach for conversational agents, comprising two components. First, a highly customisable user simulator that generates meaningful conversations with a chatbot under test, for the given goals (e.g., setting an appointment) and communication styles (e.g., long/short phrases, spelling mistakes). Second, a domain-specific language to specify and check correctness conditions (assertions and metamorphic relations) on the generated conversations. The conditions can assess functional correctness (e.g., booking more tickets costs more) and interaction styles (e.g., the chatbot responds in English and does not deviate from certain topics). This paper describes the approach, an implementation enabling chatbots’ testing independently of their technology, and an evaluation of its effectiveness in finding defects. We tested our tool on chatbots with artificially injected errors, and on third-party, real-world chatbots. Our tool detected between 81.25 % and 100 % of the injected errors, and identified actual functional issues in the real-world chatbots by applying manually defined correctness rules.
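A metamorphic relation like the quoted example (booking more tickets costs more) can be checked roughly as follows. The chatbot stub and function names are hypothetical, and the paper's domain-specific language expresses such conditions declaratively rather than in Python:

```python
# Hypothetical stub; a real harness would drive the chatbot under test
# through a conversation and parse the quoted price from its reply.
def chatbot_quote_price(n_tickets: int) -> float:
    return 12.5 * n_tickets

def check_monotonic_price(quote, ticket_counts):
    # Metamorphic relation: asking for more tickets must never cost less.
    prices = [quote(n) for n in ticket_counts]
    return all(a <= b for a, b in zip(prices, prices[1:]))

print(check_monotonic_price(chatbot_quote_price, [1, 2, 5]))  # True
```

The appeal of metamorphic relations here is that they sidestep the oracle problem: no exact expected reply is needed, only a relation between replies to related inputs.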
Title: "Automated end-to-end testing for conversational agents" (Journal of Systems and Software, Volume 233, Article 112685).
The integration of Semantic Web technologies into Mobile Edge Computing (MEC) platforms is enhancing the capabilities of real-time, context-aware applications across diverse domains. MEC brings processing closer to the network edge, reducing latency and improving data privacy, while Semantic Web technologies provide machine-interpretable knowledge representation and reasoning capabilities. Despite their potential, deploying semantic reasoners on edge devices is challenging due to their resource-intensive nature, which requires significant memory availability, computational power, and energy. Furthermore, correctness, performance, and energy consumption are simultaneously important, as MEC semantics-based applications often call for real-time queries for autonomous agent decisions or user-oriented decision support. This paper presents an extensive experimental evaluation of Web Ontology Language (OWL) reasoners deployed in MEC environments, assessing correctness, processing time, memory usage, and energy consumption on both a reference tablet and a single-board computer. For energy measurement, both software profiling and hardware monitoring have been exploited and compared. The study is supported by a modular, cross-platform benchmarking framework that automates data collection and ensures reproducibility. The findings highlight the trade-offs between reasoning capabilities and resource consumption, offering valuable insights for refining testing methodologies as well as optimizing semantic reasoners in MEC settings.
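The kind of per-task measurement such a benchmarking framework automates can be sketched for time and memory with the Python standard library. This is a generic sketch, not the paper's framework, which additionally captures energy via software profiling and hardware monitors:

```python
import time
import tracemalloc

def measure(task, *args):
    # Wall-clock time and peak Python-heap allocation for one task run.
    tracemalloc.start()
    t0 = time.perf_counter()
    result = task(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Stand-in workload; a reasoner benchmark would run a classification
# or query task here instead.
result, secs, peak_bytes = measure(sum, range(100_000))
print(result, secs >= 0, peak_bytes >= 0)
```

Repeating such runs across reasoners and devices, and pairing them with an energy channel, yields the correctness/time/memory/energy profiles the study compares.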
Pub Date: 2025-11-16 | DOI: 10.1016/j.jss.2025.112696
Ivano Bilenchi, Davide Loconte, Floriano Scioscia, Michele Ruta
Title: "Evaluating correctness, performance and energy footprint of semantic reasoners in mobile edge computing" (Journal of Systems and Software, Volume 233, Article 112696).
Pub Date: 2025-11-15 | DOI: 10.1016/j.jss.2025.112698
Xinjun Lai , Guitao Huang , Yirun Chen , Dejun Wang , Martin Lai , Ming Cai
Although recommendation algorithms have proven effective for e-commerce platforms over the last twenty years, adopting these methods is not a trivial task for vertical platforms. Compared to general platforms, vertical ones mostly share the following characteristics: (1) data volume is small but data types are varied, owing to the online ecosystem and community; (2) items might be implicitly associated, such as parts; (3) they are operated by SMEs (small and medium-sized enterprises), where IT and AI resources are limited. Targeting these features, we propose a knowledge-graph (KG)-based recommender for the Lego parts company we studied. First, a KG is developed to mine the various implicit associations of users and items, where information on users'/designers' online works, posts, interactions, and buying behaviours, as well as part-set relations, is modelled in the heterogeneous KG. Second, a modified RippleNet algorithm is proposed, in which users' interests are modelled as ripples in the KG. In addition, information on important neighbouring nodes is embedded to model the semi-social influence on a user in the KG. Third, the best timing to update the algorithm is studied by monitoring and predicting the topology of the KG, to achieve the best cost-performance trade-off in algorithm operation and maintenance. The recommender system has been implemented in the studied company, where offline and online evaluations suggest that our method is practical, efficient, and SME-friendly.
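The ripple idea borrowed from RippleNet (a user's interests propagate outward hop by hop along KG relations) can be sketched on a toy graph. The entities and relations below are invented for illustration and are not the company's actual KG:

```python
# Toy knowledge graph as head -> [(relation, tail)] triples.
kg = {
    "user_work_1": [("uses_part", "part_A"), ("uses_part", "part_B")],
    "part_A": [("belongs_to_set", "set_X")],
    "part_B": [("belongs_to_set", "set_X"), ("belongs_to_set", "set_Y")],
}

def ripple_sets(seeds, hops):
    # RippleNet-style propagation: each hop expands the user's interest
    # frontier one step further along KG relations, collecting the
    # triples crossed at that hop.
    frontier, ripples = set(seeds), []
    for _ in range(hops):
        triples = [(h, r, t) for h in frontier for r, t in kg.get(h, [])]
        ripples.append(triples)
        frontier = {t for _, _, t in triples}
    return ripples

ripples = ripple_sets({"user_work_1"}, hops=2)
print([len(r) for r in ripples])  # [2, 3]
```

In RippleNet proper, each hop's triples are attention-weighted by relevance to a candidate item and aggregated into the user embedding; this sketch shows only the propagation structure.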
{"title":"A knowledge graph enabled recommendation system for implicitly associated items: Application to vertical e-commerce of parts","authors":"Xinjun Lai , Guitao Huang , Yirun Chen , Dejun Wang , Martin Lai , Ming Cai","doi":"10.1016/j.jss.2025.112698","DOIUrl":"10.1016/j.jss.2025.112698","url":null,"abstract":"<div><div>Although recommendation algorithms have been proved efficient for the e-commerce platforms in the last twenty years, it is not a trivial task for the vertical platforms to adopt these methods. Compared to the general platforms, mostly, the vertical ones share the following characteristics: (1) data volume is small but data types are various, due to the online ecosystem and community; (2) items might be implicitly associated, such as parts; (3) operated by SME (small and medium-size enterprise), where the IT and AI resources are limited. Targeted at these features, we propose a knowledge-graph(KG)-based recommender for our studied Lego parts company. First, a KG is developed to mine the various implicit associations of users and items, where information on users’/designers’ online works, posts, interactions, buying behaviours etc., and part-set relations, are modelled in the heterogeneous KG. Second, a modified RippleNet algorithm is proposed, where users’ interests are modelled as ripples in the KG. In addition, information on important neighbouring nodes is embedded, to model the semi-social influence in the KG for a user. Third, the best timing to update the algorithm is studied by monitoring and predicting the topology of the KG, to achieve the best cost-performance in algorithm operation and maintenance. 
The recommender system is implemented in the studied company, where the offline and online evaluations suggest that our method is practical, efficient and SME-friendly.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"233 ","pages":"Article 112698"},"PeriodicalIF":4.1,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
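The RippleNet-style propagation described in the abstract can be illustrated with a minimal sketch. This is not the paper's modified algorithm; it is a toy version of the original RippleNet idea under assumed data: a user's seed items start "ripples" that expand hop by hop over the KG, and each hop's (head, relation, tail) triples are attention-weighted against a candidate item's embedding. The KG, embeddings, and `ripple_score` function are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Hypothetical toy KG: head entity -> list of (relation, tail) edges.
kg = {0: [(0, 1), (1, 2)], 1: [(0, 3)], 2: [(1, 4)], 3: [(0, 4)]}
entity_emb = rng.normal(size=(5, DIM))            # one vector per entity
relation_emb = rng.normal(size=(2, DIM, DIM))     # one matrix per relation

def ripple_score(seed_entities, candidate, hops=2):
    """Score a candidate item by propagating the user's seed interests."""
    v = entity_emb[candidate]
    user_repr = np.zeros(DIM)
    frontier = list(seed_entities)                # current ripple set
    for _ in range(hops):
        triples = [(h, r, t) for h in frontier for (r, t) in kg.get(h, [])]
        if not triples:
            break
        # Attention: similarity between the candidate and each
        # relation-transformed head entity.
        logits = np.array([v @ (relation_emb[r] @ entity_emb[h])
                           for h, r, _ in triples])
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        # Hop representation: attention-weighted sum of tail embeddings.
        user_repr += sum(w * entity_emb[t]
                         for w, (_, _, t) in zip(weights, triples))
        frontier = [t for _, _, t in triples]     # next ripple
    # Sigmoid of user-candidate similarity as a click probability.
    return float(1 / (1 + np.exp(-(user_repr @ v))))

print(ripple_score(seed_entities=[0], candidate=4))
```

The paper's contribution additionally embeds important neighbouring nodes for semi-social influence and schedules model updates from KG topology; neither is modelled here.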
Pub Date : 2025-11-13DOI: 10.1016/j.jss.2025.112693
Tor Sporsem , Torgeir Dingsøyr , Klaas-Jan Stol
User stories have become the predominant method for managing requirements in software development, used by approximately half of all software developers. Despite this widespread adoption, there is limited theoretical understanding of how user stories are used in practice. Through a theoretical literature review of 14 industry studies, we develop five theoretical propositions: 1) user stories facilitate shared understanding between developers and users; 2) small user stories help developers cope with change; 3) clarifying the ‘why’ in user stories reinforces focus on user needs but adds complexity to the development process; 4) conversations triggered by user stories can hamper the sense of productivity; and 5) user stories as recorded in writing degrade over time. Using boundary object theory as an analytical lens, we explain how user stories facilitate knowledge transfer across syntactic, semantic, and pragmatic boundaries between developers and users. This theoretical lens offers new insights into why some user stories succeed while others fail to bridge boundaries between users and developers. The review highlights the sharp contrast between the widespread use of user stories among practitioners and the limited academic research on their practical application. We conclude by identifying opportunities for future research, particularly on how user stories can be used in the era of generative AI.
{"title":"User stories as boundary objects in agile requirements engineering: A theoretical literature review","authors":"Tor Sporsem , Torgeir Dingsøyr , Klaas-Jan Stol","doi":"10.1016/j.jss.2025.112693","DOIUrl":"10.1016/j.jss.2025.112693","url":null,"abstract":"<div><div>User stories have become the predominant method for managing requirements in software development, used by approximately half of all software developers. Despite this widespread adoption, there is limited theoretical understanding of how user stories are used in practice. Through a theoretical literature review of 14 industry studies, we develop five theoretical propositions: 1) user stories facilitate shared understanding between developers and users; 2) small user stories help developers cope with change; 3) clarifying the ‘why’ in user stories reinforces focus on user needs but adds complexity to the development process; 4) conversations triggered by user stories can hamper the sense of productivity; and 5) user stories as recorded in writing degrade over time. Using boundary object theory as an analytical lens, we explain how user stories facilitate knowledge transfer across syntactic, semantic, and pragmatic boundaries between developers and users. This theoretical lens offers new insights into why some user stories succeed while others fail to bridge boundaries between users and developers. The review highlights the sharp contrast between the widespread use of user stories among practitioners and the limited academic research on their practical application. 
We end with identifying opportunities for future research, particularly on how user stories can be used in the era of generative AI.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"233 ","pages":"Article 112693"},"PeriodicalIF":4.1,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-10DOI: 10.1016/j.jss.2025.112690
Pedro Orvalho , Mikoláš Janota , Vasco Manquinho
The increasing demand for programming education has led to online course platforms such as MOOCs, which rely on introductory programming assignments (IPAs). A major challenge in these courses is providing personalized feedback at scale. This paper introduces MENTOR, a semantic automated program repair (APR) framework designed to fix faulty student programs. MENTOR validates repairs through execution on a test suite and returns either the repaired program or the highlighted faulty statements.
Unlike symbolic repair tools such as Clara and Verifix, which require correct implementations with identical control flow graphs (CFGs), MENTOR’s LLM-based approach enables flexible repairs without strict structural alignment. MENTOR clusters successful submissions regardless of their CFGs and employs a Graph Neural Network (GNN)-based variable alignment module for enhanced accuracy. Next, MENTOR’s fault localization module leverages MaxSAT techniques to pinpoint buggy code segments precisely. Finally, MENTOR’s program fixer integrates Formal Methods (FM) and Large Language Models (LLMs) through a Counterexample Guided Inductive Synthesis (CEGIS) loop, iteratively refining repairs. Experimental results show that MENTOR significantly improves repair success rates, achieving 64.4 %, far surpassing Verifix (6.3 %) and Clara (34.6 %). By merging formula-based fault localization and LLM-driven repair, MENTOR provides an innovative, scalable framework for programming education.
{"title":"MENTOR: Fixing introductory programming assignments with formula-based fault localization and LLM-driven program repair","authors":"Pedro Orvalho , Mikoláš Janota , Vasco Manquinho","doi":"10.1016/j.jss.2025.112690","DOIUrl":"10.1016/j.jss.2025.112690","url":null,"abstract":"<div><div>The increasing demand for programming education has led to online evaluations like MOOCs, which rely on introductory programming assignments (IPAs). A major challenge in these courses is providing personalized feedback at scale. This paper introduces <span>MENTOR</span>, a semantic automated program repair (APR) framework designed to fix faulty student programs. <span>MENTOR</span> validates repairs through execution on a test suite, and returns the repaired program or highlights faulty statements.</div><div>Unlike symbolic repair tools like <span>Clara</span> and <span>Verifix</span>, which require correct implementations with identical control flow graphs (CFGs), <span>MENTOR</span>’s <span>LLM</span>-based approach enables flexible repairs without strict structural alignment. <span>MENTOR</span> clusters successful submissions regardless of CFGs, and employs a Graph Neural Network (<span>GNN</span>)-based variable alignment module for enhanced accuracy. Next, <span>MENTOR</span>’s fault localization module leverages MaxSAT techniques to pinpoint buggy code segments precisely. Finally, <span>MENTOR</span>’s program fixer integrates Formal Methods (FM) and Large Language Models (<span>LLMs</span>) through a Counterexample Guided Inductive Synthesis (CEGIS) loop, iteratively refining repairs. Experimental results show that <span>MENTOR</span> significantly improves repair success rates, achieving 64.4 %, far surpassing Verifix (6.3 %) and Clara (34.6 %). 
By merging formula-based fault localization, and <span>LLM</span>-driven repair, <span>MENTOR</span> provides an innovative, scalable framework for programming education.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"234 ","pages":"Article 112690"},"PeriodicalIF":4.1,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
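The CEGIS loop the MENTOR abstract refers to can be sketched in miniature. This is a hypothetical simplification, not MENTOR's FM+LLM pipeline: a stand-in synthesizer (enumerating a tiny candidate space where MENTOR would query an LLM) proposes fixes for a buggy student function, and the test suite acts as verifier, feeding failing cases back as counterexamples until a candidate passes everything. All names (`buggy_abs`, `synthesize`, `cegis_repair`) are illustrative.

```python
def buggy_abs(x):
    # Student program: intended to compute |x|, but wrong on negatives.
    return x

# Verifier's test suite: (input, expected output) pairs.
test_suite = [(-3, 3), (0, 0), (5, 5)]

def verify(candidate):
    """Return a counterexample (input, expected) or None if all tests pass."""
    for inp, expected in test_suite:
        if candidate(inp) != expected:
            return (inp, expected)
    return None

def synthesize(counterexamples):
    """Stand-in for the LLM-based fixer: yield candidate repairs that are
    at least consistent with the counterexamples gathered so far."""
    candidates = [buggy_abs, lambda x: -x, lambda x: x if x >= 0 else -x]
    for c in candidates:
        if all(c(i) == e for i, e in counterexamples):
            yield c

def cegis_repair():
    counterexamples = []
    for candidate in synthesize(counterexamples):
        cex = verify(candidate)
        if cex is None:
            return candidate          # candidate passes the whole suite
        counterexamples.append(cex)   # refine the next proposal
    return None                       # search space exhausted

fixed = cegis_repair()
print(fixed(-7))  # → 7: the repaired program computes absolute value
```

MENTOR additionally narrows where to edit via MaxSAT-based fault localization before proposing repairs; this sketch omits that step and repairs the whole function.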