
Latest publications in the Journal of Systems and Software

Systematic literature review on software code smell detection approaches
IF 4.1, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-14. DOI: 10.1016/j.jss.2026.112784
Praveen Singh Thakur, Satyendra Singh Chouhan, Santosh Singh Rathore, Jitendra Parmar
Software code smells, subtle indicators of potential design flaws, play a critical role in maintaining software quality and preventing future maintenance issues. Numerous researchers have proposed various tools and employed different machine learning and deep learning techniques to detect software code smells. This survey systematically reviews the work conducted on detecting software code smells through tool-based, ML-based, and DL-based approaches published from 2014 to 2024. The imbalanced nature of datasets is another vital issue in this domain, where instances of software code smells are often significantly underrepresented, which poses a substantial challenge for traditional detection techniques. Therefore, this review also includes efforts to detect software code smells using different imbalance learning techniques. After initial scrutiny and selection, a total of 86 studies are analyzed and reported in this review work, providing a comprehensive overview of the field. This work comprehensively analyzes the intersection between software code smell detection and imbalance learning techniques, highlighting challenges posed by imbalanced datasets. Furthermore, we identify the best-performing ML techniques (e.g., Random Forest, SVM), the most commonly detected code smells (e.g., God Class, Data Class, Long Method, and Feature Envy), and popular experimental setup techniques (e.g., K-fold cross-validation) used in prior studies. Based on the analysis, several key challenges and research gaps are identified, offering directions for future research.
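The interplay between imbalance learning and smell detection that the survey highlights can be illustrated with a toy stand-in: random oversampling of the minority (smelly) class followed by a nearest-neighbour classifier. The metric ranges, labels, and the 1-NN model below are all hypothetical; real studies use richer metric suites (e.g., CK metrics) and the ML techniques the survey catalogs.

```python
import random

# Toy dataset: (lines_of_code, method_count, label), label 1 = "God Class" smell.
# Smelly instances are deliberately rare (10 of 100), mimicking the imbalance
# the survey discusses. All numbers are made up for illustration.
random.seed(0)
clean = [(random.randint(50, 300), random.randint(3, 15), 0) for _ in range(90)]
smelly = [(random.randint(800, 2000), random.randint(30, 80), 1) for _ in range(10)]

def oversample(data, label, factor):
    """Random oversampling: duplicate each minority instance (factor - 1) extra times."""
    minority = [d for d in data if d[2] == label]
    return data + minority * (factor - 1)

train = oversample(clean + smelly, label=1, factor=9)  # roughly rebalance classes

def predict(loc, methods, train):
    """1-nearest-neighbour over the two metrics (a stand-in for RF/SVM)."""
    nearest = min(train, key=lambda d: (d[0] - loc) ** 2 + (d[1] - methods) ** 2)
    return nearest[2]
```

With the rebalanced training set, `predict(1500, 50, train)` flags a very large class as smelly, while `predict(120, 6, train)` leaves a small class clean.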
Journal of Systems and Software, Volume 235, Article 112784.
Citations: 0
ParserHunter: Identify parsing functions in binary code
IF 4.1, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-10. DOI: 10.1016/j.jss.2026.112783
Marco Scapin, Fabio Pinelli, Letterio Galletta
Parsing and validation functions are crucial because they process untrusted data, e.g., user inputs. Due to their complexity, these functions are highly susceptible to bugs, making them a primary target for security audits. However, identifying such functions within a binary is time-intensive and challenging, given the numerous functions typically present and the lack of source code or supporting documentation. This paper presents an AI-based methodology for identifying functions with parser-like behavior and complex processing logic within a binary. Our methodology analyzes each binary by identifying its functions, extracting their Control Flow Graphs (CFGs), and enriching them with features derived from an embedding model that captures both structural and semantic aspects of their behavior. These annotated CFGs are the input to a Graph Neural Network trained to identify parsing functions. We implement this methodology in the tool ParserHunter, which allows users to train the model on labeled data, query the model with unseen binaries, and accommodate a symbolic execution phase on the processed binary through a user interface. Our experiments on ten real-world projects from GitHub show that our tool effectively identifies parsers in binaries.
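The CFG-feature extraction step can be sketched in miniature. ParserHunter itself enriches CFGs with embedding-derived features and classifies them with a Graph Neural Network; here a hand-rolled heuristic on branching and cyclomatic complexity stands in for the trained model, and the thresholds and example graphs are assumptions for illustration only.

```python
# A CFG is modeled as a dict: basic-block id -> list of successor block ids.

def cfg_features(cfg):
    """Structural features of a single connected control-flow graph."""
    n_nodes = len(cfg)
    n_edges = sum(len(succs) for succs in cfg.values())
    branches = sum(1 for succs in cfg.values() if len(succs) > 1)
    # McCabe cyclomatic complexity for one connected CFG: E - N + 2.
    return {"nodes": n_nodes, "edges": n_edges,
            "branches": branches, "cyclomatic": n_edges - n_nodes + 2}

def looks_like_parser(cfg, min_branches=3, min_cc=4):
    """Toy decision rule standing in for the GNN classifier."""
    f = cfg_features(cfg)
    return f["branches"] >= min_branches and f["cyclomatic"] >= min_cc

# A switch-heavy tokenizer-style CFG versus a straight-line helper.
tokenizer = {0: [1], 1: [2, 3, 4, 5], 2: [1, 6], 3: [1, 6], 4: [1, 6], 5: [6], 6: []}
helper = {0: [1], 1: [2], 2: []}
```

On these toy graphs the dispatch-loop shape of `tokenizer` trips the rule while `helper` does not, mirroring the intuition that parsers have dense, loop-heavy CFGs.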
Journal of Systems and Software, Volume 235, Article 112783.
Citations: 0
A white-box prompt injection attack on embodied AI agents driven by large language models
IF 4.1, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-10. DOI: 10.1016/j.jss.2026.112782
Tongcheng Geng, Yubin Qu, W. Eric Wong
With the widespread deployment of embodied AI agents in safety-critical scenarios, LLM-based decision-making systems face unprecedented risks. Existing prompt injection attacks, designed for general conversational systems, lack semantic contextual adaptability for embodied agents and fail to address scenario-specific semantics and safety constraints. This paper proposes SAPIA (Scenario-Adaptive white-box Prompt Injection Attack), integrating an adaptive context prompt generation module with an enhanced GCG algorithm to dynamically produce scenario-targeted adversarial suffixes. We build a multi-scenario dataset of 40 dangerous instructions across four application domains (autonomous driving, robotic manipulation, drone control, and industrial control), establishing a standardized benchmark for embodied AI safety. Large-scale white-box experiments on three mainstream open-source LLMs show SAPIA substantially outperforms traditional GCG and improved I-GCG, with notably high effectiveness on extremely high-risk instructions. Transferability analysis reveals distinctive properties in embodied settings: cross-architecture transfer is extremely limited, while high cross-version transferability exists within model series, contrasting with the cross-model transfer observed in conventional adversarial research. Ablation studies confirm both the adaptive context module and enhanced GCG are critical and synergistic for optimal attack performance. Robustness analyses indicate SAPIA strongly resists mainstream defenses, effectively evading input perturbation, structured self-examination, and safety prefix prompting. This work exposes serious security vulnerabilities in current embodied AI agents and underscores the urgency of scenario-based protection mechanisms for safety-critical deployments.
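The greedy token-swap loop at the heart of GCG-style suffix search can be shown abstractly. In the real attack, each candidate swap is scored against the target LLM's loss; below, a made-up keyword-overlap score stands in for that loss, and `VOCAB`, `TARGET`, and all parameters are hypothetical.

```python
# Toy coordinate-descent suffix search in the spirit of GCG: sweep over suffix
# positions, swapping in the single token that maximizes the score.

VOCAB = ["ignore", "previous", "rules", "please", "now", "output", "unsafe"]
TARGET = {"ignore", "rules", "now"}  # hypothetical scenario-specific tokens

def score(suffix):
    """Stand-in objective: how many target tokens the suffix covers."""
    return len(set(suffix) & TARGET)

def greedy_suffix(length=3, rounds=2):
    suffix = [VOCAB[0]] * length
    for _ in range(rounds):
        for i in range(length):
            # Try every vocabulary token at position i, keep the best swap.
            suffix[i] = max(VOCAB,
                            key=lambda tok: score(suffix[:i] + [tok] + suffix[i + 1:]))
    return suffix
```

After two sweeps the suffix covers all three target tokens; real GCG replaces `score` with a gradient-guided loss over the model's vocabulary, which is what makes the attack white-box.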
Journal of Systems and Software, Volume 235, Article 112782.
Citations: 0
LogMeta: A few-shot model-agnostic meta-learning framework for robust and adaptive log anomaly detection
IF 4.1, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-08. DOI: 10.1016/j.jss.2026.112781
Yicheng Sun, Jacky Wai Keung, Hi Kuen Yu, Wenqiang Luo
Context: Log anomaly detection is critical for maintaining the security, stability, and operational efficiency of modern software systems, especially as they generate vast and diverse log data. However, existing deep learning models struggle with the challenges of heterogeneous log formats across systems and the scarcity of labeled anomaly logs, limiting their real-world deployment and generalization capabilities.
Objective: To address these challenges, we propose LogMeta, a novel semi-supervised framework designed for adaptive and efficient log anomaly detection in diverse and low-resource environments.
Method: LogMeta integrates Model-Agnostic Meta-Learning (MAML) with a hybrid language model to address key challenges. MAML enables LogMeta to rapidly adapt to unseen log systems using few-shot samples, while the hybrid model combines RoBERTa for extracting semantic representations with Bi-LSTM and attention mechanisms to capture sequential dependencies and critical features within log sequences. This design reduces reliance on large-scale labeled datasets and enhances adaptability in heterogeneous environments.
Results: Experimental evaluations on multiple benchmark datasets demonstrate that LogMeta consistently outperforms state-of-the-art supervised and unsupervised methods, achieving up to a 28.3% improvement in F1-scores under low-resource scenarios compared to other models. Furthermore, LogMeta exhibits exceptional domain transfer capabilities, maintaining robust performance across diverse log datasets with minimal fine-tuning. In terms of efficiency, LogMeta achieves competitive training and inference times, making it suitable for real-time anomaly detection in large-scale systems.
Conclusion: LogMeta provides a scalable and practical solution for real-world log anomaly detection, overcoming challenges related to data heterogeneity and label scarcity. Its strong generalization capabilities, minimal supervision requirements, and adaptability to new log systems make it a promising tool for enhancing software system reliability and security.
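The inner-adapt/outer-update structure of MAML that LogMeta builds on can be sketched on a toy problem. This is a first-order variant on scalar linear regression tasks; LogMeta itself applies MAML to a RoBERTa/Bi-LSTM hybrid, which this sketch makes no attempt to reproduce, and the task distribution and learning rates are assumptions.

```python
import numpy as np

# Tasks: y = w_task * x with w_task ~ Uniform(1, 3). Meta-learning should move
# the initialization w_meta toward the task mean so one gradient step adapts fast.
rng = np.random.default_rng(0)
alpha, beta = 0.05, 0.05  # inner (adaptation) and outer (meta) learning rates

def loss_grad(w, x, y):
    """Gradient of mean squared error for the model f(x) = w * x."""
    return np.mean(2 * (w * x - y) * x)

w_meta = 0.0
for step in range(500):
    w_task = rng.uniform(1.0, 3.0)                 # sample a task
    x_s, x_q = rng.normal(size=5), rng.normal(size=5)
    y_s, y_q = w_task * x_s, w_task * x_q          # few-shot support/query sets
    w_adapted = w_meta - alpha * loss_grad(w_meta, x_s, y_s)   # inner step
    w_meta -= beta * loss_grad(w_adapted, x_q, y_q)            # first-order outer step
```

After meta-training, `w_meta` hovers near the task-distribution mean of 2.0, so a single inner step on a handful of samples from an unseen task already lands close to its optimum.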
Journal of Systems and Software, Volume 235, Article 112781.
Citations: 0
Development of an automatic class diagram generator using an AI-based GRU classification model and 5W1H heuristic rules
IF 4.1, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-06. DOI: 10.1016/j.jss.2026.112780
Seungmo Jung, Woojin Lee
In software development, software requirements and class diagrams are core components that are closely related to each other. Software requirements specify the system's functionality in natural language, while class diagrams are created using CASE tools to visually represent the system's structure and behavior based on these requirements. Although software requirements and class diagrams are complementary, ensuring consistency between them is challenging due to the ambiguity and vagueness inherent in natural language. To address this issue, research on automatically transforming natural language into class diagrams is actively being conducted; however, most of these studies focus on requirements written in English. In addition, existing research primarily emphasizes the grammatical structure of natural language requirements, which limits its ability to reflect the conceptual structures of specific domains. To overcome these limitations, this paper proposes a method for developing an automatic class diagram generator that utilizes an AI-based GRU classification model and 5W1H-based heuristic rules. The proposed class diagram generator extracts element and class model information from software requirements written in Korean and visualizes class diagrams based on a model interface language. For elements that can be directly extracted from natural language requirements, 5W1H-based heuristic rules considering linguistic characteristics are applied, while domain-specific elements requiring domain knowledge are extracted using an AI-based GRU classification model. Furthermore, when comparing the class diagrams generated by the proposed tool with those manually created by developers, the tool demonstrated high performance in terms of precision, recall, and F1-score.
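The heuristic side of the pipeline can be illustrated with a toy rule. The actual tool targets Korean requirements and adds a GRU model for domain-specific elements; the English sentence pattern, class names, and naming convention below are made-up stand-ins showing how a who/how/what rule yields classes and operations.

```python
import re

# One toy 5W1H-style rule: "The <who> can <how> the <what>." maps the actor and
# the acted-on concept to candidate classes, and the verb to an operation.
PATTERN = re.compile(r"The (\w+) can (\w+) the (\w+)", re.IGNORECASE)

def extract_model(requirements):
    classes = {}  # class name -> set of operation names
    for sentence in requirements:
        m = PATTERN.search(sentence)
        if not m:
            continue
        who, how, what = (g.capitalize() for g in m.groups())
        classes.setdefault(who, set()).add(f"{how.lower()}{what}")  # e.g. placeOrder
        classes.setdefault(what, set())  # the acted-on concept becomes a class too
    return classes

reqs = ["The Customer can place the Order.",
        "The Customer can cancel the Order.",
        "The Clerk can approve the Refund."]
model = extract_model(reqs)
```

The three sentences yield four candidate classes, with `Customer` carrying `placeOrder` and `cancelOrder`; a real generator would then hand ambiguous, domain-specific phrases to the GRU classifier instead of a regex.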
Journal of Systems and Software, Volume 235, Article 112780.
Citations: 0
ISRLNN: A software defect prediction method based on instance similarity reverse loss
IF 4.1, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-03. DOI: 10.1016/j.jss.2025.112766
Yu Tang, Ye Du, Jian-Bo Gao, Ang Li, Ming-Song Yang
Software defect prediction is a crucial technique for ensuring software reliability. However, software defect datasets often exhibit complex feature dependencies, and traditional feature engineering methods have limitations in capturing non-linear relationships between these features. As deep learning can effectively capture such complex relationships, it has the potential to overcome the shortcomings of traditional feature engineering techniques. In this paper, we propose the concept of the instance image and transform the software defect prediction problem into an image classification task based on instance images, thus fully leveraging the feature extraction capabilities of deep learning. Additionally, to address the limitation that existing binary cross-entropy loss functions in classification models cannot account for differences in instance importance, we also design an instance similarity reverse loss function. We first design a method to measure instance similarity and dynamically adjust the instance weights during loss calculation based on this similarity. Next, we use the normalized instance similarity loss as the active loss in the active-passive loss framework. Finally, we construct a software defect prediction method based on the Instance Similarity Reverse Loss (ISRL). The experimental results show that the proposed method improves performance by 5% to 8% compared to existing works.
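The idea of a similarity-driven, reverse-weighted loss can be sketched numerically. The exact weighting and loss composition in the paper differ; this toy assumes cosine similarity and a simple "one minus mean same-class similarity" weight, so atypical instances count more in a weighted binary cross-entropy.

```python
import numpy as np

def cosine_sim(X):
    """Pairwise cosine similarity between row vectors of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def isr_weights(X, y):
    """Reverse-similarity weights: instances similar to many same-class
    neighbours are down-weighted, atypical ones up-weighted (assumed toy form)."""
    S = cosine_sim(X)
    w = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        same = (y == y[i])
        same[i] = False  # exclude the instance itself
        w[i] = 1.0 - S[i, same].mean() if same.any() else 1.0
    return w / w.mean()  # normalize so the average weight is 1

def weighted_bce(p, y, w):
    """Binary cross-entropy with per-instance weights."""
    eps = 1e-7
    return np.mean(-w * (y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
```

With three tightly clustered clean instances and one lone defective one, the lone instance receives the largest weight, steering training toward the underrepresented case.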
Journal of Systems and Software, Volume 235, Article 112766.
Citations: 0
Test case specification techniques and system testing tools in the automotive industry: A review
IF 4.1, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-02. DOI: 10.1016/j.jss.2025.112764
Denesa Zyberaj, Pascal Hirmer, Marco Aiello, Stefan Wagner
The automotive domain is shifting to software-centric development to meet regulation, market pressure, and feature velocity. This shift increases embedded systems’ complexity and strains testing capacity. Despite relevant standards, a coherent system-testing methodology that spans heterogeneous, legacy-constrained toolchains remains elusive, and practice often depends on individual expertise rather than a systematic strategy. We derive challenges and requirements from a systematic literature review (SLR), complemented by industry experience and practice. We map them to test case specification techniques and testing tools, evaluating their suitability for automotive testing using PRISMA. Our contribution is a curated catalog that supports technique/tool selection and can inform future testing frameworks and improvements. We synthesize nine recurring challenge areas across the life cycle, such as requirements quality and traceability, variability management, and toolchain fragmentation. We then provide a prioritized criteria catalog that recommends model-based planning, interoperable and traceable toolchains, requirements uplift, pragmatic automation and virtualization, targeted AI and formal methods, actionable metrics, and lightweight organizational practices.
Citations: 0
RLV: LLM-based vulnerability detection by retrieving and refining contextual information
IF 4.1 CAS Zone 2 Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2026-01-02 DOI: 10.1016/j.jss.2025.112756
Fangcheng Qiu , Zhongxin Liu , Bingde Hu , Zhengong Cai , Lingfeng Bao , Xinyu Wang
Vulnerability detection plays a critical role in ensuring software quality during the processes of software development and maintenance. Automated vulnerability detection methods have been proposed to reduce the consumption of human and material resources. From traditional machine learning-based approaches to deep learning-based approaches, vulnerability detection techniques have continuously evolved and improved. Recently, Large Language Models (LLMs) have been increasingly applied to vulnerability detection. However, deep learning-based approaches and LLM-based approaches suffer from two main problems: (1) They suffer from poor generalization capabilities, which limit their performance in real-world scenarios. (2) They lack accurate contextual information of the target function, which hinders their ability to correctly understand the target function. To tackle these problems, in this paper, we propose a novel vulnerability detection approach, named RLV (Retrieving&Refining Contextual Information for LLM-based Vulnerability Detection), an LLM-based approach that enhances vulnerability detection by integrating project-level contextual information into the analysis process. RLV emulates how programmers reason about code, enabling the LLM to retrieve and refine relevant semantic context from the project repository to better understand the target function. Besides, RLV guides the LLM via effective prompts, avoiding task-specific training and enhancing its practicality in real-world scenarios. We conduct experiments on two vulnerability datasets with a total of 30,436 vulnerable functions and 306,269 non-vulnerable functions. The experimental results demonstrate that our approach achieves state-of-the-art performance. Moreover, our approach achieves a 26.83% improvement in terms of F1-score over state-of-the-art baselines when tested on unseen projects.
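The retrieve-then-prompt workflow that the abstract describes can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than RLV's actual implementation: the regex-based callee lookup, the `repo_index` dictionary standing in for a real project-level code index, and the prompt wording.

```python
import re

def retrieve_context(target_fn: str, repo_index: dict) -> list:
    """Collect definitions of functions called by the target function.

    repo_index maps function names to source snippets; a real system would
    query the project repository instead of a plain dict.
    """
    callees = set(re.findall(r"(\w+)\s*\(", target_fn))
    return [repo_index[name] for name in sorted(callees) if name in repo_index]

def build_prompt(target_fn: str, context_snippets: list) -> str:
    """Assemble a detection prompt pairing the target with its context."""
    context = "\n\n".join(context_snippets) if context_snippets else "(none found)"
    return (
        "You are a vulnerability detector.\n"
        f"### Project context\n{context}\n"
        f"### Target function\n{target_fn}\n"
        "Is the target function vulnerable? Answer yes or no with a reason."
    )

# Toy example: the callee uses strcpy, so the retrieved context reveals
# a potential buffer overflow that the target alone does not show.
repo_index = {
    "copy_buf": "void copy_buf(char *d, char *s) { strcpy(d, s); }",
}
target = "void handle(char *in) { char buf[8]; copy_buf(buf, in); }"
prompt = build_prompt(target, retrieve_context(target, repo_index))
```

The point of the sketch is the division of labor: retrieval supplies project-level semantics (here, the `strcpy` inside `copy_buf`) that the LLM could not infer from the eight-byte target function alone.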
Citations: 0
MLOps pipeline generation for reinforcement learning: A low-code approach using large language models
IF 4.1 CAS Zone 2 Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2026-01-01 DOI: 10.1016/j.jss.2025.112760
Stephen John Warnett , Evangelos Ntentos , Uwe Zdun
MLOps (Machine Learning Operations) and its application to Reinforcement Learning (RL) involve various challenges when integrating Machine Learning and RL models into production systems, entailing considerable expertise and manual effort, which can be error-prone and obstruct scalability and rapid deployment. We propose a new approach to address these challenges in generating MLOps pipelines. We present a low-code, template-based approach leveraging Large Language Models (LLMs) to automate RL pipeline generation, validation and deployment. In our approach, the Pipes and Filters pattern allows for the fine-grained generation of MLOps pipeline configuration files. Built-in error detection and correction help maintain high-quality output standards.
To empirically evaluate our solution, we assess the correctness of pipelines generated with seven LLMs for three open-source RL projects. Our initial approach achieved an average error rate of 0.187 across all seven LLMs. OpenAI GPT-4o performed the best with an error rate of just 0.09, followed by Qwen2.5 Coder with an error rate of 0.15. We implemented a single round of improvements to our implementation and low-code template. We reevaluated our solution on the best-performing LLM from the initial evaluation, achieving perfect results with an overall error rate of zero for OpenAI GPT-4o. Our findings indicate that pipelines generated by our approach have low error rates, potentially enabling rapid scaling and deployment of reliable MLOps for RL pipelines, particularly for practitioners lacking advanced software engineering or DevOps skills. Our approach contributes towards demonstrating increased reliability and trustworthiness in LLM-based solutions, despite the uncertainty hitherto associated with LLMs.
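The combination the abstract describes — template-based generation, Pipes and Filters composition, and built-in error detection and correction — can be sketched roughly as follows. The template keys, the `validate`/`correct` helpers, and the default-filling strategy are hypothetical stand-ins, not the paper's actual low-code template.

```python
# Hypothetical per-stage template; a real pipeline config would be richer.
TEMPLATE = "stage: {stage}\nimage: {image}\nscript: {script}\n"
REQUIRED_KEYS = ("stage", "image", "script")

def fill_template(params: dict) -> str:
    # Missing keys become empty values, to be caught by validate().
    values = {k: params.get(k, "") for k in REQUIRED_KEYS}
    return TEMPLATE.format(**values)

def validate(config: str) -> list:
    """Built-in error detection: report keys whose value is empty."""
    return [line.split(":")[0] for line in config.splitlines()
            if line.rstrip().endswith(":")]

def correct(config: str) -> str:
    """Single correction round: fill each empty value with a default."""
    fixed = []
    for line in config.splitlines():
        if line.rstrip().endswith(":"):
            line = line.rstrip() + " default"
        fixed.append(line)
    return "\n".join(fixed) + "\n"

def generate_pipeline(stages: list) -> str:
    """Pipes and Filters: generate, validate, and correct each stage
    independently, then concatenate the filters into one pipeline."""
    parts = []
    for params in stages:
        cfg = fill_template(params)
        if validate(cfg):
            cfg = correct(cfg)
        parts.append(cfg)
    return "---\n".join(parts)

stages = [
    {"stage": "train", "image": "python:3.11", "script": "python train.py"},
    {"stage": "evaluate", "image": "python:3.11"},  # "script" omitted on purpose
]
pipeline = generate_pipeline(stages)
```

The fine-grained, per-stage generate-validate-correct loop is what the Pipes and Filters pattern buys: an error in one filter's output is detected and repaired before it can contaminate the rest of the pipeline.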
Citations: 0
Exploring challenges in test mocking: Developer questions and insights from StackOverflow
IF 4.1 CAS Zone 2 Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-12-31 DOI: 10.1016/j.jss.2025.112748
Mumtahina Ahmed , Md Nahidul Islam Opu , Chanchal Roy , Sujana Islam Suhi , Shaiful Chowdhury
Mocking is a common unit testing technique that is used to simplify tests, reduce flakiness, and improve coverage by replacing real dependencies with simplified implementations. Despite its widespread use in Open Source Software (OSS) projects, there is limited understanding of how and why developers use mocks and the challenges they face. In this study, we have analyzed 25,302 questions related to Mocking on StackOverflow to identify the challenges faced by developers. We have used Latent Dirichlet Allocation (LDA) for topic modeling, identified 30 key topics, and grouped the topics into five key categories. Consequently, we analyzed the annual and relative probabilities of each category to understand the evolution of mocking-related discussions. Trend analysis reveals that categories such as Mocking Techniques and External Services have remained consistently dominant, highlighting evolving developer priorities and ongoing technical challenges. While the questions on Theoretical category declined after 2010, posts regarding Error Handling grew notably from 2009.
Our findings also show an inverse relationship between a topic’s popularity and its difficulty. Popular topics like Framework Selection tend to have lower difficulty and faster resolution times, while complex topics like HTTP Requests and Responses are more likely to remain unanswered and take longer to resolve. Additionally, we evaluated questions based on their answer status (successful, ordinary, or unsuccessful) and found that topics such as Framework Selection have higher success rates, whereas tool setup and Android-related issues are more often unresolved. A classification of questions into How, Why, What, and Other revealed that over 64% are How questions, particularly in practical domains like file access, APIs, and databases, indicating a strong need for implementation guidance. Why questions are more prevalent in error-handling contexts, reflecting conceptual challenges in debugging, while What questions are rare and mostly tied to theoretical discussions. These insights offer valuable guidance for improving developer support, tooling, and educational content in the context of mocking and unit testing.
Citations: 0