
Latest publications in Automated Software Engineering

Knowledge distillation-driven commit-aware multimodal learning for software vulnerability detection
IF 3.1, CAS Region 2 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-21. DOI: 10.1007/s10515-026-00595-z
Rim Mahouachi

Software vulnerabilities are a major concern in cybersecurity, as even small flaws can expose systems to serious risks. Most automated detection methods focus on a single source (usually code), limiting their ability to capture the diverse ways vulnerabilities can appear and leaving complementary information underused. To address this limitation, this work investigates how heterogeneous modalities contribute to vulnerability detection and proposes a multimodal deep learning framework that integrates code semantics, commit messages, static metrics, and syntactic structures through an attention-based fusion mechanism. To ensure deployability in real-world scenarios where commit information is often unavailable at inference time, a teacher–student knowledge distillation scheme is employed. The multimodal teacher model, using all modalities, transfers its knowledge to a lightweight student model that relies solely on code features. Experiments on multiple vulnerability datasets show that multimodal fusion significantly improves detection performance and that knowledge distillation preserves much of this gain while enabling practical deployment. Our findings highlight the importance of cross-modal integration and distillation for building robust and scalable vulnerability detection systems.
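The abstract does not specify the distillation objective, so as an illustration, the teacher-to-student transfer it describes is commonly realized with a Hinton-style loss that blends hard-label cross-entropy with a KL divergence toward the temperature-softened teacher distribution. A minimal sketch (function names and hyperparameters are assumptions, not the paper's implementation):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the
    temperature-softened teacher distribution (standard KD recipe)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    ce = -math.log(softmax(student_logits)[label])
    # T^2 compensates for the 1/T^2 scaling of soft-target gradients
    return alpha * ce + (1 - alpha) * (T ** 2) * kl
```

When student and teacher logits agree, the KL term vanishes and only the hard-label cross-entropy remains, which is why the student can fall back on code-only features without collapsing.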

Citations: 0
Robust vulnerability detection with limited data via training-efficient adversarial reprogramming
IF 3.1, CAS Region 2 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-19. DOI: 10.1007/s10515-026-00590-4
Zhenzhou Tian, Chuang Zhang, Yunpeng Hui, Jiaze Sun, Yanping Chen, Lingwei Chen

The substantial increase in software vulnerabilities poses a significant threat to system security, prompting a surge of interest in applying deep learning (DL) to vulnerability detection. However, current DL-based detectors heavily rely on large-scale labeled data, leading to inefficiency and notable performance degradation in scenarios with limited data. Furthermore, these detectors often lack robustness against adversarial code transformation attacks. To address these challenges, this paper proposes ArVD (Adversarial Reprogramming-Based Vulnerability Detector), which implements a novel and computationally inexpensive approach to reprogram a pre-trained model for detecting vulnerabilities at the function level. Specifically, ArVD first constructs structure-aware token sequences from source code. Given these inputs, the model then exclusively learns universal perturbation elements to be added into the token sequences and leverages a self-attention mechanism to enhance non-linear interactions among tokens and perturbations, so that the learning capabilities of the pre-trained model can be adapted to vulnerability detection with less training data and time yet higher detection effectiveness and robustness. Extensive experiments conducted on multiple datasets demonstrate that ArVD significantly reduces the trainable parameters to approximately 20,000 while outperforming DL-based baselines in terms of detection effectiveness, data-limited performance, and runtime overhead. Moreover, ArVD effectively counters code transformation attacks; compared to the state-of-the-art ZigZag framework, which is designed to enhance detector robustness, ArVD exhibits an average relative improvement of 18.89% in F1, with decreases of 43.25% and 42.65% in FPR and FNR, respectively.
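The "structure-aware token sequences" step pairs lexical tokens with their syntactic context. The paper's exact construction is not given here, so purely as an illustration of the idea, a sketch using Python's `ast` module (ArVD itself targets function-level source code; node categories and tags below are assumptions):

```python
import ast

def structure_aware_tokens(source):
    """Pair identifiers, calls, and constants with a syntactic tag,
    a rough stand-in for structure-aware token sequences."""
    tokens = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            tokens.append(("FUNC", node.name))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens.append(("CALL", node.func.id))
        elif isinstance(node, ast.Name):
            tokens.append(("NAME", node.id))
        elif isinstance(node, ast.Constant):
            tokens.append(("CONST", repr(node.value)))
    return tokens
```

Such tagged sequences let a frozen pre-trained model see syntax without any retraining of its own weights; only the universal perturbation elements added to the sequence are learned.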

Citations: 0
From interaction to evolution: A behavior-driven framework for automated software requirements elicitation
IF 3.1, CAS Region 2 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-19. DOI: 10.1007/s10515-026-00593-1
Tong Li, Chaoqun Wen, Xinran Zhang

Eliciting evolutionary requirements is essential for the continuous improvement of software systems. Traditional approaches rely on user feedback from forums and social media; however, these methods often suffer from low engagement rates and subjective biases. In contrast, user interaction behavior data offers more objective and comprehensive insights into user needs throughout task execution. Despite these advantages, the ambiguous relationship between user behaviors and requirements presents significant challenges for behavior-driven requirement inference. This paper proposes a novel approach for eliciting evolutionary requirements based on user interaction behavior. We first employ a conceptual model and a goal model to normalize multi-modal behavior data and associate it with goal contexts. This structured representation reduces complexity and establishes a foundation for further analysis. Next, we extract key behavioral metrics through statistical analysis to quantify task difficulty from a user experience perspective, providing a basis for prioritizing inferred requirements. Finally, pattern mining techniques and a large language model are applied to derive evolutionary requirements, identifying meaningful behavioral patterns that strengthen the connection between user behavior and requirements while enhancing automation and effectiveness through large language model inference. A case study with 20 participants was conducted to validate the proposed method. The results demonstrate that the approach successfully captures 95.6% of explicitly stated user requirements and uncovers additional valuable insights. These findings underscore the method's effectiveness and usefulness in enhancing software design optimization.
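The abstract does not define its behavioral metrics, so as a hedged illustration only, one way to score task difficulty from an interaction log is to normalize a few simple signals (duration, error rate, backtracking) and average them. The event schema, thresholds, and equal weights below are assumptions, not the paper's method:

```python
from statistics import mean

def task_difficulty(events):
    """Aggregate simple behavioral signals (duration, error rate,
    backtracking) into a heuristic difficulty score in [0, 1].
    Each event is {"t": seconds, "kind": "click"|"error"|"back"|...}."""
    duration = events[-1]["t"] - events[0]["t"]
    errors = sum(1 for e in events if e["kind"] == "error")
    backtracks = sum(1 for e in events if e["kind"] == "back")
    steps = len(events)
    signals = [
        min(duration / 60.0, 1.0),   # cap at a 60 s "hard task" budget
        min(errors / steps, 1.0),    # fraction of error events
        min(backtracks / steps, 1.0) # fraction of backtracking events
    ]
    return mean(signals)
```

Scores like this give the prioritization basis the abstract mentions: tasks with higher difficulty surface first as candidates for evolutionary requirements.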

Citations: 0
A study of privacy-related data collected by Android apps
IF 3.1, CAS Region 2 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-12. DOI: 10.1007/s10515-025-00589-3
Mugdha Khedkar, Ambuj Kumar Mondal, Eric Bodden

Many Android apps collect data from users, and the European Union’s General Data Protection Regulation (GDPR) mandates clear disclosures of such data collection. However, apps often use third-party code, complicating accurate disclosures. This paper investigates how accurately current Android apps fulfill these requirements. In this work, we present a multi-layered definition of privacy-related data to correctly report data collection in Android apps. We further create a dataset of privacy-sensitive data classes that may be used as input by an Android app. This dataset takes into account data collected both through the user interface and system APIs. Based on this, we implement a semi-automated prototype that detects and labels privacy-related data collected by a given Android app. We manually examine the data safety sections of 70 Android apps to observe how data collection is reported, identifying instances of over- and under-reporting. We compare our prototype’s results with the data safety sections of 20 apps, revealing reporting discrepancies. Using the results from two Messaging and Social Media apps (Signal and Instagram), we discuss how app developers under-report and over-report data collection, respectively, and identify inaccurately reported data categories. A broader study of 7,500 Android apps reveals that apps most frequently collect data that can partially identify users. Although system APIs consistently collect large amounts of privacy-related data, user interfaces exhibit more diverse data collection patterns. A more focused study on various domains of apps reveals that the largest fraction of apps collecting personal data belong to the domain of Messaging and Social Media. Our findings show that location is collected frequently by apps, especially from the E-commerce and Shopping domain. However, it is often under-reported in app data safety sections.
Our results highlight the need for greater consistency in privacy-aware app development and reporting practices.
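At its core, comparing what a prototype observes an app collecting against what the data safety section declares is a set difference in both directions. A minimal sketch of that comparison (the function name and category labels are hypothetical, not the paper's tooling):

```python
def report_mismatches(observed, disclosed):
    """Compare data categories observed via manifest/API analysis
    against the categories declared in the data safety section."""
    under = sorted(observed - disclosed)  # collected but not reported
    over = sorted(disclosed - observed)   # reported but not observed
    return {"under_reported": under, "over_reported": over}
```

Non-empty `under_reported` corresponds to the under-reporting the study finds for location data, while non-empty `over_reported` flags declared categories the analysis never observed.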

Citations: 0
Automated summarization of software documents: an LLM-based multi-agent approach
IF 3.1, CAS Region 2 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-09. DOI: 10.1007/s10515-025-00588-4
Duc S. H. Nguyen, Minh T. Nguyen, Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio

Large Language Models (LLMs) and LLM-based Multi-Agent Systems (MAS) are revolutionizing software engineering (SE) by advancing automation, decision-making, and knowledge processing. Their recent application to SE tasks has already shown promising results. In this paper, we focus on summarization as a key application area. We present Metagente, an LLM-based MAS designed to generate concise and accurate summaries of software documentation. Metagente employs a Teacher–Student architecture in which multiple LLM agents collaborate to enhance the relevance and precision of the produced summaries. An empirical evaluation on real-world datasets demonstrates Metagente’s effectiveness in streamlining workflows, outperforming the considered baselines. The evaluation provides evidence that Metagente improves summarization for requirement analysis and technical documentation. Moreover, we demonstrate that, compared to a set of single, independent LLMs, the multi-agent architecture is meaningful and beneficial for the summarization of software documents. Our findings underscore the transformative potential of these technologies in SE, while identifying challenges and future research directions for their seamless integration.

Citations: 0
QuaRUM: qualitative data analysis-based retrieval-augmented UML domain model from requirements documents
IF 3.1, CAS Region 2 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-09. DOI: 10.1007/s10515-025-00587-5
Syed Tauhid Ullah Shah, Mohamad Hussein, Ann Barcomb, Mohammad Moshirpour

Effective Requirements Engineering (RE) is essential for building successful software systems, yet analyzing unstructured stakeholder input remains a persistent challenge. Qualitative Data Analysis (QDA) provides structured methods, open coding (entity extraction), axial coding (relationship discovery), and selective coding (model refinement), to transform natural language requirements into domain models. While manual QDA has proven effective for requirements analysis, it remains time-consuming, repetitive, and difficult to scale. Although individual RE tasks have been automated, no prior work has automated the complete QDA methodology for domain modeling. In this paper, we present QuaRUM, the first framework to automate end-to-end QDA for UML domain model generation by combining large language models with retrieval-augmented generation. QuaRUM processes requirements through document ingestion, semantic indexing, and retrieval-augmented coding, and helps ground each model element in the source text to mitigate hallucination risks. Empirical results show that QuaRUM performs with high accuracy across three domains. It achieves F1-scores between 0.85 and 0.98. Cohen’s κ reaches up to 0.92, surpassing human inter-coder agreement. Notably, QuaRUM recovers 37 valid attributes and 23 relationships initially missed by human analysts. A cost-benefit analysis shows a 218% Return on Investment (ROI) for initial use, increasing to 1,131% in repeated deployments, demonstrating strong economic scalability.

Citations: 0
Unveiling hidden permissions: an LLM framework for detecting privacy and security concerns in AI mobile apps reviews
IF 3.1, CAS Region 2 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2026-01-09. DOI: 10.1007/s10515-025-00567-9
Rhodes Massenon, Ishaya Gambo, Javed Ali Khan

Mobile AI applications enhance functionality but introduce complex privacy and security challenges. This research develops and evaluates an automated framework that leverages Large Language Models (LLMs) to analyze user reviews and unveil “hidden permissions”, defined not as technically undeclared functionalities but as declared permissions whose purpose or necessity is opaque to users, leading to perceived privacy risks. The framework integrates static analysis of permission manifests with a hybrid Natural Language Processing (NLP) pipeline that combines Term Frequency–Inverse Document Frequency (TF-IDF) with BERT embeddings. A fine-tuned RoBERTa model then classifies user-reported concerns into predefined risk categories. We correlate these user-reported behaviors with declared permissions to identify potential mismatches and prioritize them using a risk-scoring methodology validated against the MITRE Common Weakness Enumeration (CWE) database. In an evaluation against other LLM architectures (GPT-3.5, DistilBERT, XLNet, and LLaMA-2), our fine-tuned RoBERTa model demonstrates superior performance, achieving an F1-score of 0.90 in classifying reviews related to unauthorized tracking. The framework effectively surfaces and prioritizes user-perceived privacy risks, offering actionable insights for developers to address mismatches between an app’s declared permissions and its user-experienced behavior, thereby fostering a more secure and trustworthy AI mobile ecosystem.
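The TF-IDF half of the hybrid pipeline is standard and easy to sketch; the paper's exact weighting scheme and the BERT side are not reproduced here. A minimal, self-contained version (tokenization and smoothing choices are assumptions):

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF vectors over pre-tokenized review texts; in the
    described pipeline, sparse features like these are concatenated
    with dense BERT embeddings before classification."""
    n = len(docs)
    # document frequency: in how many reviews each term appears
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors
```

Terms appearing in every review (e.g. "app") get zero weight, while review-specific terms like "tracks" retain weight, which is what lets the lexical channel highlight concern-bearing vocabulary.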

Unveiling hidden permissions: an LLM framework for detecting privacy and security concerns in AI mobile apps reviews

Pub Date: 2026-01-09 DOI: 10.1007/s10515-025-00567-9

Rhodes Massenon, Ishaya Gambo, Javed Ali Khan

Mobile AI applications enhance functionality but introduce complex privacy and security challenges. This research develops and evaluates an automated framework that leverages Large Language Models (LLMs) to analyze user reviews and unveil "hidden permissions", defined not as technically undeclared functionalities but as declared permissions whose purpose or necessity is opaque to users, leading to perceived privacy risks. The framework integrates static analysis of permission manifests with a hybrid Natural Language Processing (NLP) pipeline that combines Term Frequency-Inverse Document Frequency (TF-IDF) with BERT embeddings. A fine-tuned RoBERTa model then classifies user-reported concerns into predefined risk categories. We correlate these user-reported behaviors with declared permissions to identify potential mismatches and prioritize them using a risk-scoring methodology validated against the MITRE Common Weakness Enumeration (CWE) database. In an evaluation against other LLM architectures (GPT-3.5, DistilBERT, XLNet, and LLaMA-2), our fine-tuned RoBERTa model demonstrates superior performance, achieving an F1-score of 0.90 in classifying reviews related to unauthorized tracking. The framework effectively surfaces and prioritizes user-perceived privacy risks, offering actionable insights for developers to address mismatches between an app's declared permissions and its user-experienced behavior, thereby fostering a more secure and trustworthy AI mobile ecosystem.
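The hybrid feature extraction this abstract describes, lexical TF-IDF signals fused with dense sentence embeddings before classification, can be sketched roughly as follows. The toy reviews, the stubbed `embed` function (standing in for a real BERT encoder), and all names here are illustrative assumptions, not artifacts of the paper.

```python
import math
from collections import Counter

# Toy app-store reviews (illustrative; not from the paper's dataset).
reviews = [
    "app tracks my location even when closed",
    "why does a flashlight app need my contacts",
    "great app works fine",
]

def tfidf_vectors(docs):
    """Plain TF-IDF: tf(t, d) * log(N / df(t)), one dense vector per doc."""
    vocab = sorted({t for d in docs for t in d.split()})
    n = len(docs)
    df = {t: sum(t in d.split() for d in docs) for t in vocab}
    vecs = []
    for d in docs:
        tf = Counter(d.split())
        length = len(d.split())
        vecs.append([tf[t] / length * math.log(n / df[t]) for t in vocab])
    return vocab, vecs

def embed(text, dim=8):
    """Stand-in for a BERT sentence embedding; a real pipeline would call a
    transformer encoder here. Hash-based toy vector with entries in [0, 1)."""
    return [(hash((text, i)) % 1000) / 1000.0 for i in range(dim)]

vocab, tfidf = tfidf_vectors(reviews)
# Hybrid feature: concatenate the sparse lexical signal with the dense
# semantic signal; a classifier (RoBERTa in the paper) would consume this.
features = [v + embed(d) for v, d in zip(tfidf, reviews)]
print(len(features), len(features[0]))
```

The fused vector simply appends the embedding to the TF-IDF weights; any downstream classifier then sees both surface vocabulary and semantic context.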
Citations: 0
An interactive and AI-enhanced framework for semi-automatically generating iStar goal models
IF 3.1 CAS Tier 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2026-01-05 DOI: 10.1007/s10515-025-00586-6
Tong Li, Qixiang Zhou, Fangqi Dong, Tianai Zhang, Yunduo Wang

For decades, goal modeling has been significant in the early stages of requirements engineering. Numerous studies have demonstrated the effectiveness and practicality of utilizing requirement goal models. However, creating goal models is usually done manually, which is a challenging task for large-scale systems. Our research objective is to simplify and optimize the creation process of iStar goal models with a semi-automatic framework, assisting the practical adoption of goal modeling approaches. In this paper, we significantly extend our previous work to better deal with this problem. Specifically, we utilize the Design Science Research paradigm to propose an interactive and iterative modeling process that combines human decision-making steps with automated analysis steps, minimizing modeling costs while ensuring the quality of the models. This research is grounded in interviews on the practicality of iStar modeling and a literature review on automating iStar goal modeling. To realize the modeling process, we propose a novel hybrid approach that features highly customizable logical reasoning rules and deep learning techniques, allowing for tailored selection and design according to specific needs. To promote practical adoption of our approach, we develop a prototype tool with a user-friendly interface. In this paper, we select BERT as the deep learning component and, as an example, design a series of rules around it. An evaluation is then conducted using this specific implementation of the approach. Besides, we conduct a case study on a real-life scenario to evaluate the effectiveness of our modeling approach, showing the efficiency of the modeling process. These results indicate that our proposal efficiently establishes high-quality goal models and thus contributes pragmatically to the adoption of goal model analysis approaches.
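The rule-driven half of such a hybrid pipeline can be sketched as below; the keyword rules, sample requirements, and element labels are illustrative assumptions rather than the paper's actual rule set, and the learned (BERT) component is left out.

```python
import re

# Toy requirement statements (illustrative, not from the paper's case study).
requirements = [
    "The system shall encrypt all stored user data.",
    "The operator should be able to export monthly reports.",
    "Response time must stay below two seconds.",
]

# Hand-written rules mapping linguistic cues to candidate iStar element types;
# a real pipeline would complement these with a fine-tuned BERT classifier.
RULES = [
    (re.compile(r"\bshall\b|\bmust\b", re.I), "goal"),
    (re.compile(r"\bshould be able to\b", re.I), "task"),
]

def classify(sentence):
    """Return the first matching element type, or flag for human review."""
    for pattern, element in RULES:
        if pattern.search(sentence):
            return element
    return "unclassified"  # candidate for the learned classifier / the modeler

for sentence in requirements:
    print(f"{classify(sentence):>12}: {sentence}")
```

Because the rules are plain data, they can be swapped or extended per project, which is the kind of tailored selection the abstract emphasizes.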

Citations: 0
Comparing static analyses for improved semantic conflict detection
IF 3.1 CAS Tier 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-12-23 DOI: 10.1007/s10515-025-00580-y
Galileu Santos de Jesus, Paulo Borba, Rodrigo Bonifácio, Matheus Barbosa de Oliveira

Version control systems are essential in software development, allowing teams to collaborate simultaneously without interfering with each other's work. Tools like Git facilitate code integration through merge operations, which automatically detect textual conflicts. However, these systems focus solely on source code differences, overlooking more complex semantic conflicts that can lead to failures or unexpected behavior after integration. To address this challenge, static analysis emerges as an effective solution, detecting semantic conflicts that traditional merge tools might miss and providing an additional layer of security and quality to the code integration process. In this study, we explore combinations of static analysis techniques to improve the detection of semantic conflicts. Our approach was evaluated on a dataset from 32 real-world GitHub projects, all manually labeled to include ground truth information. Our evaluation highlights the adaptability of the approach: in contexts where minimizing false positives is essential, high-precision techniques can be prioritized; in contrast, recall-focused techniques are preferable for broader conflict coverage. The results show that combining static analysis strategies delivers superior performance in terms of precision, recall, F1 score, and accuracy compared to previous methods, and is a more lightweight and flexible approach to adapt to the application context.
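The precision-versus-recall trade-off between combination strategies can be illustrated with a small sketch; the two analyses and the ground truth below are toy stand-ins, not the tools or dataset evaluated in the paper.

```python
# Each analysis flags the set of merge scenarios it considers conflicting.
# Ground truth and analysis outputs are toy data, not the paper's dataset.
ground_truth = {"m1", "m3", "m5"}
analysis_a = {"m1", "m2", "m3"}   # one false positive (m2), misses m5
analysis_b = {"m3", "m4", "m5"}   # one false positive (m4), misses m1

def precision_recall(flagged, truth):
    """Standard precision/recall over sets of flagged scenarios."""
    tp = len(flagged & truth)
    precision = tp / len(flagged) if flagged else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall

# Intersection favours precision; union favours recall.
for name, combo in [("A AND B", analysis_a & analysis_b),
                    ("A OR B", analysis_a | analysis_b)]:
    p, r = precision_recall(combo, ground_truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Intersecting analyses keeps only scenarios both agree on (fewer false positives), while taking the union covers everything either one finds, which mirrors the precision-first versus coverage-first choice the abstract describes.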

Citations: 0
SPVR: syntax-to-prompt vulnerability repair based on large language models
IF 3.1 CAS Tier 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-12-18 DOI: 10.1007/s10515-025-00579-5
Ruoke Wang, Zongjie Li, Cuiyun Gao, Chaozheng Wang, Yang Xiao, Xuan Wang

Purpose: In the field of vulnerability repair, previous research has leveraged pre-trained models and LLM-based prompt engineering, among which LLM-based approaches show better generalizability and achieve the best performance. However, LLM-based approaches generally treat vulnerability repair as a sequence-to-sequence task and do not explicitly capture the syntax patterns of different vulnerability types, leading to limited accuracy. We aim to create a method that ensures the specificity of prompts targeting vulnerable code while also leveraging the generative capabilities of Large Language Models.

Methods: We propose SPVR (Syntax-to-Prompt Vulnerability Repair), a novel framework that collects information from syntax trees and generates corresponding prompts. Our method consists of three steps: rule design, prompt generation, and patch generation. In the rule design step, we parse code patches and design rules to extract relevant contextual information; these rules aid in identifying vulnerability-related issues. In the prompt generation step, we extract information from vulnerable code with the pre-defined rules and automatically convert it into prompts, incorporating the description of the relevant CWE (Common Weakness Enumeration) entry as known information. Finally, in the patch generation step, the prompt serves as input to any conversational LLM to obtain code patches.

Results: Extensive experiments validate that our method achieves excellent results in assisting LLMs to fix vulnerabilities accurately. Using multiple Large Language Models to validate the effectiveness of our work, we repair 143 of 547 vulnerable code samples with ChatGPT-4. We also compare our approach against several existing vulnerability repair approaches (both fine-tuning-based and prompt-based) across multiple metrics.

Conclusion: SPVR is a novel framework that leverages the Abstract Syntax Tree structure of code to provide targeted repair prompts for vulnerabilities, and it demonstrates promising potential for real-world code vulnerability repair.
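The syntax-to-prompt idea can be sketched with Python's stdlib `ast` module standing in for the framework's parser; the extraction rules, the prompt template, and the abbreviated CWE text are illustrative assumptions, not SPVR's actual implementation.

```python
import ast

# Toy vulnerable snippet (illustrative): builds an SQL query by concatenation.
vulnerable_code = '''
def get_user(db, name):
    query = "SELECT * FROM users WHERE name = '" + name + "'"
    return db.execute(query)
'''

# Abbreviated description, paraphrasing the CWE-89 entry.
CWE_INFO = {
    "CWE-89": "improper neutralization of special elements in an SQL command",
}

def extract_facts(source):
    """Walk the syntax tree and collect context a repair prompt can cite."""
    tree = ast.parse(source)
    facts = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = [a.arg for a in node.args.args]
            facts.append(f"function '{node.name}' takes {args}")
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            facts.append(f"string concatenation at line {node.lineno}")
    return facts

def build_prompt(source, cwe):
    """Assemble a repair prompt from syntax facts plus the CWE description."""
    facts = "; ".join(extract_facts(source))
    return (f"The following code is flagged as {cwe} ({CWE_INFO[cwe]}). "
            f"Syntax-derived context: {facts}. "
            f"Rewrite it to remove the vulnerability:\n{source}")

print(build_prompt(vulnerable_code, "CWE-89"))
```

The resulting string is what a conversational LLM would receive in the patch generation step: the raw code plus syntax-level facts and the CWE description as known context.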

Citations: 0