Automated Software Engineering: Latest Articles

Causes and effects of fitness landscapes in system test generation: a replication study
IF 3.1 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-09-18 | DOI: 10.1007/s10515-025-00539-z
Omur Sahin, Man Zhang, Andrea Arcuri

Search-Based Software Testing (SBST) has seen several success stories in academia and industry. The effectiveness of a search algorithm at solving a software engineering problem strongly depends on how well the algorithm can navigate the fitness landscape of the addressed problem. The fitness landscape depends on the fitness function used. Understanding the properties of a fitness landscape can provide insight into how a search algorithm behaves on it. Such insight can help researchers design novel, more effective search algorithms and fitness functions tailored to a specific problem. Due to its importance, a few fitness landscape analyses have been carried out in the scientific literature of SBST. However, those have focused on the problem of unit test generation, e.g., with state-of-the-art tools such as EvoSuite. In this paper, we replicate one such existing study. However, in our work we focus on system test generation, with the state-of-the-art tool EvoMaster. Based on an empirical study involving the testing of 23 web services, this enables us to provide valuable insight into this important testing domain of practical industrial relevance. Our results indicate that fitness landscapes are largely dominated by neutral regions (e.g., plateaus), which make the search process challenging. We observe that the presence of information content in the landscape can improve search guidance, while boolean flags are a primary contributor to neutrality. These findings confirm prior results in unit testing but also reveal system-level differences, particularly in how branch types impact search effectiveness. These insights suggest the need for improved fitness functions, testability transformations, and search operators tailored to system-level testing.
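
The link between boolean flags and neutrality can be illustrated with a minimal Python sketch. It is not taken from the paper or from EvoMaster: the target branch `x == 42`, the function names, and the +/-1 neighbourhood are all assumptions chosen for illustration. The sketch compares a flag-style fitness with a branch-distance fitness and measures the fraction of neighbouring moves that leave the fitness unchanged.

```python
TARGET = 42  # hypothetical constant guarding the branch `if x == TARGET`

def flag_fitness(x: int) -> float:
    """Boolean-flag style fitness: 0 if the branch is covered, else 1 (minimised)."""
    return 0.0 if x == TARGET else 1.0

def branch_distance_fitness(x: int) -> float:
    """Branch-distance fitness: |x - TARGET|, so closer inputs score better."""
    return float(abs(x - TARGET))

def neutrality(fitness, points) -> float:
    """Fraction of +/-1 neighbour moves that do not change the fitness value."""
    moves = [(x, x + d) for x in points for d in (-1, 1)]
    neutral = sum(1 for a, b in moves if fitness(a) == fitness(b))
    return neutral / len(moves)

points = range(0, 100)
flag_neutrality = neutrality(flag_fitness, points)
distance_neutrality = neutrality(branch_distance_fitness, points)
print(flag_neutrality, distance_neutrality)
```

Under these assumptions the flag-style fitness leaves 98% of the moves neutral (a plateau with a single informative point), whereas the distance-based fitness changes on every move and so gives the search a gradient to follow.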

Citations: 0
Harnessing large language models for virtual reality exploration testing: a case study
IF 3.1 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-09-18 | DOI: 10.1007/s10515-025-00535-3
Zhenyu Qi, Haotang Li, Hao Qin, Kebin Peng, Sen He, Xue Qin

As the Virtual Reality (VR) industry expands, the need for automated GUI testing is growing rapidly. Large Language Models (LLMs), capable of retaining information long-term and analyzing both visual and textual data, are emerging as a potential key to deciphering the complexities of VR’s evolving user interfaces. In this paper, we conduct a case study to investigate the capability of LLMs, particularly GPT-4o, for field of view (FOV) analysis in VR exploration testing. Specifically, we validate that LLMs can identify test entities in FOVs and that prompt engineering can effectively enhance the accuracy of test entity identification from 41.67% to 71.30%. Our study also shows that LLMs can accurately describe identified entities’ features with an accuracy rate of at least 90%. We further find that the core features that effectively represent an entity are color, placement, and shape. Furthermore, the combination of these three features can be used to improve the accuracy of determining identical entities in multiple FOVs, with the highest F1-score being 0.70. Additionally, our study demonstrates that LLMs are capable of scene recognition and spatial understanding in VR with precisely designed structured prompts. Finally, we find that LLMs fail to label the identified test entities, and we discuss potential solutions as future research directions.

Citations: 0
Intelligent test case generation method for fuzzing IoT protocols based on LLM
IF 3.1 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-09-18 | DOI: 10.1007/s10515-025-00557-x
Ming Zhong, Zisheng Zeng, Yijia Guo, Dandan Zhao, Bo Zhang, Shenghong Li, Hao Peng, Zhiguo Ding

Internet of Things (IoT) protocols are a core element of IoT systems, providing the fundamental support for communication and data exchange between devices. These protocols enable various devices to connect and work together. However, potential errors and vulnerabilities in IoT protocol implementations can make devices easy to attack. Therefore, ensuring the security of IoT protocols is of utmost importance. Common vulnerability detection methods, such as fuzzing, encounter significant challenges in evaluating these implementations, mainly due to the need for extensive protocol knowledge, high time and resource consumption, and the difficulty of generating high-quality, targeted test cases. To address these issues, this paper presents an intelligent fuzzer, LIPFuzzer, for testing IoT protocols. Unlike common methods that rely heavily on the user’s understanding of the protocol to generate test cases, LIPFuzzer, with the assistance of Large Language Models (LLMs), mutates real IoT protocol communication messages to automatically generate more targeted test cases. Specifically, it uses LLMs to understand the relevant protocol knowledge, analyze different categories of protocol messages, and identify recommended mutation fields in combination with the characteristics of IoT protocols, providing targeted mutation strategies for each category. In addition, we evaluate LIPFuzzer on several widely used implementations of well-known IoT protocols (e.g., Modbus-TCP, MQTT, and CoAP). Experimental results indicate that, compared to widely used protocol fuzzers such as Peach, LIPFuzzer generates test cases more conveniently and efficiently, while also discovering vulnerabilities more effectively.
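
The idea of mutating recommended fields of a real message can be sketched in Python as follows. This is not LIPFuzzer's implementation: the LLM step is replaced by a hard-coded, hypothetical `RECOMMENDED_FIELDS` list, and the boundary-value dictionary and the `mutate` helper are assumptions made for illustration. Only the field layout of a Modbus-TCP "Read Holding Registers" request follows the protocol itself.

```python
import random
import struct

# MBAP header + PDU of a Modbus-TCP "Read Holding Registers" (0x03) request.
# Layout: transaction id, protocol id, length, unit id, function code,
# start address, register quantity (all big-endian).
REQUEST_FORMAT = ">HHHBBHH"
FIELDS = ("transaction_id", "protocol_id", "length", "unit_id",
          "function_code", "start_address", "quantity")
FIELD_WIDTH = dict(zip(FIELDS, (2, 2, 2, 1, 1, 2, 2)))

# Hypothetical stand-in for the fields an LLM might recommend mutating.
RECOMMENDED_FIELDS = ("length", "function_code", "quantity")

# Assumed mutation dictionary: boundary values per field width in bytes.
BOUNDARY_VALUES = {
    1: (0x00, 0x7F, 0x80, 0xFF),
    2: (0x0000, 0x7FFF, 0x8000, 0xFFFF),
}

def mutate(message: bytes, rng: random.Random) -> tuple[str, bytes]:
    """Replace one recommended field of a captured request with a boundary value."""
    values = dict(zip(FIELDS, struct.unpack(REQUEST_FORMAT, message)))
    field = rng.choice(RECOMMENDED_FIELDS)
    values[field] = rng.choice(BOUNDARY_VALUES[FIELD_WIDTH[field]])
    mutated = struct.pack(REQUEST_FORMAT, *(values[name] for name in FIELDS))
    return field, mutated

# A well-formed seed: read 2 registers starting at address 0 from unit 1.
seed = struct.pack(REQUEST_FORMAT, 1, 0, 6, 1, 0x03, 0, 2)
field, test_case = mutate(seed, random.Random(0))
print(field, test_case.hex())
```

Restricting mutation to a few named fields keeps the rest of the message well-formed, so a test case is more likely to get past early parsing and reach deeper protocol-handling code.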

Citations: 0
HMF: Enhancing reentrancy vulnerability detection and repair with a hybrid model framework
IF 3.1 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-09-13 | DOI: 10.1007/s10515-025-00546-0
Mengliang Li, Qiang Shen, Xiaoxue Ren, Han Fu, Zhuo Li, Jianling Sun

Smart contracts have revolutionized the credit landscape. However, their security remains intensely scrutinized due to numerous hacking incidents and inherent logical challenges. One well-known issue is the reentrancy vulnerability, exemplified by DAO attacks that led to substantial economic losses. Previous approaches have employed rule-based and deep learning-based (DL) algorithms to detect and repair reentrancy vulnerabilities. Large language models (LLMs) have distinguished themselves in recent years through their excellent understanding of text and code. However, less attention has been paid to LLM-based reentrancy vulnerability detection and repair, and direct prompt-based approaches often suffer from inefficiency and high false-positive rates. To overcome these shortcomings, this paper proposes a hybrid model framework combining LLMs with DL to enhance the detection and repair of reentrancy vulnerabilities. This unified framework comprises three crucial phases: the data processing phase, the vulnerability detection phase, and the vulnerability repair phase. Extensive experimental results validate the superiority of our approach over state-of-the-art baselines, and ablation studies demonstrate the effectiveness of each component. Our approach demonstrates significant improvements in vulnerability detection, with increases of 3.51% in accuracy, 2.31% in recall, 0.42% in precision, and 0.85% in F1-score. Furthermore, our approach achieves a notable 9.62% improvement in the repair rate. Finally, we also conducted a user study to emphasize its potential to fortify the security of smart contracts.
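
For readers unfamiliar with reentrancy, the following Python simulation sketches the flaw and one common repair. It is a toy model, not Solidity and not HMF's detection or repair method: the `VulnerableVault`, `SafeVault`, and `attack` names are hypothetical, and the repair shown is the well-known checks-effects-interactions ordering.

```python
class VulnerableVault:
    """Toy vault that pays out before updating its ledger (reentrancy-prone)."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.funds = 0

    def deposit(self, who: str, amount: int) -> None:
        self.balances[who] = self.balances.get(who, 0) + amount
        self.funds += amount

    def withdraw(self, who: str, on_receive) -> None:
        amount = self.balances.get(who, 0)
        if amount == 0 or self.funds < amount:
            return
        self.funds -= amount
        on_receive(amount)          # external call happens first ...
        self.balances[who] = 0      # ... and the ledger is updated too late


class SafeVault(VulnerableVault):
    """Same vault using the checks-effects-interactions ordering."""

    def withdraw(self, who: str, on_receive) -> None:
        amount = self.balances.get(who, 0)
        if amount == 0 or self.funds < amount:
            return
        self.balances[who] = 0      # effect: update the ledger first
        self.funds -= amount
        on_receive(amount)          # interaction: external call last


def attack(vault: VulnerableVault) -> int:
    """Deposit 10 next to a victim's 30, then re-enter withdraw from the callback."""
    vault.deposit("victim", 30)
    vault.deposit("attacker", 10)
    stolen = 0

    def on_receive(amount: int) -> None:
        nonlocal stolen
        stolen += amount
        vault.withdraw("attacker", on_receive)  # re-entrant call

    vault.withdraw("attacker", on_receive)
    return stolen


vulnerable_loss = attack(VulnerableVault())
safe_loss = attack(SafeVault())
print(vulnerable_loss, safe_loss)
```

In this toy model the attacker drains all 40 units from the vulnerable vault but only recovers their own 10 from the reordered one, which is the behavioural difference a detector has to recognise and a repair has to produce.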

Citations: 0
Automatic Code Generation Techniques: A Systematic Literature Review
IF 3.1 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-09-12 | DOI: 10.1007/s10515-025-00551-3
Maha Alharbi, Mohammad Alshayeb

As modern software systems become more complex and the demand for rapid development cycles increases, automatic code generation techniques have become a prominent focus in academic research and industrial practice. These techniques can significantly reduce human error, increase productivity, and ensure consistency across large codebases. However, the task of generating code automatically presents significant challenges. In this study, we investigate, identify, and analyze the existing automatic techniques for generating code from various input formats, highlighting their efficiencies and areas for potential improvement. A Systematic Literature Review (SLR) is conducted to summarize and review 76 primary studies related to automatic code generation in the software engineering domain. The selected studies are investigated along several dimensions: paradigms, techniques, input types, intermediate representations, tool support, targeted programming languages, and validation methods, including performance metrics, datasets, and benchmarking status. Our investigation identified 12 main techniques, categorized into five paradigms, of which the Model-to-Code paradigm and model-driven techniques are the most prevalent. Notably, 57% of the studies used Java, and only a limited number of studies showed multilingual support. Furthermore, 72% of the selected studies did not compare their results with existing techniques, and 17% lacked validation of the proposed techniques. We also noticed a lack of detailed information about the datasets used in the validation process: 52% of the studies omitted these details. This SLR provides several recommendations to enhance methodological rigor in future research, and it highlights opportunities for leveraging emerging technologies to improve the efficiency of the identified automatic code generation techniques.

Citations: 0
GPTVD: vulnerability detection and analysis method based on LLM’s chain of thoughts
IF 3.1 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-09-09 | DOI: 10.1007/s10515-025-00550-4
Yinan Chen, Yuan Huang, Xiangping Chen, Pengfei Shen, Lei Yun

Traditional vulnerability detection methods based on rules or learning primarily focus on coarse-grained predictions, often lacking precise localization and interpretability regarding the root causes of vulnerabilities. The growing availability of open-source vulnerability databases calls for advanced methods that can reason about vulnerabilities at a finer, slice-level granularity. GPTVD leverages the in-context learning (ICL) and chain-of-thought (COT) reasoning capabilities of large language models (LLMs) to enhance both detection performance and explainability. GPTVD extracts threat code slices through static code analysis, focusing on data and control dependencies. Positive and negative samples are clustered based on heuristic features and semantic feature vectors, and representative samples are manually annotated with reasoning processes to build COT prompts. These prompts are combined with target samples to form LLM input queries, enabling slice-level vulnerability inference and explanation using the LLM. The method was evaluated on 18,062 programs from a public dataset. GPTVD achieved superior performance compared to existing methods, with 92.21% accuracy, 93.20% precision, and 92.28% recall. Ablation studies confirm that clustering-based prompt selection, explicit threat code slices, and human expert reasoning significantly improve detection effectiveness and interpretability. GPTVD demonstrates that combining static code analysis with LLM-based COT reasoning can effectively detect vulnerabilities at the slice level with high accuracy and interpretability.

Citations: 0
Enhanced neighborhood metric for spreadsheet fault prediction
IF 3.1 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-09-09 | DOI: 10.1007/s10515-025-00552-2
Haitao Sun, Ying Wang, Hai Yu, Zhiliang Zhu

Spreadsheets are widely used in business and scientific domains, yet they are prone to input errors that can lead to significant risks. Faults often occur due to the use of formulas that are syntactically correct but semantically incorrect. This issue is particularly challenging for formula cells that are physically close and exhibit minor logical differences, which traditional fault prediction methods struggle to detect. To address these challenges, this paper introduces an enhanced neighborhood metric approach, which extends traditional formula-based metrics with neighborhood-based metrics. This approach analyzes the dependencies between adjacent formula cells, considering factors such as formula diversity, content dissimilarity, and structural consistency. Building on previous metric-based methods, this study introduces eight new neighborhood-based spreadsheet indicators to improve fault prediction. Extensive experiments conducted on three widely used datasets (Enron, INFO1, and EUSES) demonstrated that integrating the enhanced neighborhood metrics with traditional ones significantly improves fault prediction performance. The approach shows notable improvements in precision, recall, and F1-score, particularly for medium and large datasets. This study highlights the importance of incorporating neighborhood metrics for spreadsheet fault detection. The enhanced neighborhood metric approach improves fault detection accuracy by capturing subtle logical variations between formula cells that are physically close. This method offers a robust and effective approach for improving the reliability of spreadsheets and can be applied in various real-world data analysis tasks.
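
As a rough illustration of what a neighborhood-based indicator might look like, the Python sketch below scores each formula cell by how many of its adjacent formula cells carry a different formula. This is a hypothetical metric written for this listing, not one of the paper's eight indicators; the `sheet` data and the `neighbour_dissimilarity` function are assumptions.

```python
# Formulas in R1C1 (relative) notation, keyed by (row, column).
# Copied-down formulas are textually identical in this notation.
sheet = {
    (1, 3): "=RC[-2]+RC[-1]",
    (2, 3): "=RC[-2]+RC[-1]",
    (3, 3): "=RC[-2]-RC[-1]",   # deviates from its neighbours
    (4, 3): "=RC[-2]+RC[-1]",
    (5, 3): "=RC[-2]+RC[-1]",
}

def neighbour_dissimilarity(cells: dict, cell: tuple) -> float:
    """Share of adjacent formula cells whose formula text differs from `cell`'s."""
    row, col = cell
    adjacent = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    neighbours = [cells[pos] for pos in adjacent if pos in cells]
    if not neighbours:
        return 0.0
    return sum(f != cells[cell] for f in neighbours) / len(neighbours)

scores = {cell: neighbour_dissimilarity(sheet, cell) for cell in sheet}
suspect = max(scores, key=scores.get)
print(suspect, scores[suspect])
```

Here the cell at (3, 3) receives the highest score, 1.0, because both of its neighbours use a different formula. A score like this would only be one feature among many for a fault predictor, since a deviating formula can also be intentional.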

Citations: 0
Enhancing the ability of LLMs for spaceborne equipment code generation via retrieval-augmented generation and contrastive learning
IF 3.1 Zone 2 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-08-29 DOI: 10.1007/s10515-025-00545-1
Rui He, Liang Zhang, Liangqing Lyu, Changbin Xue

In the code generation field, Large Language Models (LLMs) pre-trained on numerous open-source code fragments show powerful reasoning abilities and remarkable downstream performance. They can be combined with retrieval techniques, for example by retrieving relevant code fragments as templates or by using retrieval results to supplement natural language descriptions with code examples. However, in domains like aerospace equipment, existing code generation technologies perform suboptimally. Different aerospace equipment has different functions and differs significantly in data processing and loading. Effective retrieval methods that provide semantically similar code contexts for LLMs are lacking, which keeps generated code from meeting complex task requirements. To address this, we propose CodeCLARE, a retrieval-augmented code generation framework. It first fine-tunes UniXcoder via contrastive learning and uses it as a semantic encoder for code fragment retrieval. Then, the NL2Code search strategy is adopted with program requirements as queries. In the final stage, a “Few-Shots Selection” mechanism integrates the retrieved code examples and the specific requirement information into prompt templates, from which the LLM generates highly accurate C++ code. Experimental results show that our approach significantly improves code quality compared to traditional ones and provides an effective solution for spacecraft control code generation.
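The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CodeCLARE implementation: a toy bag-of-words vector stands in for the contrastively fine-tuned UniXcoder encoder, and the corpus entries, the function names `embed`, `cosine`, `retrieve`, and `build_prompt`, and the prompt layout are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned semantic encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k (description, code) pairs whose description is closest to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda item: cosine(q, embed(item[0])), reverse=True)
    return ranked[:k]


def build_prompt(requirement: str, examples: List[Tuple[str, str]]) -> str:
    """Place the retrieved pairs before the new requirement as few-shot examples."""
    shots = "\n\n".join(f"// Requirement: {desc}\n{code}" for desc, code in examples)
    return f"{shots}\n\n// Requirement: {requirement}\n// Implementation:"


# Hypothetical snippet library: natural language description paired with a C++ declaration.
corpus = [
    ("parse telemetry packet header", "Header parseHeader(const uint8_t* buf);"),
    ("compute crc checksum of telemetry packet", "uint16_t crc16(const uint8_t* buf, size_t n);"),
    ("toggle heater relay", "void setHeater(bool on);"),
]
query = "compute checksum for a telemetry packet"
top = retrieve(query, corpus, k=2)
prompt = build_prompt(query, top)
```

The resulting `prompt` string would then be sent to an LLM. Swapping `embed` for a learned encoder changes only the similarity scores; the ranking and prompt assembly stay the same.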

Citations: 0
Adaptive and accessible user interfaces for seniors through model-driven engineering
IF 3.1 Zone 2 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-08-11 DOI: 10.1007/s10515-025-00547-z
Shavindra Wickramathilaka, John Grundy, Kashumi Madampe, Omar Haggag

The use of diverse mobile applications among senior users is becoming increasingly widespread. However, many of these apps contain accessibility problems that result in negative user experiences for seniors. A key reason is that software practitioners often lack the time or resources to address the broad spectrum of age-related accessibility and personalisation needs. As current developer tools and practices encourage one-size-fits-all interfaces with limited potential to address the diversity of senior needs, there is a growing demand for approaches that support the systematic creation of adaptive, accessible app experiences. To this end, we present AdaptForge, a novel model-driven engineering (MDE) approach that enables advanced design-time adaptations of mobile application interfaces and behaviours tailored to the accessibility needs of senior users. AdaptForge uses two domain-specific languages (DSLs) to address age-related accessibility needs. The first model defines users’ context-of-use parameters, while the second defines conditional accessibility scenarios and corresponding UI adaptation rules. These rules are interpreted by an MDE workflow to transform an app’s original source code into personalised instances. We also report evaluations with professional software developers and senior end-users, demonstrating the feasibility and practical utility of AdaptForge.

Open access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00547-z.pdf
Citations: 0
Automated detection of affected libraries from vulnerability reports
IF 3.1 Zone 2 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-08-11 DOI: 10.1007/s10515-025-00540-6
Jinwei Xu, He Zhang, Xin Zhou, Yanjing Yang, Runfeng Mao, Xiaokang Li, Lanxin Yang, Haifeng Shen

The growing reuse of third-party libraries in software supply chains increases the risk of being affected by the vulnerabilities they contain. To strengthen software security, security vendors such as Snyk maintain up-to-date vulnerability databases by associating reported vulnerabilities with their affected libraries, and contemporary digital organizations such as banks and software enterprises check whether the third-party libraries they use are affected by these reported vulnerabilities. Existing studies focus on automating the detection process but make little effort to detect newly affected libraries, although previously healthy libraries are constantly disclosed to be affected by new vulnerabilities. Moreover, existing studies do not seriously consider that digital organizations are concerned only with the libraries they use. In this paper, we propose LibAlarm, an approach that addresses these challenges. We implement LibAlarm as a large language model-powered approach and compare it with baseline approaches from multiple perspectives. Our experimental evaluation using 16,238 NVD reports indicates that LibAlarm improves F1 by over 14% compared with baselines and detects over 40% of newly affected libraries. For contemporary digital organizations, LibAlarm performs better than the baseline approaches, with an F1 above 70% and a false alarm ratio reduced to 20%. Our case analysis using 540 NVD reports and 20 projects from Microsoft and Google demonstrates the effectiveness of LibAlarm. These results indicate that LibAlarm can help security vendors and digital organizations detect affected libraries from vulnerability reports.

Citations: 0