
Latest publications in Empirical Software Engineering

Tools and benchmarks evolve: what is their impact on parameter tuning in SBSE experiments?
IF 3.6 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2026-01-01 | Epub Date: 2025-11-04 | DOI: 10.1007/s10664-025-10733-y
Amid Golmohammadi, Man Zhang, Andrea Arcuri

In this article, we explore the impact of tool development and evolution in Search-Based Software Engineering (SBSE) research. As a research tool evolves over the years, experiments with novel techniques might require reevaluation of previous studies, especially regarding parameter tuning. These reevaluations also offer an opportunity to address threats to the external validity of those earlier studies by employing a larger selection of artifacts. To conduct the replicated experiments in this study, we chose the search-based fuzzer EvoMaster, an SBSE tool that has been developed and extended over several years (since 2016) and across tens of scientific studies. Among the tool's parameters, 6 were carefully selected based on 5 previous studies, which we replicate in this article with the latest version of EvoMaster. The replication is applied across an expanded set of artifacts compared to the original studies. Our objective is to validate the robustness and validity of previous findings and to determine whether parameter tuning is needed in response to the tool's continuous development. Beyond replication, we explored parameter tuning by testing 729 different configurations to find a more performant parameter set, which we later validated through additional rounds of experiments. Additionally, we analyzed the impact of individual parameters on test-generation performance using machine learning models, providing insights into their relative effects. Our findings indicate that, although most parameters maintain their efficacy, 2 of them require adjustment. Furthermore, the investigation into the effects of combining different parameter values reveals that carefully optimized configurations can outperform default settings. These findings highlight the importance of regularly reevaluating parameter settings to enhance tool performance in SBSE research.
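Testing 729 configurations is consistent with an exhaustive grid over the 6 selected parameters with 3 candidate values each (3^6 = 729). A minimal sketch of such an enumeration; the parameter names and value domains below are purely illustrative, not EvoMaster's actual options:

```python
from itertools import product

# Hypothetical parameter domains: 6 parameters with 3 candidate values each.
# Names and values are illustrative placeholders, not EvoMaster's real flags.
PARAM_DOMAINS = {
    "populationSize": [10, 30, 50],
    "mutationRate": [0.1, 0.5, 0.9],
    "crossoverRate": [0.3, 0.6, 0.9],
    "tournamentSize": [2, 5, 10],
    "archiveLimit": [100, 500, 1000],
    "focusedSearchStart": [0.2, 0.5, 0.8],
}

def enumerate_configurations(domains):
    """Yield every combination of parameter values as a {name: value} dict."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        yield dict(zip(names, values))

configs = list(enumerate_configurations(PARAM_DOMAINS))  # 3**6 == 729 dicts
```

Each dict can then be passed to the tool under test as one experimental configuration.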

Citations: 0
Simplifying software compliance: AI technologies in drafting technical documentation for the AI Act.
IF 3.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2025-04-02 | DOI: 10.1007/s10664-025-10645-x
Francesco Sovrano, Emmie Hine, Stefano Anzolut, Alberto Bacchelli

The European AI Act has introduced specific technical documentation requirements for AI systems. Complying with them is challenging because it requires advanced knowledge of both legal and technical aspects, a combination that is rare among software developers and legal professionals. Consequently, small and medium-sized enterprises may face high costs in meeting these requirements. In this study, we explore how contemporary AI technologies, including ChatGPT and an existing compliance tool (DoXpert), can aid software developers in creating technical documentation that complies with the AI Act. We specifically demonstrate how these AI tools can identify gaps in existing documentation according to the provisions of the AI Act. Using open-source high-risk AI systems as case studies, we collaborated with legal experts to evaluate how closely tool-generated assessments align with expert opinions. Findings show partial alignment and important issues with ChatGPT (3.5 and 4), as well as a moderate (and statistically significant) correlation between DoXpert and expert judgments, according to a rank-biserial correlation analysis. Nonetheless, these findings underscore the potential of AI to complement human analysis and alleviate the compliance burden, supporting the broader goal of fostering responsible and transparent AI development under emerging regulatory frameworks.
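The rank-biserial correlation used here to relate tool and expert judgments can be computed directly from pairwise comparisons between two samples: the proportion of pairs where the first sample wins minus the proportion where the second wins. A minimal sketch with synthetic scores (not the paper's actual analysis code):

```python
def rank_biserial(x, y):
    """Rank-biserial correlation between two samples: proportion of
    (x_i, y_j) pairs with x > y minus proportion with x < y.
    Ranges from -1 to 1; algebraically equivalent to
    r = 2*U1/(n1*n2) - 1 with the Mann-Whitney U1 statistic."""
    pairs = len(x) * len(y)
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / pairs

# Synthetic example: one group of ratings vs. another for the same artifacts.
r = rank_biserial([3, 4, 5, 5], [2, 3, 3, 4])
```

Values near 0 indicate no tendency for one group's ratings to exceed the other's; values near ±1 indicate near-complete dominance.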

Citations: 0
On the effects of program slicing for vulnerability detection during code inspection.
IF 3.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2025-04-05 | DOI: 10.1007/s10664-025-10636-y
Aurora Papotti, Katja Tuma, Fabio Massacci

Slicing is a fault localization technique that has been proposed to support debugging and program comprehension. Yet, its empirical effectiveness during code inspection by humans has received limited attention. The goal of our study is two-fold. First, we aim to define what it means for a code reviewer to correctly identify the vulnerable lines. Second, we investigate whether reducing the number of to-be-inspected lines through method-level slicing supports code reviewers in detecting security vulnerabilities. We propose a novel approach based on the notion of a δ-neighborhood (intuitively based on the idea of the context size of the command git diff) to define correctly identified lines. Then, we conducted a multi-year controlled experiment (2017-2023) in which MSc students attending security courses (n = 236) were tasked with identifying vulnerable lines in original or sliced Java files from Apache Tomcat. We provide perfect seed lines to the slicing algorithm to control for confounding factors. Each treatment differs in the pair (Vulnerability, Original/Sliced), with a balanced design covering vulnerabilities from the OWASP Top 10 2017: A1 (Injection), A5 (Broken Access Control), A6 (Security Misconfiguration), and A7 (Cross-Site Scripting). To generate smaller slices for human consumption, we used a variant of intra-procedural thin slicing. We report results for δ = 0, which corresponds to exactly matching the vulnerable ground-truth lines, and δ = 3, which represents the scenario of identifying the vulnerable area. In both cases, we found that slicing helps in 'finding something' (the participant found at least some vulnerable lines) as opposed to 'finding nothing'. For δ = 0, analyzing a slice and analyzing the original file are statistically equivalent from the perspective of the lines found by those who found something. With δ = 3, slicing helps to find more vulnerabilities than analyzing the original file, as we would normally expect. Given the type of population, additional experiments are necessary before generalizing to experienced developers.
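The δ-neighborhood criterion can be stated compactly: a line flagged by a reviewer counts as correctly identified if it lies within δ lines of some ground-truth vulnerable line. A minimal sketch of this matching (illustrative, not the authors' evaluation scripts):

```python
def correctly_identified(flagged, ground_truth, delta):
    """Lines flagged by a reviewer that fall within `delta` lines of some
    ground-truth vulnerable line. delta=0 demands an exact match; larger
    delta mirrors the context size of `git diff`."""
    return {line for line in flagged
            if any(abs(line - g) <= delta for g in ground_truth)}

def found_something(flagged, ground_truth, delta):
    """True if the reviewer found at least one vulnerable line."""
    return bool(correctly_identified(flagged, ground_truth, delta))
```

With δ = 0 only exact hits count; with δ = 3, a flag two lines away from the vulnerable statement still counts as identifying the vulnerable area.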

Citations: 0
Reinforcement learning for online testing of autonomous driving systems: a replication and extension study.
IF 3.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2024-11-05 | DOI: 10.1007/s10664-024-10562-5
Luca Giamattei, Matteo Biagiola, Roberto Pietrantuono, Stefano Russo, Paolo Tonella

In a recent study, Reinforcement Learning (RL), used in combination with many-objective search, was shown to outperform alternative techniques (random search and many-objective search) for online testing of Deep Neural Network-enabled systems. The empirical evaluation of these techniques was conducted on a state-of-the-art Autonomous Driving System (ADS). This work is a replication and extension of that empirical study. Our replication shows that RL does not outperform pure random test generation in a comparison conducted under the same settings as the original study, but with no confounding factor coming from the way collisions are measured. Our extension aims at eliminating some possible reasons for the poor performance of RL observed in our replication: (1) the presence of reward components providing contrasting feedback to the RL agent; (2) the use of an RL algorithm (Q-learning) that requires discretization of an intrinsically continuous state space. Results show that our new RL agent is able to converge to an effective policy that outperforms random search. Results also highlight other possible improvements, opening further investigations into how to best leverage RL for online ADS testing.
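Tabular Q-learning can only index its Q-table with discrete states, so a continuous ADS state (speeds, distances, angles, etc.) must first be binned. A minimal sketch of state discretization plus the standard one-step Q-learning update; the bin counts, bounds, and hyperparameters are illustrative, not the values used in the study:

```python
from collections import defaultdict

def discretize(state, bins, low, high):
    """Map each continuous state component to one of `bins` buckets,
    clipping values outside the [low, high] range per component."""
    out = []
    for s, lo, hi in zip(state, low, high):
        frac = (min(max(s, lo), hi) - lo) / (hi - lo)
        out.append(min(int(frac * bins), bins - 1))
    return tuple(out)

# Tabular Q-learning over the discretized state space.
# ALPHA (learning rate) and GAMMA (discount factor) are illustrative.
ALPHA, GAMMA = 0.1, 0.99
Q = defaultdict(float)  # (state, action) -> value, default 0.0

def q_update(state, action, reward, next_state, actions):
    """One-step update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

The coarseness of the binning is exactly the trade-off the extension targets: too few bins lose information, too many make the table sparse and slow to learn.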

Citations: 0
An empirical study of fault localisation techniques for deep neural networks.
IF 3.6 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2025-06-10 | DOI: 10.1007/s10664-025-10657-7
Nargiz Humbatova, Jinhan Kim, Gunel Jahangirova, Shin Yoo, Paolo Tonella

With the increasing popularity of Deep Neural Networks (DNNs), the need for tools that assist developers in implementing, testing, and debugging DNNs also grows. Several approaches have been proposed that automatically analyse and localise potential faults in DNNs under test. In this work, we evaluate and compare existing state-of-the-art fault localisation techniques, which operate based on both dynamic and static analysis of the DNN. The evaluation is performed on a benchmark consisting of both real faults obtained from bug-reporting platforms and faulty models produced by a mutation tool. Our findings indicate that using a single, specific ground truth (e.g. the human-defined one) for the evaluation of DNN fault localisation tools results in rather low performance (maximum average recall of 0.33 and precision of 0.21). However, these figures increase when considering the alternative, equivalent patches that exist for a given faulty DNN. The results indicate that DeepFD is the most effective tool, achieving an average recall of 0.55 and a precision of 0.37 on our benchmark.
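Scoring a fault-localisation tool against several equivalent ground truths amounts to computing recall and precision against each candidate set of faulty locations and keeping the best score. A minimal sketch of that evaluation logic (illustrative, not the study's evaluation code; location names are made up):

```python
def recall_precision(suspects, ground_truth):
    """Recall and precision of a set of suspected fault locations against
    one ground-truth set of faulty locations."""
    tp = len(suspects & ground_truth)
    recall = tp / len(ground_truth) if ground_truth else 0.0
    precision = tp / len(suspects) if suspects else 0.0
    return recall, precision

def best_against_alternatives(suspects, alternative_truths):
    """Score the suspects against every equivalent patch (alternative
    ground truth) and keep the best (recall, precision) pair."""
    return max(recall_precision(suspects, gt) for gt in alternative_truths)
```

Taking the best score over alternatives is what lifts the apparent performance when a faulty DNN admits several equivalent fixes.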

Citations: 0
AI support for data scientists: An empirical study on workflow and alternative code recommendations.
IF 3.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2025-07-04 | DOI: 10.1007/s10664-025-10622-4
Dhivyabharathi Ramasamy, Cristina Sarasua, Abraham Bernstein

Despite the popularity of AI assistants for coding activities, there is limited empirical work on whether these coding assistants can help users complete data science tasks. Moreover, in data science programming, exploring alternative paths has been widely advocated, as such paths may lead to diverse understandings and conclusions (Gelman and Loken 2013; Kale et al. 2019). Whether existing AI-based coding assistants can support data scientists in exploring the relevant alternative paths remains unexplored. To fill this gap, we conducted a mixed-methods study to understand how data scientists solve different data science tasks with the help of an AI-based coding assistant that provides explicit alternatives as recommendations throughout the data science workflow. Specifically, we quantitatively investigated whether users accept the code recommendations, including alternative recommendations, made by the AI assistant, and whether the recommendations are helpful when completing descriptive and predictive data science tasks. Through the empirical study, we also investigated whether including in the prompt information about the data science step (e.g., data exploration) for which recommendations are sought leads to more helpful recommendations. We found that including the data science step in a prompt led to a statistically significant improvement in the acceptance of recommendations, whereas the presence of alternatives did not lead to any significant differences. Our study also shows a statistically significant difference in the acceptance and usefulness of recommendations between descriptive and predictive tasks. Participants generally had positive sentiments regarding AI assistance and our proposed interface. We share further insights on the interactions that emerged during the study and the challenges that our users encountered while solving their data science tasks.
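Whether acceptance rates differ between two prompt conditions (step included vs. not) can be checked with a Pearson chi-square test on a 2x2 contingency table of accepted vs. rejected recommendations. A minimal sketch of the statistic, with made-up counts and no claim that this is the study's actual analysis:

```python
def chi_square_2x2(accepted_a, total_a, accepted_b, total_b):
    """Pearson chi-square statistic for a 2x2 table of accepted vs. rejected
    recommendations under two conditions (no continuity correction)."""
    table = [[accepted_a, total_a - accepted_a],
             [accepted_b, total_b - accepted_b]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_sums)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat
```

The statistic is compared against the chi-square distribution with one degree of freedom (critical value 3.84 at alpha = 0.05).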

Supplementary information: The online version contains supplementary material available at 10.1007/s10664-025-10622-4.

Citations: 0
Understanding security tactics in microservice APIs using annotated software architecture decomposition models - a controlled experiment.
IF 3.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2025-02-14 | DOI: 10.1007/s10664-024-10601-1
Patric Genfer, Souhaila Serbout, Georg Simhandl, Uwe Zdun, Cesare Pautasso

While microservice architectures have become a widespread option for designing distributed applications, designing secure microservice systems remains challenging. Although various security-related guidelines and practices exist, these systems' sheer size, complex communication structures, and polyglot tech stacks make it difficult to manually validate whether adequate security tactics are applied throughout their architecture. To address these challenges, we have devised a novel solution that involves the automatic generation of security-annotated software decomposition models and the utilization of security-based metrics to guide software architects through the assessment of security tactics employed within microservice systems. To evaluate the effectiveness of our artifacts, we conducted a controlled experiment in which we asked 60 students from two universities and ten experts from industry to identify and assess the security features of two microservice reference systems. During the experiment, we tracked the correctness of their answers and the time they needed to solve the given tasks, to measure how well they could understand the security tactics applied in the reference systems. Our results indicate that the supplemental material significantly improved the correctness of the participants' answers without requiring them to consult the documentation more. Most participants also stated in a self-assessment that their understanding of the security tactics used in the systems improved significantly because of the provided material, with the additional diagrams considered very helpful. In contrast, the perception of architectural metrics varied widely. We could also show that novice developers benefited most from the supplementary diagrams, whereas senior developers could rely on their experience to compensate for the lack of additional help. Contrary to our expectations, we found no significant correlation between the time spent solving the tasks and the overall correctness score achieved, meaning that participants who took more time to read the documentation did not automatically achieve better results. As far as we know, this empirical study is the first analysis that explores the influence of security annotations in component diagrams to guide software developers when assessing microservice system security.
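The security-annotated decomposition models described above can be pictured as plain data: components tagged with the security tactics they implement, plus a coverage-style metric computed over them. The sketch below is illustrative only; the component names, tactic labels, and the metric definition are assumptions, not the paper's actual artifacts.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the component names, tactic labels, and the
# coverage metric are assumptions, not the paper's actual artifacts.

@dataclass
class Component:
    name: str
    tactics: set = field(default_factory=set)  # annotated security tactics

def tactic_coverage(components, required_tactics):
    """Fraction of components annotated with at least one required tactic."""
    if not components:
        return 0.0
    covered = sum(1 for c in components if c.tactics & required_tactics)
    return covered / len(components)

gateway = Component("api-gateway", {"Authenticate Actors", "Limit Access"})
orders = Component("orders-service")          # no tactics annotated
auth = Component("auth-service", {"Authenticate Actors"})

print(tactic_coverage([gateway, orders, auth], {"Authenticate Actors"}))
```

A metric like this is what the abstract's "security-based metrics" could look like at their simplest: a number a reviewer can compare across decompositions instead of inspecting every connector by hand.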

{"title":"Understanding security tactics in microservice APIs using annotated software architecture decomposition models - a controlled experiment.","authors":"Patric Genfer, Souhaila Serbout, Georg Simhandl, Uwe Zdun, Cesare Pautasso","doi":"10.1007/s10664-024-10601-1","DOIUrl":"10.1007/s10664-024-10601-1","url":null,"abstract":"<p><p>While microservice architectures have become a widespread option for designing distributed applications, designing secure microservice systems remains challenging. Although various security-related guidelines and practices exist, these systems' sheer size, complex communication structures, and polyglot tech stacks make it difficult to manually validate whether adequate security tactics are applied throughout their architecture. To address these challenges, we have devised a novel solution that involves the automatic generation of security-annotated software decomposition models and the utilization of security-based metrics to guide software architectures through the assessment of security tactics employed within microservice systems. To evaluate the effectiveness of our artifacts, we conducted a controlled experiment where we asked 60 students from two universities and ten experts from the industry to identify and assess the security features of two microservice reference systems. During the experiment, we tracked the correctness of their answers and the time they needed to solve the given tasks to measure how well they could understand the security tactics applied in the reference systems. Our results indicate that the supplemental material significantly improved the correctness of the participants' answers without requiring them to consult the documentation more. Most participants also stated in a self-assessment that their understanding of the security tactics used in the systems improved significantly because of the provided material, with the additional diagrams considered very helpful. 
In contrast, the perception of architectural metrics varied widely. We could also show that novice developers benefited most from the supplementary diagrams. In contrast, senior developers could rely on their experience to compensate for the lack of additional help. Contrary to our expectations, we found no significant correlation between the time spent solving the tasks and the overall correctness score achieved, meaning that participants who took more time to read the documentation did not automatically achieve better results. As far as we know, this empirical study is the first analysis that explores the influence of security annotations in component diagrams to guide software developers when assessing microservice system security.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 3","pages":"66"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11828814/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143432508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub.
IF 3.6 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-01-01 Epub Date: 2025-07-04 DOI: 10.1007/s10664-025-10688-0
Francesco Minna, Fabio Massacci, Katja Tuma

Helm is a package manager that allows defining, installing, and upgrading applications on Kubernetes (K8s), a popular container orchestration platform. A Helm chart is a collection of files describing all dependencies, resources, and parameters required for deploying an application within a K8s cluster. This study aimed to mine and empirically evaluate the security of Helm charts, comparing the performance of existing tools in terms of misconfigurations reported by their default policies, and measuring to what extent LLMs could be used to remove misconfigurations. To this end, we proposed a pipeline to mine Helm charts from Artifact Hub, a popular centralized repository, and analyze them using state-of-the-art open-source tools such as Checkov and KICS. First, the pipeline runs several chart analyzers and identifies the common and unique misconfigurations reported by each tool. Second, it uses LLMs to suggest a mitigation for each misconfiguration. Finally, the LLM-refactored chart is analyzed again by the same tools to check whether it satisfies their policies. We also performed a manual analysis on a subset of charts to evaluate whether there are false-positive misconfigurations in the tools' reports and in the LLM refactoring. We found that (i) there is a significant difference between LLMs, (ii) providing a snippet of the YAML template as input might be insufficient compared to providing all resources, and (iii) even though LLMs can generate correct fixes, they may also delete other, unrelated configurations and thereby break the application.
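The analyze, mitigate, re-analyze loop the abstract describes could be sketched roughly as follows. The Checkov CLI flags (`-d`, `-o json`), the JSON report shape, and the `fix_with_llm` placeholder are assumptions made for illustration; the paper's actual pipeline, prompts, and tool invocations may differ.

```python
import json
import subprocess

def failed_checks(report: dict) -> list:
    """Extract failed checks from a Checkov-style JSON report (assumed shape)."""
    return report.get("results", {}).get("failed_checks", [])

def scan_chart(chart_dir: str) -> list:
    """Run Checkov over a rendered chart directory; CLI flags are assumptions."""
    proc = subprocess.run(["checkov", "-d", chart_dir, "-o", "json"],
                          capture_output=True, text=True)
    return failed_checks(json.loads(proc.stdout))

def fix_with_llm(template: str, finding: dict) -> str:
    """Placeholder for the LLM mitigation step; plug in a real client here."""
    raise NotImplementedError

def mitigate(chart_dir: str) -> int:
    """One analyze -> fix -> re-analyze round; returns remaining findings."""
    for finding in scan_chart(chart_dir):
        path = chart_dir + finding["file_path"]
        with open(path) as f:
            template = f.read()
        with open(path, "w") as f:
            f.write(fix_with_llm(template, finding))
    return len(scan_chart(chart_dir))
```

Re-scanning with the same tool after the LLM rewrite, as in the final line of `mitigate`, is what lets the study measure fixes against the tool's own policies rather than against human judgment alone.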

{"title":"Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub.","authors":"Francesco Minna, Fabio Massacci, Katja Tuma","doi":"10.1007/s10664-025-10688-0","DOIUrl":"10.1007/s10664-025-10688-0","url":null,"abstract":"<p><p>Helm is a package manager that allows defining, installing, and upgrading applications with Kubernetes (K8s), a popular container orchestration platform. A Helm chart is a collection of files describing all dependencies, resources, and parameters required for deploying an application within a K8s cluster. This study aimed to mine and empirically evaluate the security of Helm charts, comparing the performance of existing tools in terms of misconfigurations reported by policies available by default, and measuring to what extent LLMs could be used for removing misconfigurations. For these reasons, we proposed a pipeline to mine Helm charts from Artifact Hub, a popular centralized repository, and analyze them using state-of-the-art open-source tools like Checkov and KICS. First, the pipeline runs several chart analyzers and identifies the common and unique misconfigurations reported by each tool. Secondly, it uses LLMs to suggest a mitigation for each misconfiguration. Finally, the LLM refactored chart previously generated is analyzed again by the same tools to see whether it satisfies the tool's policies. We also performed a manual analysis on a subset of charts to evaluate whether there are false positive misconfigurations from the tool's reporting and in the LLM refactoring. 
We found that (i) there is a significant difference between LLMs, (ii) providing a snippet of the YAML template as input might be insufficient compared to all resources, and (iii) even though LLMs can generate correct fixes, they may also delete other irrelevant configurations that break the application.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 5","pages":"132"},"PeriodicalIF":3.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12227474/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144575074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The effect of data complexity on classifier performance.
IF 3.5 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-01-01 Epub Date: 2024-10-31 DOI: 10.1007/s10664-024-10554-5
Jonas Eberlein, Daniel Rodriguez, Rachel Harrison

The research area of Software Defect Prediction (SDP) is both extensive and popular, and the task is often treated as a classification problem. Improvements in classification, pre-processing, and tuning techniques (together with many factors that can influence model performance) have encouraged this trend. However, regardless of the effort in these areas, there appears to be a ceiling on the performance of the classification models used in SDP. In this paper, the issue of classifier performance is analysed from the perspective of data complexity. Specifically, data complexity metrics are calculated on the Unified Bug Dataset, a collection of well-known SDP datasets, and then checked for correlation with the defect prediction performance of machine learning classifiers (in particular, C5.0, Naive Bayes, Artificial Neural Networks, Random Forests, and Support Vector Machines). In this work, different domains of competence and incompetence are identified for the classifiers. Similarities and differences between the classifiers and the performance metrics are found, and the Unified Bug Dataset is analysed from the perspective of data complexity. We found that certain classifiers work best in certain situations and that all data complexity metrics can be problematic, although certain classifiers did excel in some situations.
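As a concrete illustration of what a data complexity metric is, here is a stdlib-only sketch of Fisher's discriminant ratio (commonly labelled F1 in the complexity-metrics literature): how well a single feature separates two classes. The toy feature values are invented; the paper computes a whole suite of such metrics over the Unified Bug Dataset.

```python
from statistics import mean, pvariance

# Minimal sketch, assuming two-class data given per feature. The feature
# names and values are invented for illustration.

def fisher_ratio(feature_a, feature_b):
    """(mu_a - mu_b)^2 / (var_a + var_b); higher means easier to separate."""
    num = (mean(feature_a) - mean(feature_b)) ** 2
    den = pvariance(feature_a) + pvariance(feature_b)
    return num / den if den else float("inf")

# One ratio per feature; a dataset's F1 is usually the maximum over features.
clean = {"loc": [10, 12, 11], "complexity": [1, 2, 1]}
buggy = {"loc": [80, 95, 90], "complexity": [9, 11, 10]}

scores = {f: fisher_ratio(clean[f], buggy[f]) for f in clean}
print(max(scores.values()))
```

Correlating values like these with per-dataset classifier accuracy is the kind of analysis the abstract describes: a low maximum ratio suggests heavily overlapping classes, which is one candidate explanation for the performance ceiling.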

{"title":"The effect of data complexity on classifier performance.","authors":"Jonas Eberlein, Daniel Rodriguez, Rachel Harrison","doi":"10.1007/s10664-024-10554-5","DOIUrl":"10.1007/s10664-024-10554-5","url":null,"abstract":"<p><p>The research area of Software Defect Prediction (SDP) is both extensive and popular, and is often treated as a classification problem. Improvements in classification, pre-processing and tuning techniques, (together with many factors which can influence model performance) have encouraged this trend. However, no matter the effort in these areas, it seems that there is a ceiling in the performance of the classification models used in SDP. In this paper, the issue of classifier performance is analysed from the perspective of data complexity. Specifically, data complexity metrics are calculated using the Unified Bug Dataset, a collection of well-known SDP datasets, and then checked for correlation with the defect prediction performance of machine learning classifiers (in particular, the classifiers C5.0, Naive Bayes, Artificial Neural Networks, Random Forests, and Support Vector Machines). In this work, different domains of competence and incompetence are identified for the classifiers. Similarities and differences between the classifiers and the performance metrics are found and the Unified Bug Dataset is analysed from the perspective of data complexity. 
We found that certain classifiers work best in certain situations and that all data complexity metrics can be problematic, although certain classifiers did excel in some situations.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 1","pages":"16"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11527945/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142570943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Assessing the adoption of security policies by developers in terraform across different cloud providers.
IF 3.5 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-01-01 Epub Date: 2025-02-27 DOI: 10.1007/s10664-024-10610-0
Alexandre Verdet, Mohammad Hamdaqa, Leuson Da Silva, Foutse Khomh

Cloud computing has become popular thanks to the widespread use of Infrastructure as Code (IaC) tools, allowing the community to manage and configure cloud infrastructure using scripts. However, the scripting process does not automatically prevent practitioners from introducing misconfigurations, vulnerabilities, or privacy risks. As a result, ensuring security relies on practitioners' understanding and the adoption of explicit policies. To understand how practitioners deal with this problem, we perform an empirical study analyzing the adoption of scripted security best practices present in Terraform files, applied on AWS, Azure, and Google Cloud. We assess the adoption of these practices by analyzing a sample of 812 open-source GitHub projects. We scan each project's configuration files, looking for policy implementations through static analysis (Checkov and Tfsec). The category Access policy emerges as the most widely adopted across all providers, while Encryption at rest contains the most neglected policies. Regarding the cloud providers, we observe that AWS and Azure behave similarly with respect to adopted and neglected policies. Finally, we provide guidelines for cloud practitioners to limit infrastructure vulnerability and discuss further aspects associated with policies that have yet to be extensively embraced within the industry.
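The per-category adoption rates the abstract reports could be computed with a small aggregation like the one below. The finding format (category, passed) and the example category names are assumptions mirroring the abstract's wording, not the study's actual scanner output.

```python
from collections import Counter

# Hypothetical sketch: tally pass/fail findings (e.g. parsed from Checkov or
# Tfsec output) into an adoption rate per policy category.

def adoption_by_category(findings):
    """findings: iterable of (category, passed) pairs -> rate per category."""
    passed, total = Counter(), Counter()
    for category, ok in findings:
        total[category] += 1
        passed[category] += ok  # bool counts as 0 or 1
    return {c: passed[c] / total[c] for c in total}

findings = [
    ("Access policy", True), ("Access policy", True), ("Access policy", False),
    ("Encryption at rest", False), ("Encryption at rest", False),
]
print(adoption_by_category(findings))
```

Aggregating this way over 812 projects is what would let categories such as Access policy (widely adopted) and Encryption at rest (widely neglected) be ranked against each other, as the abstract does.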

{"title":"Assessing the adoption of security policies by developers in terraform across different cloud providers.","authors":"Alexandre Verdet, Mohammad Hamdaqa, Leuson Da Silva, Foutse Khomh","doi":"10.1007/s10664-024-10610-0","DOIUrl":"https://doi.org/10.1007/s10664-024-10610-0","url":null,"abstract":"<p><p>Cloud computing has become popular thanks to the widespread use of Infrastructure as Code (IaC) tools, allowing the community to manage and configure cloud infrastructure using scripts. However, the scripting process does not automatically prevent practitioners from introducing misconfigurations, vulnerabilities, or privacy risks. As a result, ensuring security relies on practitioners' understanding and the adoption of explicit policies. To understand how practitioners deal with this problem, we perform an empirical study analyzing the adoption of scripted security best practices present in Terraform files, applied on AWS, Azure, and Google Cloud. We assess the adoption of these practices by analyzing a sample of 812 open-source GitHub projects. We scan each project's configuration files, looking for policy implementation through static analysis (Checkov and Tfsec). The category <i>Access policy</i> emerges as the most widely adopted in all providers, while <i>Encryption at rest</i> presents the most neglected policies. Regarding the cloud providers, we observe that AWS and Azure present similar behavior regarding attended and neglected policies. 
Finally, we provide guidelines for cloud practitioners to limit infrastructure vulnerability and discuss further aspects associated with policies that have yet to be extensively embraced within the industry.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 3","pages":"74"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11868142/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143540588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0