
Latest publications in Automated Software Engineering

Automated system-level testing of unmanned aerial systems
IF 2.0 | CAS Tier 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-01 | DOI: 10.1007/s10515-024-00462-9
Hassan Sartaj, Asmar Muqeet, Muhammad Zohaib Iqbal, Muhammad Uzair Khan

Unmanned aerial systems (UAS) rely on various avionics systems that are safety-critical and mission-critical. A major requirement of international safety standards is to perform rigorous system-level testing of avionics software systems. The current industrial practice is to create test scenarios manually, execute them manually or automatically using simulators, and evaluate the outcomes manually. A test scenario typically consists of setting certain flight or environment conditions and testing the system under test in those settings. State-of-the-art approaches likewise require manual test scenario development and evaluation. In this paper, we propose a novel approach to automate the system-level testing of UAS. The proposed approach, named AITester, utilizes model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios. Test scenarios are generated on the fly, i.e., during test execution, based on the environmental context at runtime. The approach is supported by a toolset. We empirically evaluated the approach on two core UAS components: the autopilot system of an unmanned aerial vehicle (UAV) and the cockpit display systems (CDS) of the ground control station (GCS). The results show that AITester effectively generates test scenarios that cause deviations from the expected behavior of the UAV autopilot and reveals potential flaws in the GCS-CDS.
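The generate-execute-evaluate loop described above can be sketched in a few lines. Everything below is illustrative: the scenario parameters (wind, visibility), the simulator stub, and the oracle are assumptions for the sketch, not AITester's actual interface.

```python
import random

def run_ai_tester(simulate, check, n_scenarios=50, seed=42):
    """Generate-execute-evaluate loop: each scenario is built on the
    fly, run through a simulator callback, and its outcome compared
    against an expected-behavior oracle. Scenarios that deviate from
    the expected behavior are collected and returned."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(n_scenarios):
        # Hypothetical environmental context: wind speed and visibility.
        scenario = {"wind_mps": rng.uniform(0, 30),
                    "visibility_m": rng.uniform(50, 5000)}
        outcome = simulate(scenario)          # execute in the simulator
        if not check(scenario, outcome):      # evaluate against the oracle
            deviations.append(scenario)
    return deviations
```

In a real setting `simulate` would drive a flight simulator and `check` would encode the autopilot's expected behavior; here both are placeholders.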

Citations: 0
Angels or demons: investigating and detecting decentralized financial traps on ethereum smart contracts
IF 2.0 | CAS Tier 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-29 | DOI: 10.1007/s10515-024-00459-4
Jiachi Chen, Jiang Hu, Xin Xia, David Lo, John Grundy, Zhipeng Gao, Ting Chen

Decentralized Finance (DeFi) uses blockchain technologies to transform traditional financial activities into decentralized platforms that run without intermediaries or centralized institutions. Smart contracts are programs that run on the blockchain, and by utilizing smart contracts, developers can more easily build DeFi applications. Key features of smart contracts, such as self-execution and immutability, ensure the trustworthiness, transparency, and efficiency of DeFi applications and have led to a fast-growing DeFi market. However, misbehaving developers can add traps or backdoor code snippets to a smart contract that are hard for contract users to discover. We call these code snippets in a DeFi smart contract “DeFi Contract Traps” (DCTs). In this paper, we identify five DeFi contract traps, introduce their behaviors, describe how attackers use them to make unfair profits, and analyze their prevalence on the Ethereum platform. We propose a symbolic execution tool, DeFiDefender, to detect such traps, and evaluate it on a manually labeled small-scale dataset of 700 smart contracts. Our results show that the tool is not only highly effective but also highly efficient: DeFiDefender needs only 0.48 s to analyze one DeFi smart contract and achieves high average accuracy (98.17%), precision (99.74%), and recall (89.24%). Among the five DeFi contract traps introduced in this paper, four can be detected from contract bytecode without the need for source code. We also apply DeFiDefender to a large-scale dataset of 20,679 real DeFi-related Ethereum smart contracts. We found that 52.13% of these DeFi smart contracts contain at least one contract trap. Although a smart contract that contains contract traps is not necessarily malicious, our finding suggests that DeFi-related contracts have many centralization issues in a zero-trust environment and in the absence of a trusted party.
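To make the notion of a contract trap concrete, here is a deliberately naive lexical scan for one trap family: state-changing operations guarded by an owner-only modifier. The paper's DeFiDefender works symbolically on bytecode and is far more precise; this source-line heuristic is only an illustration, and the keyword list is an assumption.

```python
def find_owner_backdoors(source):
    """Flag source lines where an onlyOwner-guarded function also
    performs a fund-moving or destructive operation. Returns a list of
    (line_number, line) pairs. Purely lexical; a real detector would
    analyze control flow or bytecode."""
    suspicious_ops = ("transfer", "mint", "selfdestruct")
    hits = []
    for i, line in enumerate(source.splitlines(), 1):
        if "onlyOwner" in line and any(op in line for op in suspicious_ops):
            hits.append((i, line.strip()))
    return hits
```

Such a scan produces false positives (many onlyOwner functions are legitimate admin features), which is exactly why the paper's symbolic approach is needed to separate traps from benign patterns.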

Citations: 0
Revisiting file context for source code summarization
IF 2.0 | CAS Tier 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-27 | DOI: 10.1007/s10515-024-00460-x
Chia-Yi Su, Aakash Bansal, Collin McMillan

Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder–decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself—that information often resides in other nearby code. In this paper, we revisit the idea of “file context” for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.
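The core idea, encoding select information from other subroutines in the same file alongside the target subroutine, can be sketched as a simple input builder. The `<ctx>` separator, the signature-only context selection, and the token budget are assumptions for illustration, not the paper's Transformer modification.

```python
def build_input(target_fn, file_fns, max_context_tokens=200):
    """Concatenate the target subroutine with select context from the
    other subroutines in the same file (here: just their signature
    lines), under a whitespace-token budget."""
    context = []
    budget = max_context_tokens
    for fn in file_fns:
        sig = fn.splitlines()[0]          # keep only the signature line
        cost = len(sig.split())
        if fn is target_fn or cost > budget:
            continue
        context.append(sig)
        budget -= cost
    return target_fn + " <ctx> " + " ".join(context)
```

A learned model would then consume this combined sequence; the point of the sketch is only that file context enters the encoder input without inlining whole neighboring subroutine bodies.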

Citations: 0
TM-fuzzer: fuzzing autonomous driving systems through traffic management
IF 2.0 | CAS Tier 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-27 | DOI: 10.1007/s10515-024-00461-w
Shenghao Lin, Fansong Chen, Laile Xi, Gaosheng Wang, Rongrong Xi, Yuyan Sun, Hongsong Zhu

Simulation testing of Autonomous Driving Systems (ADS) is crucial for ensuring the safety of autonomous vehicles. Currently, the scenarios found by ADS simulation testing tools are unlikely to expose ADS issues and are highly similar to one another. In this paper, we propose TM-fuzzer, a novel approach for searching ADS test scenarios that utilizes real-time traffic management and diversity analysis to find security-critical and unique scenarios within the infinite scenario space. TM-fuzzer dynamically manages traffic flow by manipulating non-player characters near the autonomous vehicle throughout the simulation process to enhance the efficiency of test scenarios. Additionally, TM-fuzzer applies clustering analysis to vehicle trajectory graphs within scenarios to increase the diversity of test scenarios. Compared to the baseline, TM-fuzzer identified 29 unique violation scenarios more than four times faster and increased the incidence of ADS-caused violations by 26.26%. The experiments suggest that TM-fuzzer achieves improved efficiency and accuracy.
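The diversity-analysis step can be approximated with a simple trajectory-distance metric plus greedy farthest-point selection, a stand-in for the paper's clustering on trajectory graphs. The distance function and selection strategy below are assumptions for the sketch, not TM-fuzzer's actual algorithm.

```python
import math

def traj_distance(t1, t2):
    """Mean pointwise Euclidean distance between two equal-length
    2-D trajectories (lists of (x, y) points)."""
    return sum(math.dist(p, q) for p, q in zip(t1, t2)) / len(t1)

def select_diverse(trajs, k):
    """Greedy farthest-point selection: starting from the first
    scenario, repeatedly add the scenario whose minimum distance to
    the already-chosen set is largest, yielding k mutually dissimilar
    scenarios. Returns the chosen indices."""
    chosen = [0]
    while len(chosen) < k:
        best = max((i for i in range(len(trajs)) if i not in chosen),
                   key=lambda i: min(traj_distance(trajs[i], trajs[j])
                                     for j in chosen))
        chosen.append(best)
    return chosen
```

Favoring scenarios far from those already kept is one way to counter the "highly similar scenarios" problem the abstract describes.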

Citations: 0
Rethinking AI code generation: a one-shot correction approach based on user feedback
IF 2.0 | CAS Tier 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-12 | DOI: 10.1007/s10515-024-00451-y
Kim Tuyen Le, Artur Andrzejak

Code generation has become an integral feature of modern IDEs and has garnered significant attention. Notable approaches such as GitHub Copilot and TabNine have been proposed to tackle this task. However, these tools may shift code writing toward code reviewing, which involves modification by users. Despite the advantages of user feedback, users' responses remain transient and lack persistence across interaction sessions. This is attributed to an inherent characteristic of generative AI models: integrating new data requires explicit re-training. Additionally, the non-deterministic and unpredictable nature of AI-powered models limits thorough examination of their unforeseen behaviors. We propose a methodology named One-shot Correction to mitigate these issues in natural-language-to-code translation models with no additional re-training. We utilize decomposition techniques to break code translation down into sub-problems. The final code is constructed from code snippets for each query chunk, extracted from user feedback or selectively generated by a generative model. Our evaluation indicates comparable or improved performance relative to other models. Moreover, the methodology offers straightforward and interpretable approaches that enable in-depth examination of unexpected results and yield insights for potential enhancements. We also show that user feedback can substantially improve code translation models without re-training. Ultimately, we develop a preliminary GUI application to demonstrate the utility of our methodology in simplifying the customization and assessment of suggested code for users.
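The assembly step, building the final code from per-chunk snippets taken either from stored user feedback or from the generative model, can be sketched minimally. The cache shape and function names below are hypothetical, not the paper's interface.

```python
def assemble(query_chunks, feedback_cache, generate):
    """Build the final code chunk by chunk: prefer a snippet the user
    previously corrected (feedback_cache maps chunk -> snippet),
    otherwise fall back to the generative model. Note the `or` also
    treats an empty cached snippet as a miss, fine for a sketch."""
    return "\n".join(feedback_cache.get(chunk) or generate(chunk)
                     for chunk in query_chunks)
```

Because the feedback cache persists across sessions, a correction made once is reused for every later query containing the same chunk, which is the persistence the abstract says plain generative models lack.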

Citations: 0
Interactive search-based Product Line Architecture design
IF 2.0 | CAS Tier 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-09 | DOI: 10.1007/s10515-024-00457-6
Willian Marques Freire, Cláudia Tupan Rosa, Aline Maria Malachini Miotto Amaral, Thelma Elita Colanzi

Software Product Line (SPL) is an approach, derived from other engineering fields, that applies reuse techniques to a family of products in a given domain. An essential artifact of SPL is the Product Line Architecture (PLA), which identifies elements characterized by variation points, variability, and variants. The PLA aims to anticipate design decisions so as to obtain properties such as reusability and modularity. Nevertheless, producing a reusable, modular PLA that follows pre-defined standards can be a complex task involving several conflicting objectives. In this sense, PLA design can be formulated as a multiobjective optimization problem. This research presents an approach that helps decision makers (DMs) interactively optimize PLAs through several strategies, such as interactive optimization and machine learning (ML) algorithms. The interactive multiobjective optimization approach for PLA design (iMOA4PLA) uses metrics specific to the PLA optimization problem and is implemented in OPLA-Tool v2.0. In this approach, the architect assumes the role of the DM during the search process, guiding the evolution of PLAs through various strategies proposed in previous work. Two quantitative experiments and one qualitative experiment were performed to evaluate iMOA4PLA. The results showed that the approach can assist the PLA optimization process, meeting more than 90% of DM preferences. The scientific contribution of this work lies in providing an approach to PLA design and evaluation that leverages the benefits of machine learning algorithms and can serve as a basis for different SE contexts.
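The multiobjective formulation mentioned above rests on Pareto dominance: a candidate architecture is kept only if no other candidate is at least as good on every objective and strictly better on one. A minimal sketch (objectives assumed to be minimized; this is the generic concept, not iMOA4PLA's specific metrics):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives
    minimized): a is no worse than b everywhere and strictly better
    on at least one objective."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors;
    these are the trade-off candidates a DM would choose among."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]
```

In an interactive approach, the DM's preferences then steer the search toward a region of this front rather than leaving the choice to the optimizer alone.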

Citations: 0
Optimizing regression testing with AHP-TOPSIS metric system for effective technical debt evaluation
IF 2.0 | CAS Tier 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-08 | DOI: 10.1007/s10515-024-00458-5
Anis Zarrad, Rami Bahsoon, Priya Manimaran

Regression testing is essential to ensure that the actual software product conforms to the expected requirements after modification. However, it can be costly and time-consuming. To address this issue, various approaches have been proposed for selecting test cases that provide adequate coverage of the modified software. Nonetheless, problems related to omitting and/or rerunning unnecessary test cases continue to pose challenges, particularly with regard to the technical debt (TD) resulting from code coverage shortcomings and/or overtesting. In the case of testing-related shortcomings, incurring TD may save cost and time in the short run, but it can lead to future maintenance and testing expenses. Most prior studies have treated test case selection as a single-objective or two-objective optimization problem. This study introduces a multi-objective decision-making approach to quantify and evaluate TD in regression testing. The proposed approach combines the analytic hierarchy process (AHP) with the technique for order preference by similarity to an ideal solution (TOPSIS) to select the most suitable test cases with respect to objective values defined by test cost, code coverage, and test risk. This approach effectively manages software regression testing problems. The AHP method is used to eliminate subjective bias when optimizing objective weights, while the TOPSIS method is employed to evaluate and select test-case alternatives based on TD. The effectiveness of the approach was compared with that of a specific multi-objective optimization method and a standard coverage methodology. Unlike other approaches, ours always accepts solutions based on balanced decisions, considering modifications and weighing risk analysis and testing costs against potential technical debt. The results demonstrate that the proposed approach reduces both TD and regression testing effort.
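The TOPSIS half of the pipeline is a well-defined procedure: vector-normalize the decision matrix, apply the (AHP-derived) weights, and rank alternatives by closeness to the ideal solution. A minimal pure-Python sketch; the example criteria directions (cost to minimize, coverage and risk coverage to maximize) are assumptions, not the paper's exact setup:

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS. matrix is rows of alternatives x
    columns of criteria; benefit[j] is True if a larger value on
    criterion j is better (e.g. coverage), False for a cost criterion.
    Returns a closeness coefficient per alternative (1.0 = ideal)."""
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(n_crit)]
    v = [[row[j] / norms[j] * weights[j] for j in range(n_crit)]
         for row in matrix]
    # Ideal best/worst per criterion depend on its benefit/cost direction.
    best = [max(col) if benefit[j] else min(col)
            for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*v))]
    def dist(row, ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ref)))
    return [dist(r, worst) / (dist(r, best) + dist(r, worst)) for r in v]
```

With test cases as rows and (cost, coverage, risk coverage) as columns, the highest-scoring rows are the ones TOPSIS would select for the regression suite.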

{"title":"Optimizing regression testing with AHP-TOPSIS metric system for effective technical debt evaluation","authors":"Anis Zarrad,&nbsp;Rami Bahsoon,&nbsp;Priya Manimaran","doi":"10.1007/s10515-024-00458-5","DOIUrl":"10.1007/s10515-024-00458-5","url":null,"abstract":"<div><p>Regression testing is essential to ensure that the actual software product confirms the expected requirements following modification. However, it can be costly and time-consuming. To address this issue, various approaches have been proposed for selecting test cases that provide adequate coverage of the modified software. Nonetheless, problems related to omitting and/or rerunning unnecessary test cases continue to pose challenges, particularly with regard to technical debt (TD) resulting from code coverage shortcomings and/or overtesting. In the case of testing-related shortcomings, incurring TD may result in cost and time savings in the short run, but it can lead to future maintenance and testing expenses. Most prior studies have treated test case selection as a single-objective or two-objective optimization problem. This study introduces a multi-objective decision-making approach to quantify and evaluate TD in regression testing. The proposed approach combines the analytic-hierarchy-process (AHP) method and the technique of order preference by similarity to an ideal solution (TOPSIS) to select the most ideal test cases in terms of objective values defined by the test cost, code coverage, and test risk. This approach effectively manages the software regression testing problems. The AHP method was used to eliminate subjective bias when optimizing objective weights, while the TOPSIS method was employed to evaluate and select test-case alternatives based on TD. The effectiveness of this approach was compared to that of a specific multi-objective optimization method and a standard coverage methodology. 
Unlike other approaches, our proposed approach always accepts solutions based on balanced decisions by considering modifications and using risk analysis and testing costs against potential technical debt. The results demonstrate that our proposed approach reduces both TD and regression testing efforts.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-024-00458-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Knowledge-enhanced software refinement: leveraging reinforcement learning for search-based quality engineering 知识强化软件完善:利用强化学习实现基于搜索的质量工程
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-06-25 DOI: 10.1007/s10515-024-00456-7
Maryam Nooraei Abadeh

In the rapidly evolving software development industry, the early identification of optimal design alternatives and accurate performance prediction are critical for developing efficient software products. This paper introduces a novel approach to software refinement, termed Reinforcement Learning-based Software Refinement (RLSR), which leverages Reinforcement Learning techniques to address this challenge. RLSR enables an automated software refinement process that incorporates quality-driven intelligent software development as an early decision-making strategy. By proposing a Q-learning-based approach, RLSR facilitates the automatic refinement of software in dynamic environments while optimizing the utilization of computational resources and time. Additionally, the convergence rate to an optimal policy during the refinement process is investigated. The results demonstrate that training the policy using throughput values leads to significantly faster convergence to optimal rewards. This study evaluates RLSR based on various metrics, including episode length, reward over time, and reward distributions on a running example. Furthermore, to illustrate the effectiveness and applicability of the proposed method, a comparative analysis is applied to three refinable software designs, namely an E-commerce platform, a smart booking platform, and a Web-based GIS transformation system. The comparison between Q-learning and the proposed algorithm reveals that the refinement outcomes achieved with the proposed algorithm are superior, particularly when an adequate number of learning steps and a comprehensive historical dataset are available. The findings emphasize the potential of leveraging reinforcement learning techniques for automating software refinement and improving the efficiency of the model-driven development process.
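A Q-learning loop in the spirit of the abstract can be sketched on a toy design space. The states, actions, transition table, and throughput numbers below are all invented stand-ins for the paper's environment; the reward is the throughput gain of a refinement step, echoing the finding that throughput-based rewards speed up convergence.

```python
import random

# Toy Q-learning over design-refinement steps. The environment (states,
# actions, throughput numbers) is hypothetical; only the learning loop
# illustrates the Q-learning-based refinement idea.

random.seed(0)

ACTIONS = ["add_cache", "add_pool"]
TRANSITION = {
    ("baseline", "add_cache"): "cached",
    ("baseline", "add_pool"): "pooled",
    ("cached", "add_pool"): "cached+pooled",
    ("pooled", "add_cache"): "cached+pooled",
}
# Hypothetical measured throughput of each design alternative.
THROUGHPUT = {"baseline": 100, "cached": 140, "pooled": 130, "cached+pooled": 180}
TERMINAL = "cached+pooled"  # fully refined design

q = {(s, a): 0.0 for (s, a) in TRANSITION}
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(500):
    s = "baseline"
    while s != TERMINAL:
        legal = [a for a in ACTIONS if (s, a) in TRANSITION]
        if random.random() < eps:                     # explore
            a = random.choice(legal)
        else:                                         # exploit
            a = max(legal, key=lambda act: q[(s, act)])
        s2 = TRANSITION[(s, a)]
        reward = THROUGHPUT[s2] - THROUGHPUT[s]       # throughput gain
        future = max((q[(s2, b)] for b in ACTIONS if (s2, b) in TRANSITION),
                     default=0.0)
        q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
        s = s2
```

After training, the Q-values over the refinement steps encode which design alternative to commit to first, which is the kind of early design decision the abstract targets.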

在快速发展的软件开发行业中,尽早识别最佳设计方案和准确预测性能对于开发高效的软件产品至关重要。本文介绍了一种新颖的软件完善方法,即基于强化学习的软件完善(RLSR),它利用强化学习技术来应对这一挑战。RLSR 可实现自动化软件完善流程,将质量驱动型智能软件开发作为早期决策策略。通过提出一种基于 Q 学习的方法,RLSR 可促进动态环境中软件的自动完善,同时优化计算资源和时间的利用。此外,还研究了细化过程中最优策略的收敛率。结果表明,使用吞吐量值对策略进行训练可显著加快向最优奖励的收敛速度。本研究基于各种指标对 RLSR 进行了评估,包括运行示例中的插曲长度、随时间变化的奖励和奖励分布。此外,为了说明所提方法的有效性和适用性,还对电子商务平台、智能预订平台和基于网络的地理信息系统转换系统等三个可完善的软件设计进行了对比分析。通过对 Q-learning 和所提算法的比较发现,所提算法取得的精炼结果更优越,尤其是在有足够数量的学习步骤和全面的历史数据集的情况下。研究结果强调了利用强化学习技术实现软件完善自动化和提高模型驱动开发流程效率的潜力。
引用次数: 0
An empirical study of data sampling techniques for just-in-time software defect prediction 及时软件缺陷预测的数据抽样技术实证研究
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-06-22 DOI: 10.1007/s10515-024-00455-8
Zhiqiang Li, Qiannan Du, Hongyu Zhang, Xiao-Yuan Jing, Fei Wu

Just-in-time software defect prediction (JIT-SDP) is a fine-grained, easy-to-trace, and practical method. Unfortunately, JIT-SDP usually suffers from the class imbalance problem, which affects the performance of the models. Data sampling is one of the commonly used class imbalance techniques to overcome this problem. However, there is a lack of comprehensive empirical studies to compare different data sampling techniques on the performance of JIT-SDP. In this paper, we consider both defect classification and defect ranking, two typical application scenarios. To this end, we performed an empirical comparison of 10 data sampling algorithms on the performance of JIT-SDP. Extensive experiments on 10 open-source projects with 12 performance measures show that the effectiveness of data sampling techniques can indeed vary depending on the specific evaluation measures in both defect classification and defect ranking scenarios. Specifically, the RUM algorithm has demonstrated superior performance overall in the context of defect classification, particularly in F-measure, AUC, and MCC. On the other hand, for defect ranking, the ENN algorithm has emerged as the most favorable option, exhibiting the best results in P_opt, Recall@20%, and F-measure@20%. However, data sampling techniques can lead to an increase in false alarms and require the inspection of a higher number of changes. These findings highlight the importance of carefully selecting the appropriate data sampling technique based on the specific evaluation measures for different scenarios.
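The samplers named in the abstract (RUM, ENN) are defined in the paper itself; as a generic illustration of the class-rebalancing step they all perform, here is a minimal random under-sampling sketch over hypothetical JIT change records.

```python
import random

# Minimal random under-sampling: drop random majority-class records until
# both classes are the same size. This is a generic stand-in, not the
# paper's RUM or ENN samplers; the change records below are hypothetical.

def undersample(records, label_key="buggy", seed=42):
    """Return a class-balanced copy of `records`."""
    pos = [r for r in records if r[label_key]]
    neg = [r for r in records if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # random majority subset
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced

# Hypothetical JIT change records: lines changed, files touched, defect label.
changes = [{"loc": i, "files": i % 3 + 1, "buggy": i % 5 == 0} for i in range(50)]
balanced = undersample(changes)
n_buggy = sum(r["buggy"] for r in balanced)
```

The 10 buggy records out of 50 mimic the skew the abstract describes; after sampling, a classifier trains on 10 buggy and 10 clean changes, at the cost the abstract notes of discarding data and potentially raising false alarms.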

准时软件缺陷预测(JIT-SDP)是一种细粒度、易于跟踪且实用的方法。遗憾的是,JIT-SDP 通常存在类不平衡问题,这会影响模型的性能。数据抽样是克服这一问题的常用类不平衡技术之一。然而,目前还缺乏全面的实证研究来比较不同数据抽样技术对 JIT-SDP 性能的影响。在本文中,我们考虑了缺陷分类和缺陷排序这两种典型的应用场景。为此,我们对 10 种数据采样算法在 JIT-SDP 性能方面的表现进行了实证比较。在 10 个开源项目中对 12 个性能指标进行的广泛实验表明,在缺陷分类和缺陷排序场景中,数据抽样技术的有效性确实会因具体评估指标的不同而不同。具体来说,RUM 算法在缺陷分类方面表现出了更优越的整体性能,尤其是在 F-measure、AUC 和 MCC 方面。另一方面,在缺陷排序方面,ENN 算法成为最有利的选择,在 (P_{opt})、Recall@20% 和 F-measure@20% 方面都表现出完美的结果。然而,数据采样技术可能会导致误报增加,并需要检查更多的变化。这些发现凸显了根据不同场景的具体评估指标仔细选择合适的数据抽样技术的重要性。
引用次数: 0
SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration SimAC:模拟敏捷协作,在用户故事阐述中生成验收标准
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-06-21 DOI: 10.1007/s10515-024-00448-7
Yishu Li, Jacky Keung, Zhen Yang, Xiaoxue Ma, Jingyu Zhang, Shuo Liu

In agile requirements engineering, Generating Acceptance Criteria (GAC) to elaborate user stories plays a pivotal role in the sprint planning phase, which provides a reference for delivering functional solutions. GAC requires extensive collaboration and human involvement. However, the lack of labeled datasets tailored for User Story attached with Acceptance Criteria (US-AC) poses significant challenges for supervised learning techniques attempting to automate this process. Recent advancements in Large Language Models (LLMs) have showcased their remarkable text-generation capabilities, bypassing the need for supervised fine-tuning. Consequently, LLMs offer the potential to overcome the above challenge. Motivated by this, we propose SimAC, a framework leveraging LLMs to simulate agile collaboration, with three distinct role groups: requirement analyst, quality analyst, and others. Initiated by role-based prompts, LLMs act in these roles sequentially, following a create-update-update paradigm in GAC. Owing to the unavailability of ground truths, we invited practitioners to build a gold standard serving as a benchmark to evaluate the completeness and validity of auto-generated US-AC against human-crafted ones. Additionally, we invited eight experienced agile practitioners to evaluate the quality of US-AC using the INVEST framework. The results demonstrate consistent improvements across all tested LLMs, including the LLaMA and GPT-3.5 series. Notably, SimAC significantly enhances the ability of gpt-3.5-turbo in GAC, achieving improvements of 29.48% in completeness and 15.56% in validity, along with the highest INVEST satisfaction score of 3.21/4. Furthermore, this study also provides case studies to illustrate SimAC’s effectiveness and limitations, shedding light on the potential of LLMs in automated agile requirements engineering.
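The create-update-update role sequence the abstract describes can be sketched as a small pipeline around any prompt-to-text callable. The role prompts below are illustrative, not the paper's actual prompts, and the `llm` argument is a placeholder for a real model client.

```python
# Sketch of a SimAC-style role sequence: the first role creates the
# acceptance criteria (AC), each later role updates them. Role wording is
# invented for illustration; `llm` is any prompt -> text callable.

ROLES = [
    ("requirement analyst", "Draft acceptance criteria for this user story."),
    ("quality analyst", "Revise the current criteria for testability and coverage."),
    ("others", "Review and finalize the current criteria."),
]

def generate_ac(user_story, llm):
    """Run the create-update-update pass over the role prompts."""
    ac = ""
    for role, instruction in ROLES:
        prompt = (f"Role: {role}.\n"
                  f"{instruction}\n"
                  f"User story: {user_story}\n"
                  f"Current criteria: {ac or '(none yet)'}")
        ac = llm(prompt)  # create on pass 1, update on passes 2 and 3
    return ac

story = "As a shopper, I want to filter products by price."
# Stub LLM that echoes the first prompt line, for a dry run without an API:
draft = generate_ac(story, llm=lambda p: f"[stub reply to: {p.splitlines()[0]}]")
```

Because each role sees the criteria produced so far, the sequence simulates the hand-offs of an agile planning session; swapping the stub for a real client reproduces the pipeline shape (though not the exact prompts) evaluated in the paper.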

在敏捷需求工程中,生成验收标准(GAC)以阐述用户故事在冲刺计划阶段起着关键作用,它为交付功能解决方案提供了参考。GAC 需要广泛的协作和人工参与。然而,由于缺乏为附有验收标准的用户故事(US-AC)量身定制的标记数据集,这给试图将这一过程自动化的监督学习技术带来了巨大挑战。大型语言模型(LLMs)的最新进展展示了其卓越的文本生成能力,绕过了监督微调的需要。因此,LLM 具备克服上述挑战的潜力。受此启发,我们提出了 SimAC,一个利用 LLM 模拟敏捷协作的框架,其中包含三个不同的角色组:需求分析师、质量分析师和其他。在基于角色的提示启动下,LLMs 按照 GAC 中的创建-更新-再创建-再更新模式依次扮演这些角色。由于无法获得基本事实,我们邀请从业人员建立了一个黄金标准,作为评估自动生成的 US-AC 与人工创建的 US-AC 的完整性和有效性的基准。此外,我们还邀请了八位经验丰富的敏捷实践者使用 INVEST 框架评估 US-AC 的质量。结果表明,所有测试过的 LLM(包括 LLaMA 和 GPT-3.5 系列)都得到了一致的改进。值得注意的是,SimAC 显著增强了 GPT-3.5-turbo 在 GAC 中的能力,在完整性和有效性方面分别提高了 29.48% 和 15.56%,INVEST 满意度得分最高,分别为 3.21/4。此外,本研究还通过案例分析说明了 SimAC 的有效性和局限性,揭示了 LLM 在自动化敏捷需求工程中的潜力。
引用次数: 0