On the role of search budgets in model-based software refactoring optimization
Pub Date: 2025-10-18 | DOI: 10.1007/s10515-025-00564-y
J. Andres Diaz-Pace, Daniele Di Pompeo, Michele Tucci
Software model optimization is a process that automatically generates design alternatives aimed at improving quantifiable non-functional properties of software systems, such as performance and reliability. Multi-objective evolutionary algorithms effectively help designers identify trade-offs among the desired non-functional properties. To reduce the use of computational resources, this work examines the impact of implementing a search budget to limit the search for design alternatives. In particular, we analyze how time budgets affect the quality of Pareto fronts by utilizing quality indicators and exploring the structural features of the generated design alternatives. This study identifies distinct behavioral differences among evolutionary algorithms when a search budget is implemented. It further reveals that design alternatives generated under a budget are structurally different from those produced without one. Additionally, we offer recommendations for designers on selecting algorithms in relation to time constraints, thereby facilitating the effective application of automated refactoring to improve non-functional properties.
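As a rough illustration of the mechanism studied here (not the authors' implementation), the Python sketch below wraps a generic multi-objective evolutionary loop in a wall-clock search budget and scores the resulting Pareto front with a simple two-objective hypervolume indicator; the function names, objectives, and budget value are hypothetical placeholders.

# Illustrative sketch: a wall-clock search budget around a toy multi-objective
# evolutionary loop, plus a 2-objective hypervolume indicator (both objectives minimized).
import random
import time


def evaluate(solution):
    # Placeholder objectives, standing in for performance and reliability costs.
    return (sum(solution), sum((x - 0.5) ** 2 for x in solution))


def mutate(solution):
    return [min(1.0, max(0.0, x + random.gauss(0, 0.1))) for x in solution]


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def hypervolume_2d(front, ref):
    # Area dominated by a 2-objective minimization front w.r.t. a reference point.
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):
        hv += max(0.0, ref[0] - f1) * max(0.0, prev_f2 - f2)
        prev_f2 = min(prev_f2, f2)
    return hv


def run_budgeted_search(budget_seconds, pop_size=20, dim=5):
    population = [[random.random() for _ in range(dim)] for _ in range(pop_size)]
    start = time.time()
    while time.time() - start < budget_seconds:      # the search budget
        population.append(mutate(random.choice(population)))
        objs = [evaluate(p) for p in population]
        # Keep only non-dominated solutions (a crude Pareto archive).
        population = [p for p, o in zip(population, objs)
                      if not any(dominates(other, o) for other in objs if other != o)]
    return [evaluate(p) for p in population]


if __name__ == "__main__":
    front = run_budgeted_search(budget_seconds=1.0)
    print("front size:", len(front), "hypervolume:", hypervolume_2d(front, ref=(10.0, 10.0)))

Comparing hypervolumes obtained under different budgets (and different algorithms) is the kind of quality-indicator analysis the abstract refers to.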
Graph neural networks for precise bug localization through structural program analysis
Pub Date: 2025-10-18 | DOI: 10.1007/s10515-025-00556-y
Leila Yousofvand, Seyfollah Soleimani, Vahid Rafe, Amin Nikanjam
Bug localization (BL) is one of the major steps in the program repair process: it seeks to identify the statements that cause a program to crash or fail. As the complexity and scale of modern software development keep growing, locating bugs and their sources quickly by hand has become impractical, so there is strong demand for BL techniques that require minimal human intervention. A graph representation of source code typically encodes valuable information about both the syntactic and semantic structure of programs, and many software bugs are tied to these structures, which makes graphs particularly suitable for BL. Accordingly, the key contributions of this work are labeling graph nodes, classifying those nodes, and handling the class imbalance within the graph data to effectively locate bugs in code. The proposed method first introduces a graph-based bug classifier: program source code is mapped to a graph representation, and since the graph nodes have no labels, the GumTree algorithm is used to label them by comparing buggy graphs with their bug-free counterparts. A supervised node classifier based on a graph neural network (GNN) is then trained to classify nodes as buggy or bug-free. Given the imbalance in the data, accuracy, precision, recall, and F1-score are used for evaluation. Experimental results on identical datasets show that the proposed method outperforms related approaches and localizes a broader spectrum of bug types, such as undefined properties, functional bugs, variable naming errors, and variable misuse issues.
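The node-classification step can be pictured with the minimal sketch below: a two-layer graph neural network labels graph nodes as buggy or bug-free, with class weights compensating for the buggy/bug-free imbalance. The architecture, toy graph, and weights are illustrative assumptions, not the paper's GumTree-based pipeline.

# Minimal sketch: weighted node classification with a tiny two-layer GNN in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGNN(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes=2):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, n_classes)

    def forward(self, x, adj_norm):
        # Message passing = multiply node features by the normalized adjacency.
        h = F.relu(self.lin1(adj_norm @ x))
        return self.lin2(adj_norm @ h)


def normalize_adj(adj):
    adj = adj + torch.eye(adj.size(0))              # add self-loops
    d_inv_sqrt = torch.diag(adj.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ adj @ d_inv_sqrt


# Toy graph: 6 nodes with 4 features each, node 2 labeled buggy (class 1).
x = torch.randn(6, 4)
adj = torch.zeros(6, 6)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
y = torch.tensor([0, 0, 1, 0, 0, 0])

model = TinyGNN(in_dim=4, hidden_dim=8)
optim = torch.optim.Adam(model.parameters(), lr=0.01)
class_weights = torch.tensor([1.0, 5.0])            # up-weight the rare buggy class

for _ in range(100):
    optim.zero_grad()
    logits = model(x, normalize_adj(adj))
    loss = F.cross_entropy(logits, y, weight=class_weights)
    loss.backward()
    optim.step()

print("predicted buggy nodes:", (logits.argmax(dim=1) == 1).nonzero().flatten().tolist())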
Decomposition then watermarking: Enhancing code traceability with dual-channel code watermarking
Pub Date: 2025-10-10 | DOI: 10.1007/s10515-025-00561-1
Haibo Lin, Zhong Li, Ruihua Ji, Minxue Pan, Tian Zhang, Nan Wu, Xuandong Li
Code watermarking has gained increasing attention for tracing the provenance of code with the rapid growth of the open-source community. Existing work on code watermarking has shown promising results yet still falls short, especially when a multi-bit watermark encoding diverse information is required. In this paper, we propose DWC, a novel code watermarking method with high watermark capacity. The key idea of DWC is to first decompose the code into natural and formal channels, and then embed the watermark separately into each channel based solely on that channel's information. In this way, DWC reduces the mutual interference between the two channels and the impact of irrelevant information within the code, enabling more effective transformations that embed watermarks with higher capacity and robustness. Our extensive experiments on source code snippets in four programming languages (C, C++, Java, and Python) demonstrate the effectiveness, efficiency, and capability of DWC in embedding multi-bit watermarks, as well as the utility and robustness of the watermarked code it generates.
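A much-simplified, hypothetical illustration of the dual-channel idea follows: one bit is carried by the wording of identifiers (natural channel) and one by the choice between two semantically equivalent statement forms (formal channel), so the two embeddings do not interfere. The transformations and the synonym table are invented for illustration and are not DWC's actual scheme.

# Toy dual-channel watermark: identifier wording vs. equivalent statement form.
NATURAL_SYNONYMS = {"count": "total"}          # bit 0 -> "count", bit 1 -> "total"


def embed(code: str, natural_bit: int, formal_bit: int) -> str:
    # Natural channel: choose between synonymous identifiers.
    if natural_bit == 1:
        for original, alt in NATURAL_SYNONYMS.items():
            code = code.replace(original, alt)
    # Formal channel: choose between two semantically equivalent statements.
    if formal_bit == 1:
        code = code.replace("i = i + 1", "i += 1")
    return code


def extract(code: str) -> tuple[int, int]:
    natural_bit = 1 if any(alt in code for alt in NATURAL_SYNONYMS.values()) else 0
    formal_bit = 1 if "+= 1" in code else 0
    return natural_bit, formal_bit


snippet = "count = 0\nfor i in range(n):\n    i = i + 1\n    count = count + i\n"
marked = embed(snippet, natural_bit=1, formal_bit=1)
print(marked)
print("recovered bits:", extract(marked))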
A sign language to SQL query translation system for enhancing database accessibility
Pub Date: 2025-10-07 | DOI: 10.1007/s10515-025-00558-w
Guocang Yang, Dawei Yuan, Tao Zhang, Zhenghan Chen
Structured Query Language (SQL) is a standard language for interacting with relational databases and is widely used across various information systems, either through direct query execution or via object-relational mapping (ORM) frameworks. Recent approaches have focused on converting natural language into SQL to simplify database development for users without programming expertise. However, these methods overlook direct translation from sign language—an essential modality for users such as the deaf community who may lack experience with SQL syntax. In this paper, we present SIGN2SQL, an innovative end-to-end framework that generates SQL queries from signed input. The system first employs a dedicated gesture recognition module to interpret the visual signals, followed by a convolutional neural network (CNN)-based model that produces the corresponding SQL statements. Trained on a well-annotated dataset, SIGN2SQL is evaluated against multiple pipeline-based baselines. Experimental results demonstrate that SIGN2SQL outperforms existing methods in both effectiveness and efficiency, particularly for SELECT statements with WHERE clauses. It achieves an execution accuracy of 89.8%, highlighting its potential as an accessible and inclusive database interaction interface.
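The two-stage pipeline can be sketched as follows, with a stand-in recognition stage that yields sign tokens and a template-based stage that assembles a SELECT statement with a WHERE clause; the real SIGN2SQL replaces both stand-ins with a gesture-recognition module and a CNN-based generator, and the column and table names below are invented.

# Toy two-stage pipeline: sign tokens -> SQL SELECT ... WHERE ...
from typing import List


def recognize_signs(frames: List[bytes]) -> List[str]:
    # Stand-in for the gesture-recognition module: pretend the frames decode to tokens.
    return ["select", "name", "from", "employees", "where", "salary", "greater", "50000"]


def tokens_to_sql(tokens: List[str]) -> str:
    ops = {"greater": ">", "less": "<", "equals": "="}
    # Very small grammar: SELECT <col> FROM <table> [WHERE <col> <op> <value>]
    col, table = tokens[1], tokens[3]
    sql = f"SELECT {col} FROM {table}"
    if "where" in tokens:
        w = tokens.index("where")
        sql += f" WHERE {tokens[w + 1]} {ops[tokens[w + 2]]} {tokens[w + 3]}"
    return sql + ";"


print(tokens_to_sql(recognize_signs(frames=[])))
# -> SELECT name FROM employees WHERE salary > 50000;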
Toward efficient testing of graph neural networks via test input prioritization
Pub Date: 2025-10-07 | DOI: 10.1007/s10515-025-00554-0
Lichen Yang, Qiang Wang, Zhonghao Yang, Daojing He, Yu Li
Graph Neural Networks (GNNs) have demonstrated remarkable efficacy in handling graph-structured data; however, they exhibit failures after deployment, which can cause severe consequences. Hence, conducting thorough testing before deployment becomes imperative to ensure the reliability of GNNs. However, thorough testing requires numerous manually annotated test data. To mitigate the annotation cost, strategically prioritizing and labeling high-quality unlabeled inputs for testing becomes crucial, which facilitates uncovering more model failures with a limited labeling budget. Unfortunately, existing test input prioritization techniques either overlook the valuable information contained in graph structures or are overly reliant on attributes extracted from the target model, i.e., model-aware attributes, whose quality can vary significantly. To address these issues, we propose a novel test input prioritization framework, named GraphRank, for GNNs. GraphRank introduces model-agnostic attributes to compensate for the limitations of the model-aware ones. It also leverages the graph structure information to aggregate attributes from neighboring nodes, thereby enhancing the model-aware and model-agnostic attributes. Furthermore, GraphRank combines the above attributes with a binary classifier, using it as a ranking model to prioritize inputs. This classifier undergoes iterative training, which enables it to learn from each round’s feedback and improve its performance accordingly. Extensive experiments demonstrate GraphRank’s superiority over existing techniques.
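The ranking loop can be illustrated with the hypothetical sketch below: a model-aware attribute (prediction entropy) and a model-agnostic one (node degree) are aggregated over graph neighbors and fed to a binary classifier that is retrained each round on newly labeled inputs. The attribute choices, graph, and batch sizes are assumptions for illustration, not GraphRank's exact feature set.

# Sketch of attribute-based test input prioritization with an iteratively trained ranker.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
adj = (rng.random((n, n)) < 0.05).astype(float)
adj = np.maximum(adj, adj.T)                       # undirected toy graph

probs = rng.dirichlet(np.ones(3), size=n)          # stand-in GNN output per node
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # model-aware attribute
degree = adj.sum(axis=1)                                   # model-agnostic attribute
deg_safe = np.maximum(degree, 1.0)

features = np.column_stack([
    entropy, degree,
    adj @ entropy / deg_safe,                      # attributes aggregated over neighbors
    adj @ degree / deg_safe,
])

# Ground truth: which test inputs actually expose a misprediction (revealed on labeling).
exposes_failure = entropy > np.quantile(entropy, 0.7)

labeled = list(rng.choice(n, size=30, replace=False))
ranker = LogisticRegression(max_iter=1000)
for _ in range(3):                                 # retrain on each round's feedback
    ranker.fit(features[labeled], exposes_failure[labeled])
    scores = ranker.predict_proba(features)[:, 1]
    candidates = [i for i in np.argsort(-scores) if i not in labeled]
    labeled += candidates[:20]                     # "annotate" the top-ranked inputs next

print("failures found with 90 labels:", int(exposes_failure[labeled].sum()))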
Graph based transfer learning with orthogonal tunning for functionality size insights
Pub Date: 2025-10-06 | DOI: 10.1007/s10515-025-00562-0
Nevena Ranković, Dragica Ranković, Gonzalo Nápoles, Federico Zamberlan
Function Point Analysis (FPA) is a method in software engineering that focuses on identifying the functions a software system provides to users, such as data input, processing, output, and database management. These functions are classified by complexity to quantify the system’s size in function point units. In this paper, we propose two graph neural networks, a Graph-based Similarity Detection Neural Network (GSDNN) and a Prior-Structural Information Graph Neural Network (PSI-GNN) with a pre-trained layer using transfer learning, to determine the best model for functional size prediction and to uncover patterns and trends in the data. The study focuses on the NESMA (Netherlands Software Metrics Users Association) method from the family of functional size measurement approaches, and uses the ISBSG (International Software Benchmarking Standards Group) dataset, which provides standardized and relevant data for comparing software performance, to analyze 1704 industrial software projects. The goal was to identify the graph architecture that requires the smallest number of experiments and yields the lowest Mean Magnitude of Relative Error (MMRE), using orthogonal-array tuning optimization via Latin Square extraction. In the proposed approach, fewer than eight experiments are needed per dataset, and a minimum MMRE of 0.97% was obtained with PSI-GNN. Additionally, the impact of five input features on the MMRE was analyzed for the top-performing model using the SHAP (SHapley Additive exPlanations) feature importance method, visualized through GraphExplainer. The frequency of user-initiated transactions, quantified technically, emerged as the most significant determinant within the NESMA framework.
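Two of the evaluation ingredients named above can be sketched directly: the MMRE error metric and a Latin-square layout that covers a hyperparameter grid with only a handful of runs. The hyperparameter names and values below are illustrative assumptions, not the paper's actual search space.

# Sketch: MMRE metric and a Latin-square fractional design over three factors.
import numpy as np


def mmre(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted) / actual))


def latin_square(k):
    # Row i is a cyclic shift of 0..k-1, so each level appears once per row and column.
    return np.array([[(i + j) % k for j in range(k)] for i in range(k)])


learning_rates = [1e-2, 1e-3, 1e-4]
hidden_sizes = [32, 64, 128]
dropouts = [0.0, 0.2, 0.5]

square = latin_square(3)
# Each row picks one level per factor -> only 3 runs instead of the full 3**3 = 27 grid.
for run, row in enumerate(square):
    cfg = dict(lr=learning_rates[row[0]], hidden=hidden_sizes[row[1]], dropout=dropouts[row[2]])
    print(f"run {run}: {cfg}")

print("MMRE example:", mmre(actual=[100, 250, 400], predicted=[90, 260, 380]))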
Improving anomaly detection in software logs through hybrid language modeling and reduced reliance on parser
Pub Date: 2025-09-29 | DOI: 10.1007/s10515-025-00548-y
Yicheng Sun, Jacky Keung, Zhen Yang, Shuo Liu, Hi Kuen Yu
Anomaly detection in software logs is crucial for development and maintenance, allowing timely identification of system failures and ensuring normal operations. Although recent deep learning advancements in log anomaly detection have shown exceptional performance, the reliance on time-consuming log parsers raises concerns about their necessity for quickly identifying anomalies. Standardized preprocessing methods can mishandle or lose important information. Additionally, the significant imbalance between normal and anomalous log data, along with the scarcity of labeled data, presents a persistent challenge in anomaly detection. We first evaluated the impact of omitting a log parser on anomaly detection models. Subsequently, we propose LogRoBERTa, an innovative anomaly detection model that eliminates the need for a parser. LogRoBERTa creates a stable and diverse labeled training set using the Determinantal Point Process (DPP) method, needing only a small amount of labeled data. The hybrid language model is based on RoBERTa’s architecture, combined with an attention-based BiLSTM. This setup leverages RoBERTa’s strong contextual understanding and BiLSTM’s capability to capture sequential dependencies, enhancing performance in complex log sequences. Experiments on four widely used datasets demonstrate that LogRoBERTa outperforms state-of-the-art benchmark models—including three fully supervised approaches—without relying on a dedicated log parser. Furthermore, its consistently strong performance on low-resource datasets highlights its robustness and generalizability across varying data conditions. These results validate the overall effectiveness of LogRoBERTa’s design and offer a thorough evaluation of the implications of bypassing a log parser. Additionally, our ablation studies and training set construction experiments further confirm the contributions of each individual component to the model’s performance. The study empirically validated that a RoBERTa-based approach effectively handles software log anomaly detection in long and complex log sequences, providing a more efficient and robust solution for omitting a parser compared to existing models.
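The training-set construction step can be approximated with a greedy Determinantal Point Process (DPP) selection over log-sequence embeddings, as in the hedged sketch below; the embeddings and kernel are placeholders rather than LogRoBERTa's actual setup.

# Sketch: greedy MAP selection for a DPP over a similarity kernel of log embeddings.
import numpy as np


def greedy_dpp(kernel, k):
    # Repeatedly add the item that most increases the log-determinant of the
    # selected submatrix, yielding a diverse subset to label.
    n = kernel.shape[0]
    selected = []
    for _ in range(k):
        best_item, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:
            break
        selected.append(best_item)
    return selected


rng = np.random.default_rng(1)
embeddings = rng.normal(size=(100, 16))            # stand-in for log-sequence embeddings
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
kernel = embeddings @ embeddings.T + 1e-6 * np.eye(100)   # similarity kernel (PSD)

print("indices selected for labeling:", greedy_dpp(kernel, k=10))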
BRMDS: an LLM-based multi-dimensional summary generation approach for bug reports
Pub Date: 2025-09-23 | DOI: 10.1007/s10515-025-00553-1
Yayun Zhang, Yuying Li, Minying Fang, Xing Yuan, Junwei Du
Bug report summarization aims to generate concise and accurate descriptions that help developers understand and maintain software. Existing approaches prioritize condensing report content but fail to provide a structured and well-rounded description of bugs, limiting how efficiently developers can understand them. In this paper, we leverage large language models (LLMs) to generate detailed, multi-dimensional summaries. Our intuition rests on two facts: (1) LLMs establish robust semantic connections through extensive pre-training on paired data; (2) real-world bug reports contain multi-dimensional information. We propose the Bug Report Multi-Dimensional Summary (BRMDS) approach, which defines five dimensions (environment, actual behavior, expected behavior, bug category, and solution suggestions) and uses dimension-specific instructions to guide the LLM during parameter-efficient fine-tuning (PEFT). We construct a dataset with multi-dimensional information for PEFT and experimental evaluation, addressing a gap in existing datasets in this domain. The experimental results show that multi-dimensional summaries enhance developers’ understanding of bug reports, and that BRMDS outperforms baseline approaches in both automatic and human evaluations. Our datasets are publicly available at https://github.com/yunjua/bug-reports-multi-dimensional.
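The dimension-specific instruction idea can be illustrated with the small sketch below, which pairs each of the five dimensions with its own instruction template to form prompts; the instruction wording and prompt layout are invented here, and the paper's actual prompts may differ.

# Sketch: building one instruction-style prompt per summary dimension.
DIMENSION_INSTRUCTIONS = {
    "environment": "Summarize the environment (OS, versions, configuration) in which the bug occurs.",
    "actual_behavior": "Describe what the software actually does when the bug is triggered.",
    "expected_behavior": "Describe what the software was expected to do instead.",
    "bug_category": "Classify the kind of bug being reported in a short phrase.",
    "solution_suggestions": "List any fixes or workarounds suggested in the report.",
}


def build_prompts(bug_report: str) -> dict:
    return {
        dim: f"### Instruction:\n{instr}\n\n### Bug report:\n{bug_report}\n\n### Summary:"
        for dim, instr in DIMENSION_INSTRUCTIONS.items()
    }


report = "App crashes on Android 14 when rotating the screen; expected it to keep the form state."
for dim, prompt in build_prompts(report).items():
    print(f"--- {dim} ---\n{prompt}\n")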
PIONEER: improving the robustness of student models when compressing pre-trained models of code
Pub Date: 2025-09-23 | DOI: 10.1007/s10515-025-00560-2
Xiangyue Liu, Xinwei Liu, Lili Bo, Xiaoxue Wu, Yun Yang, Xiaobing Sun, Feng Zhou
Pre-trained models of code have shown significant effectiveness in a variety of software engineering tasks, but their large size makes them difficult to deploy locally. Existing works mainly focus on compressing these large models into small models that achieve similar performance and efficient inference. However, these works overlook that the small models should also be robust enough to handle adversarial examples, which otherwise cause incorrect predictions for users. Knowledge distillation techniques typically cast model compression as a combinatorial optimization over the student architecture space to achieve the best student model performance, but they can only improve the robustness of the student model to a limited extent through traditional adversarial training. This paper proposes PIONEER (ImProvIng the RObustness of StudeNt ModEls WhEn CompRessing Code Models), a novel knowledge distillation technique that enhances the robustness of the student model without requiring adversarial training. PIONEER incorporates robustness evaluation during distillation to guide the optimization of the student model architecture. By using the probability distributions of original examples and adversarial examples as soft labels, the student model learns the features of both during training. We conduct experimental evaluations on two downstream tasks (vulnerability prediction and clone detection) for three models (CodeBERT, GraphCodeBERT, and CodeT5). We use PIONEER to compress six downstream task models into small (3 MB) models that are 206× smaller than the original size. The results show that the compressed models reduce inference latency by 76× and improve model robustness by 87.54%, with a negligible loss of effectiveness (1.67%).
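The soft-label idea can be sketched as a distillation loss in which the student matches the teacher's probability distributions on both original and adversarial inputs; the models, the perturbation, and the temperature below are simplified placeholders rather than PIONEER's actual architecture search.

# Sketch: KL-based distillation using soft labels from original AND adversarial inputs.
import torch
import torch.nn.functional as F

teacher = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
student = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
optim = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0                                            # distillation temperature

x = torch.randn(64, 16)                            # stand-in for code embeddings
x_adv = x + 0.1 * torch.randn_like(x)              # stand-in for adversarial examples

for _ in range(200):
    optim.zero_grad()
    loss = 0.0
    for inputs in (x, x_adv):                      # soft labels from both input sets
        with torch.no_grad():
            teacher_probs = F.softmax(teacher(inputs) / T, dim=1)
        student_logprobs = F.log_softmax(student(inputs) / T, dim=1)
        loss = loss + F.kl_div(student_logprobs, teacher_probs, reduction="batchmean") * (T * T)
    loss.backward()
    optim.step()

print("final distillation loss:", float(loss))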
Investigating the bugs in reinforcement learning programs: Insights from Stack Overflow and GitHub
Pub Date: 2025-09-23 | DOI: 10.1007/s10515-025-00555-z
Jiayin Song, Yike Li, Yunzhe Tian, Haoxuan Ma, Honglei Li, Jie Zuo, Jiqiang Liu, Wenjia Niu
Reinforcement learning (RL) is increasingly applied in areas such as gaming, robotic control, and autonomous driving. Like deep learning systems, RL systems also encounter failures during operation; however, RL differs from deep learning in its error causes and symptom manifestations. What are the differences in error causes and symptoms between RL and deep learning? How are RL errors and their symptoms related? Understanding the symptoms and causes of RL failures can advance research on RL failure detection and repair. In this paper, we conducted a comprehensive empirical study by collecting 1,155 error reports from the popular Q&A forum Stack Overflow and four GitHub repositories: baselines, stable-baselines3, tianshou, and keras-rl. We analyzed the root causes and symptoms of these failures and examined the differences in resolution times across root causes. Additionally, we analyzed the correlations between causes and symptoms. Our study yielded 14 key findings and six implications for developing RL failure detection and repair tools. Our work is the first to integrate LLM-based analysis with manual validation for RL bug studies, providing actionable insights for tool development and testing strategies.