Revisiting file context for source code summarization
Automated Software Engineering 31(2) | Pub Date: 2024-07-27 | DOI: 10.1007/s10515-024-00460-x
Chia-Yi Su, Aakash Bansal, Collin McMillan
Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder–decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself—that information often resides in other nearby code. In this paper, we revisit the idea of “file context” for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.
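To make the notion of file context concrete, the sketch below assembles an encoder input by appending sibling-subroutine signatures from the same file to the target subroutine. This is a minimal illustration under stated assumptions: the regex, the <ctx> separator, and the function names are invented here, and the paper's purpose-built Transformer modification is not reproduced.

```python
# Illustrative sketch only: approximates "file context" by appending sibling
# subroutine signatures from the same file to the encoder input.
import re
from typing import List

SIGNATURE_RE = re.compile(
    r"(?:public|private|protected|static|\s)+[\w<>\[\]]+\s+(\w+)\s*\([^)]*\)")

def sibling_signatures(file_source: str, target_name: str) -> List[str]:
    """Collect Java-style method signatures from the file, excluding the target."""
    return [m.group(0).strip()
            for m in SIGNATURE_RE.finditer(file_source)
            if m.group(1) != target_name]

def build_encoder_input(target_code: str, target_name: str,
                        file_source: str, max_context: int = 10) -> str:
    """Append select file context (sibling signatures) to the target subroutine."""
    context = sibling_signatures(file_source, target_name)[:max_context]
    return target_code + " <ctx> " + " <ctx> ".join(context)
```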
{"title":"Revisiting file context for source code summarization","authors":"Chia-Yi Su, Aakash Bansal, Collin McMillan","doi":"10.1007/s10515-024-00460-x","DOIUrl":"10.1007/s10515-024-00460-x","url":null,"abstract":"<div><p>Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder–decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself—that information often resides in other nearby code. In this paper, we revisit the idea of “file context” for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141774512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TM-fuzzer: fuzzing autonomous driving systems through traffic management
Automated Software Engineering 31(2) | Pub Date: 2024-07-27 | DOI: 10.1007/s10515-024-00461-w
Shenghao Lin, Fansong Chen, Laile Xi, Gaosheng Wang, Rongrong Xi, Yuyan Sun, Hongsong Zhu
Simulation testing of Autonomous Driving Systems (ADS) is crucial for ensuring the safety of autonomous vehicles. Currently, the scenarios found by ADS simulation testing tools are unlikely to expose ADS issues and are often highly similar to one another. In this paper, we propose TM-fuzzer, a novel approach for searching ADS test scenarios that uses real-time traffic management and diversity analysis to find security-critical and unique scenarios within the infinite scenario space. TM-fuzzer dynamically manages traffic flow by manipulating non-player characters near the autonomous vehicle throughout the simulation process to enhance the efficiency of test scenarios. Additionally, TM-fuzzer applies clustering analysis to vehicle trajectory graphs within scenarios to increase the diversity of test scenarios. Compared to the baseline, TM-fuzzer identified 29 unique violation scenarios more than four times faster and increased the incidence of ADS-caused violations by 26.26%. Experiments suggest that TM-fuzzer improves both efficiency and accuracy.
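A minimal sketch of the diversity idea follows: a newly generated scenario is kept only if its vehicle-trajectory features are sufficiently far from scenarios already collected. The feature vector, distance threshold, and function names are assumptions for illustration; TM-fuzzer's actual diversity analysis clusters vehicle trajectory graphs.

```python
# Illustrative sketch: accept a scenario only if its trajectory features are
# sufficiently far from every scenario already kept in the corpus.
import math
from typing import List, Sequence, Tuple

def trajectory_features(trajectory: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Summarize a trajectory [(x, y, speed), ...] as a small feature vector."""
    xs = [p[0] for p in trajectory]
    ys = [p[1] for p in trajectory]
    speeds = [p[2] for p in trajectory]
    return [min(xs), max(xs), min(ys), max(ys),
            sum(speeds) / len(speeds), max(speeds)]

def is_novel(candidate: List[float], corpus: List[List[float]],
             threshold: float = 5.0) -> bool:
    """Reject the candidate if any existing scenario is closer than the threshold."""
    return all(math.dist(candidate, kept) >= threshold for kept in corpus)
```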
{"title":"TM-fuzzer: fuzzing autonomous driving systems through traffic management","authors":"Shenghao Lin, Fansong Chen, Laile Xi, Gaosheng Wang, Rongrong Xi, Yuyan Sun, Hongsong Zhu","doi":"10.1007/s10515-024-00461-w","DOIUrl":"10.1007/s10515-024-00461-w","url":null,"abstract":"<div><p>Simulation testing of Autonomous Driving Systems (ADS) is crucial for ensuring the safety of autonomous vehicles. Currently, scenarios searched by ADS simulation testing tools are less likely to expose ADS issues and highly similar. In this paper, we propose TM-fuzzer, a novel approach for searching ADS test scenarios, which utilizes real-time traffic management and diversity analysis to search security-critical and unique scenarios within the infinite scenario space. TM-fuzzer dynamically manages traffic flow by manipulating non-player characters near autonomous vehicle throughout the simulation process to enhance the efficiency of test scenarios. Additionally, the TM-fuzzer utilizes clustering analysis on vehicle trajectory graphs within scenarios to increase the diversity of test scenarios. Compared to the baseline, the TM-fuzzer identified 29 unique violated scenarios more than four times faster and enhanced the incidence of ADS-caused violations by 26.26%. Experiments suggest that the TM-fuzzer demonstrates improved efficiency and accuracy.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141774356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking AI code generation: a one-shot correction approach based on user feedback
Automated Software Engineering 31(2) | Pub Date: 2024-07-12 | DOI: 10.1007/s10515-024-00451-y (Open Access)
Kim Tuyen Le, Artur Andrzejak
Code generation has become an integral feature of modern IDEs, garnering significant attention. Notable approaches like GitHub Copilot and TabNine have been proposed to tackle this task. However, these tools may shift code-writing tasks towards code reviewing, which requires users to modify the generated code. Despite the advantages of user feedback, these responses remain transient and lack persistence across interaction sessions. This is attributed to the inherent characteristics of generative AI models, which require explicit re-training to integrate new data. Additionally, the non-deterministic and unpredictable nature of AI-powered models limits thorough examination of their unforeseen behaviors. We propose a methodology named One-shot Correction to mitigate these issues in natural-language-to-code translation models with no additional re-training. We use decomposition techniques to break code translation down into sub-problems. The final code is constructed from code snippets for each query chunk, either extracted from user feedback or selectively generated by a generative model. Our evaluation indicates comparable or improved performance compared to other models. Moreover, the methodology offers straightforward and interpretable approaches, which enable in-depth examination of unexpected results and provide insights for potential enhancements. We also illustrate that user feedback can substantially improve code translation models without re-training. Finally, we develop a preliminary GUI application to demonstrate the utility of our methodology in simplifying the customization and assessment of suggested code for users.
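The sketch below illustrates the general construct-from-chunks idea: a natural language query is decomposed, and each chunk is answered from persisted user feedback when available, otherwise from a generative model. The chunking rule, feedback store, and generate callable are placeholder assumptions, not the paper's implementation.

```python
# Illustrative sketch: persisted user corrections take priority over fresh
# generation when assembling code for each chunk of the query.
from typing import Callable, Dict, List

def translate(query: str,
              feedback: Dict[str, str],
              generate: Callable[[str], str]) -> str:
    """Build the final code snippet chunk by chunk."""
    chunks: List[str] = [c.strip() for c in query.split(" and ") if c.strip()]
    pieces = []
    for chunk in chunks:
        if chunk in feedback:          # persisted user correction wins
            pieces.append(feedback[chunk])
        else:                          # fall back to the generative model
            pieces.append(generate(chunk))
    return "\n".join(pieces)

# Example: a user once corrected how the log file should be opened.
feedback_store = {"open the log file": "f = open('app.log', 'r')"}
print(translate("open the log file and count its lines",
                feedback_store,
                generate=lambda c: f"# TODO: generated code for: {c}"))
```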
{"title":"Rethinking AI code generation: a one-shot correction approach based on user feedback","authors":"Kim Tuyen Le, Artur Andrzejak","doi":"10.1007/s10515-024-00451-y","DOIUrl":"10.1007/s10515-024-00451-y","url":null,"abstract":"<div><p>Code generation has become an integral feature of modern IDEs, gathering significant attention. Notable approaches like GitHub Copilot and TabNine have been proposed to tackle this task. However, these tools may shift code writing tasks towards code reviewing, which involves modification from users. Despite the advantages of user feedback, their responses remain transient and lack persistence across interaction sessions. This is attributed to the inherent characteristics of generative AI models, which require explicit re-training for new data integration. Additionally, the non-deterministic and unpredictable nature of AI-powered models limits thorough examination of their unforeseen behaviors. We propose a methodology named <i>One-shot Correction</i> to mitigate these issues in natural language to code translation models with no additional re-training. We utilize decomposition techniques to break down code translation into sub-problems. The final code is constructed using code snippets of each query chunk, extracted from user feedback or selectively generated from a generative model. Our evaluation indicates comparable or improved performance compared to other models. Moreover, the methodology offers straightforward and interpretable approaches, which enable in-depth examination of unexpected results and facilitate insights for potential enhancements. We also illustrate that user feedback can substantially improve code translation models without re-training. Ultimately, we develop a preliminary GUI application to demonstrate the utility of our methodology in simplifying customization and assessment of suggested code for users.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-024-00451-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactive search-based Product Line Architecture design
Automated Software Engineering 31(2) | Pub Date: 2024-07-09 | DOI: 10.1007/s10515-024-00457-6
Willian Marques Freire, Cláudia Tupan Rosa, Aline Maria Malachini Miotto Amaral, Thelma Elita Colanzi
Software Product Line (SPL) engineering is an approach, derived from other engineering fields, that applies reuse techniques to a family of products in a given domain. An essential artifact of an SPL is the Product Line Architecture (PLA), which identifies elements characterized by variation points, variability, and variants. The PLA aims to anticipate design decisions in order to obtain properties such as reusability and modularity. Nevertheless, obtaining a reusable, modular PLA that follows pre-defined standards can be a complex task involving several conflicting objectives. In this sense, PLA design can be formulated as a multiobjective optimization problem. This research presents an approach that helps Decision Makers (DMs) interactively optimize PLAs through several strategies, such as interactive optimization and Machine Learning (ML) algorithms. The interactive multiobjective optimization approach for PLA design (iMOA4PLA) uses metrics specific to the PLA optimization problem and is implemented in the OPLA-Tool v2.0. In this approach, the architect assumes the role of DM during the search process, guiding the evolution of PLAs through various strategies proposed in previous works. Two quantitative experiments and one qualitative experiment were performed to evaluate iMOA4PLA. The results showed that this approach can assist the PLA optimization process by meeting more than 90% of DM preferences. The scientific contribution of this work lies in providing an approach for PLA design and evaluation that leverages the benefits of machine learning algorithms and can serve as a basis for different software engineering contexts.
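As a rough illustration of how DM preferences could steer a multiobjective search, the sketch below scores candidate PLA designs with preference weights over conflicting metrics. The metric names, weights, and data are invented; iMOA4PLA relies on its own PLA-specific metrics and interactive search within OPLA-Tool v2.0.

```python
# Illustrative sketch: rank candidate PLA designs by combining conflicting
# objectives with Decision Maker preference weights (all metrics minimized).
from typing import Dict, List

def dm_score(candidate: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of objective values; lower is better."""
    return sum(weights[m] * candidate[m] for m in weights)

def rank_candidates(candidates: List[Dict[str, float]],
                    weights: Dict[str, float]) -> List[Dict[str, float]]:
    return sorted(candidates, key=lambda c: dm_score(c, weights))

candidates = [
    {"coupling": 0.42, "cohesion_loss": 0.30, "feature_scattering": 0.20},
    {"coupling": 0.35, "cohesion_loss": 0.45, "feature_scattering": 0.15},
]
weights = {"coupling": 0.5, "cohesion_loss": 0.3, "feature_scattering": 0.2}
print(rank_candidates(candidates, weights)[0])  # most preferred design
```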
{"title":"Interactive search-based Product Line Architecture design","authors":"Willian Marques Freire, Cláudia Tupan Rosa, Aline Maria Malachini Miotto Amaral, Thelma Elita Colanzi","doi":"10.1007/s10515-024-00457-6","DOIUrl":"10.1007/s10515-024-00457-6","url":null,"abstract":"<div><p>Software Product Line (SPL) is an approach derived from other engineering fields that use reuse techniques for a family of products in a given domain. An essential artifact of SPL is the Product Line Architecture (PLA), which identifies elements characterized by variation points, variability, and variants. The PLA aims to anticipate design decisions to obtain features such as reusability and modularity. Nevertheless, getting a reusable and modular PLA and following pre-defined standards can be a complex task involving several conflicting objectives. In this sense, PLA can be formulated as a multiobjective optimization problem. This research presents an approach that helps DMs (Decision Makers) to interactively optimize the PLAs through several strategies such as interactive optimization and Machine Learning (ML) algorithms. The interactive multiobjective optimization approach for PLA design (iMOA4PLA) uses specific metrics for the PLA optimization problem, implemented through the OPLA-Tool v2.0. In this approach, the architect assumes the role of DM during the search process, guiding the evolution of PLAs through various strategies proposed in previous works. Two quantitative and one qualitative experiments were performed to evaluate the iMOA4PLA. The results showed that this approach can assist the PLA optimization process by meeting more than 90% of DM preferences. The scientific contribution of this work lies in providing an approach for the PLA design and evaluation that leverages the benefits of machine learning algorithms and can serve as a basis for different SE contexts.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing regression testing with AHP-TOPSIS metric system for effective technical debt evaluation
Automated Software Engineering 31(2) | Pub Date: 2024-07-08 | DOI: 10.1007/s10515-024-00458-5 (Open Access)
Anis Zarrad, Rami Bahsoon, Priya Manimaran
Regression testing is essential to ensure that the actual software product conforms to the expected requirements following modification. However, it can be costly and time-consuming. To address this issue, various approaches have been proposed for selecting test cases that provide adequate coverage of the modified software. Nonetheless, problems related to omitting and/or rerunning unnecessary test cases continue to pose challenges, particularly with regard to technical debt (TD) resulting from code coverage shortcomings and/or overtesting. In the case of testing-related shortcomings, incurring TD may result in cost and time savings in the short run, but it can lead to future maintenance and testing expenses. Most prior studies have treated test case selection as a single-objective or two-objective optimization problem. This study introduces a multi-objective decision-making approach to quantify and evaluate TD in regression testing. The proposed approach combines the Analytic Hierarchy Process (AHP) with the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) to select the most suitable test cases in terms of objective values defined by test cost, code coverage, and test risk, and thereby effectively manages the regression testing problem. The AHP method is used to eliminate subjective bias when optimizing objective weights, while the TOPSIS method is employed to evaluate and select test-case alternatives based on TD. The effectiveness of this approach was compared to that of a specific multi-objective optimization method and a standard coverage methodology. Unlike other approaches, the proposed approach always accepts solutions based on balanced decisions, considering modifications and weighing risk analysis and testing costs against potential technical debt. The results demonstrate that the proposed approach reduces both TD and regression testing effort.
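The sketch below shows how an AHP-derived weight vector can feed a TOPSIS ranking over test cases described by cost, coverage, and risk. The pairwise-comparison values and the sample decision matrix are made-up assumptions; only the two standard algorithms are illustrated, not the paper's full metric system.

```python
# Illustrative sketch of the AHP + TOPSIS pipeline on made-up test-case data.
# AHP turns a pairwise-comparison matrix into weights (geometric-mean
# approximation); TOPSIS ranks alternatives by closeness to the ideal solution.
import numpy as np

def ahp_weights(pairwise: np.ndarray) -> np.ndarray:
    geo_mean = np.prod(pairwise, axis=1) ** (1.0 / pairwise.shape[1])
    return geo_mean / geo_mean.sum()

def topsis(matrix: np.ndarray, weights: np.ndarray, benefit: np.ndarray) -> np.ndarray:
    norm = matrix / np.linalg.norm(matrix, axis=0)           # vector normalization
    weighted = norm * weights
    ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
    anti = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))
    d_ideal = np.linalg.norm(weighted - ideal, axis=1)
    d_anti = np.linalg.norm(weighted - anti, axis=1)
    return d_anti / (d_ideal + d_anti)                        # closeness coefficient

# Hypothetical pairwise comparisons for (cost, coverage, risk) and 3 test cases.
pairwise = np.array([[1, 1/3, 1/2], [3, 1, 2], [2, 1/2, 1]])
weights = ahp_weights(pairwise)
decision = np.array([[4.0, 0.80, 0.3],    # columns: cost, coverage, risk
                     [2.0, 0.60, 0.2],
                     [3.0, 0.75, 0.6]])
benefit = np.array([False, True, False])  # coverage is a benefit; cost and risk are costs
print(topsis(decision, weights, benefit)) # higher score = more suitable test case
```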
{"title":"Optimizing regression testing with AHP-TOPSIS metric system for effective technical debt evaluation","authors":"Anis Zarrad, Rami Bahsoon, Priya Manimaran","doi":"10.1007/s10515-024-00458-5","DOIUrl":"10.1007/s10515-024-00458-5","url":null,"abstract":"<div><p>Regression testing is essential to ensure that the actual software product confirms the expected requirements following modification. However, it can be costly and time-consuming. To address this issue, various approaches have been proposed for selecting test cases that provide adequate coverage of the modified software. Nonetheless, problems related to omitting and/or rerunning unnecessary test cases continue to pose challenges, particularly with regard to technical debt (TD) resulting from code coverage shortcomings and/or overtesting. In the case of testing-related shortcomings, incurring TD may result in cost and time savings in the short run, but it can lead to future maintenance and testing expenses. Most prior studies have treated test case selection as a single-objective or two-objective optimization problem. This study introduces a multi-objective decision-making approach to quantify and evaluate TD in regression testing. The proposed approach combines the analytic-hierarchy-process (AHP) method and the technique of order preference by similarity to an ideal solution (TOPSIS) to select the most ideal test cases in terms of objective values defined by the test cost, code coverage, and test risk. This approach effectively manages the software regression testing problems. The AHP method was used to eliminate subjective bias when optimizing objective weights, while the TOPSIS method was employed to evaluate and select test-case alternatives based on TD. The effectiveness of this approach was compared to that of a specific multi-objective optimization method and a standard coverage methodology. Unlike other approaches, our proposed approach always accepts solutions based on balanced decisions by considering modifications and using risk analysis and testing costs against potential technical debt. The results demonstrate that our proposed approach reduces both TD and regression testing efforts.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-024-00458-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge-enhanced software refinement: leveraging reinforcement learning for search-based quality engineering
Automated Software Engineering 31(2) | Pub Date: 2024-06-25 | DOI: 10.1007/s10515-024-00456-7
Maryam Nooraei Abadeh
In the rapidly evolving software development industry, the early identification of optimal design alternatives and accurate performance prediction are critical for developing efficient software products. This paper introduces a novel approach to software refinement, termed Reinforcement Learning-based Software Refinement (RLSR), which leverages reinforcement learning techniques to address this challenge. RLSR enables an automated software refinement process that incorporates quality-driven intelligent software development as an early decision-making strategy. By proposing a Q-learning-based approach, RLSR facilitates the automatic refinement of software in dynamic environments while optimizing the use of computational resources and time. Additionally, the convergence rate to an optimal policy during the refinement process is investigated. The results demonstrate that training the policy using throughput values leads to significantly faster convergence to optimal rewards. This study evaluates RLSR on various metrics, including episode length, reward over time, and reward distributions, using a running example. Furthermore, to illustrate the effectiveness and applicability of the proposed method, a comparative analysis is applied to three refinable software designs: an e-commerce platform, a smart booking platform, and a web-based GIS transformation system. The comparison between Q-learning and the proposed algorithm reveals that the refinement outcomes achieved with the proposed algorithm are superior, particularly when an adequate number of learning steps and a comprehensive historical dataset are available. The findings emphasize the potential of leveraging reinforcement learning techniques to automate software refinement and improve the efficiency of the model-driven development process.
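A minimal sketch of the tabular Q-learning loop underlying such an approach is shown below, assuming states are abstract design snapshots, actions are refinement steps, and the reward is a placeholder (e.g., a throughput estimate); none of these names come from the paper.

```python
# Illustrative sketch of a tabular Q-learning update over refinement actions.
import random
from collections import defaultdict

Q = defaultdict(float)             # Q[(state, action)] -> expected return
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def choose_action(state, actions):
    """Epsilon-greedy selection over candidate refinement actions."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """Standard Q-learning backup: Q <- Q + alpha * (target - Q)."""
    best_next = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```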
{"title":"Knowledge-enhanced software refinement: leveraging reinforcement learning for search-based quality engineering","authors":"Maryam Nooraei Abadeh","doi":"10.1007/s10515-024-00456-7","DOIUrl":"10.1007/s10515-024-00456-7","url":null,"abstract":"<div><p>In the rapidly evolving software development industry, the early identification of optimal design alternatives and accurate performance prediction are critical for developing efficient software products. This paper introduces a novel approach to software refinement, termed Reinforcement Learning-based Software Refinement (RLSR), which leverages Reinforcement Learning techniques to address this challenge. RLSR enables an automated software refinement process that incorporates quality-driven intelligent software development as an early decision-making strategy. By proposing a Q-learning-based approach, RLSR facilitates the automatic refinement of software in dynamic environments while optimizing the utilization of computational resources and time. Additionally, the convergence rate to an optimal policy during the refinement process is investigated. The results demonstrate that training the policy using throughput values leads to significantly faster convergence to optimal rewards. This study evaluates RLSR based on various metrics, including episode length, reward over time, and reward distributions on a running example. Furthermore, to illustrate the effectiveness and applicability of the proposed method, a comparative analysis is applied to three refinable software designs, such as the E-commerce platform, smart booking platform, and Web-based GIS transformation system. The comparison between Q-learning and the proposed algorithm reveals that the refinement outcomes achieved with the proposed algorithm are superior, particularly when an adequate number of learning steps and a comprehensive historical dataset are available. The findings emphasize the potential of leveraging reinforcement learning techniques for automating software refinement and improving the efficiency of the model-driven development process.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An empirical study of data sampling techniques for just-in-time software defect prediction
Automated Software Engineering 31(2) | Pub Date: 2024-06-22 | DOI: 10.1007/s10515-024-00455-8
Zhiqiang Li, Qiannan Du, Hongyu Zhang, Xiao-Yuan Jing, Fei Wu
Just-in-time software defect prediction (JIT-SDP) is a fine-grained, easy-to-trace, and practical method. Unfortunately, JIT-SDP usually suffers from the class imbalance problem, which affects the performance of the models. Data sampling is one of the class imbalance techniques commonly used to overcome this problem. However, comprehensive empirical studies comparing the effect of different data sampling techniques on JIT-SDP performance are lacking. In this paper, we consider both defect classification and defect ranking, two typical application scenarios. To this end, we performed an empirical comparison of 10 data sampling algorithms on the performance of JIT-SDP. Extensive experiments on 10 open-source projects with 12 performance measures show that the effectiveness of data sampling techniques can indeed vary depending on the specific evaluation measures in both the defect classification and defect ranking scenarios. Specifically, the RUM algorithm demonstrated superior overall performance for defect classification, particularly in F-measure, AUC, and MCC. For defect ranking, the ENN algorithm emerged as the most favorable option, exhibiting the best results in P_opt, Recall@20%, and F-measure@20%. However, data sampling techniques can lead to an increase in false alarms and require the inspection of a higher number of changes. These findings highlight the importance of carefully selecting the appropriate data sampling technique based on the specific evaluation measures for different scenarios.
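For illustration, the sketch below implements random undersampling of the majority (clean) class, one of the simplest members of the data sampling family compared in such studies; it is not the paper's RUM or ENN implementation, and the function names are assumptions.

```python
# Illustrative sketch of random undersampling for JIT-SDP training data:
# drop majority-class (clean) changes until both classes are balanced.
import random
from typing import List, Tuple

def random_undersample(changes: List[dict], labels: List[int],
                       seed: int = 42) -> Tuple[List[dict], List[int]]:
    """Return a balanced subset of (changes, labels); label 1 = defect-inducing."""
    rng = random.Random(seed)
    defective = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    kept_clean = rng.sample(clean, k=min(len(clean), len(defective)))
    kept = sorted(defective + kept_clean)
    return [changes[i] for i in kept], [labels[i] for i in kept]
```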
{"title":"An empirical study of data sampling techniques for just-in-time software defect prediction","authors":"Zhiqiang Li, Qiannan Du, Hongyu Zhang, Xiao-Yuan Jing, Fei Wu","doi":"10.1007/s10515-024-00455-8","DOIUrl":"10.1007/s10515-024-00455-8","url":null,"abstract":"<div><p>Just-in-time software defect prediction (JIT-SDP) is a fine-grained, easy-to-trace, and practical method. Unfortunately, JIT-SDP usually suffers from the class imbalance problem, which affects the performance of the models. Data sampling is one of the commonly used class imbalance techniques to overcome this problem. However, there is a lack of comprehensive empirical studies to compare different data sampling techniques on the performance of JIT-SDP. In this paper, we consider both defect classification and defect ranking, two typical application scenarios. To this end, we performed an empirical comparison of 10 data sampling algorithms on the performance of JIT-SDP. Extensive experiments on 10 open-source projects with 12 performance measures show that the effectiveness of data sampling techniques can indeed vary relying on the specific evaluation measures in both defect classification and defect ranking scenarios. Specifically, the RUM algorithm has demonstrated superior performance overall in the context of defect classification, particularly in <i>F-measure</i>, <i>AUC</i>, and <i>MCC</i>. On the other hand, for defect ranking, the ENN algorithm has emerged as the most favorable option, exhibiting perfect results in <span>(P_{opt})</span>, <i>Recall@20%</i>, and <i>F-measure@20%</i>. However, data sampling techniques can lead to an increase in false alarms and require the inspection of a higher number of changes. These findings highlight the importance of carefully selecting the appropriate data sampling technique based on the specific evaluation measures for different scenarios.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration
Automated Software Engineering 31(2) | Pub Date: 2024-06-21 | DOI: 10.1007/s10515-024-00448-7
Yishu Li, Jacky Keung, Zhen Yang, Xiaoxue Ma, Jingyu Zhang, Shuo Liu
In agile requirements engineering, Generating Acceptance Criteria (GAC) to elaborate user stories plays a pivotal role in the sprint planning phase, providing a reference for delivering functional solutions. GAC requires extensive collaboration and human involvement. However, the lack of labeled datasets tailored to User Stories with attached Acceptance Criteria (US-AC) poses significant challenges for supervised learning techniques attempting to automate this process. Recent advancements in Large Language Models (LLMs) have showcased their remarkable text-generation capabilities, bypassing the need for supervised fine-tuning. Consequently, LLMs offer the potential to overcome this challenge. Motivated by this, we propose SimAC, a framework that leverages LLMs to simulate agile collaboration with three distinct role groups: requirement analyst, quality analyst, and others. Initiated by role-based prompts, LLMs act in these roles sequentially, following a create-update-update paradigm in GAC. Owing to the unavailability of ground truths, we invited practitioners to build a gold standard serving as a benchmark to evaluate the completeness and validity of auto-generated US-AC against human-crafted ones. Additionally, we invited eight experienced agile practitioners to evaluate the quality of US-AC using the INVEST framework. The results demonstrate consistent improvements across all tested LLMs, including the LLaMA and GPT-3.5 series. Notably, SimAC significantly enhances the ability of gpt-3.5-turbo in GAC, achieving improvements of 29.48% in completeness and 15.56% in validity, along with the highest INVEST satisfaction score of 3.21/4. Furthermore, this study provides case studies to illustrate SimAC's effectiveness and limitations, shedding light on the potential of LLMs in automated agile requirements engineering.
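A minimal sketch of a create-update-update role sequence is shown below, assuming a generic ask_llm callable and invented role prompts; SimAC's actual prompts, role groups, and evaluation procedure are defined in the paper.

```python
# Illustrative sketch: three role-based prompts applied in sequence
# (create, then two update passes) to produce acceptance criteria.
from typing import Callable

ROLE_PROMPTS = [
    ("requirement analyst", "Draft acceptance criteria for this user story:\n{story}\n"),
    ("quality analyst", "Review and update these acceptance criteria:\n{draft}\n"),
    ("others", "Refine the criteria from an end-user perspective:\n{draft}\n"),
]

def generate_acceptance_criteria(story: str,
                                 ask_llm: Callable[[str, str], str]) -> str:
    """Run the create-update-update sequence; ask_llm(role, prompt) is a placeholder."""
    draft = ""
    for role, template in ROLE_PROMPTS:
        prompt = template.format(story=story, draft=draft)
        draft = ask_llm(role, prompt)
    return draft
```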
{"title":"SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration","authors":"Yishu Li, Jacky Keung, Zhen Yang, Xiaoxue Ma, Jingyu Zhang, Shuo Liu","doi":"10.1007/s10515-024-00448-7","DOIUrl":"10.1007/s10515-024-00448-7","url":null,"abstract":"<div><p>In agile requirements engineering, Generating Acceptance Criteria (GAC) to elaborate user stories plays a pivotal role in the sprint planning phase, which provides a reference for delivering functional solutions. GAC requires extensive collaboration and human involvement. However, the lack of labeled datasets tailored for User Story attached with Acceptance Criteria (US-AC) poses significant challenges for supervised learning techniques attempting to automate this process. Recent advancements in Large Language Models (LLMs) have showcased their remarkable text-generation capabilities, bypassing the need for supervised fine-tuning. Consequently, LLMs offer the potential to overcome the above challenge. Motivated by this, we propose SimAC, a framework leveraging LLMs to simulate agile collaboration, with three distinct role groups: requirement analyst, quality analyst, and others. Initiated by role-based prompts, LLMs act in these roles sequentially, following a create-update-update paradigm in GAC. Owing to the unavailability of ground truths, we invited practitioners to build a gold standard serving as a benchmark to evaluate the completeness and validity of auto-generated US-AC against human-crafted ones. Additionally, we invited eight experienced agile practitioners to evaluate the quality of US-AC using the INVEST framework. The results demonstrate consistent improvements across all tested LLMs, including the LLaMA and GPT-3.5 series. Notably, SimAC significantly enhances the ability of gpt-3.5-turbo in GAC, achieving improvements of 29.48% in completeness and 15.56% in validity, along with the highest INVEST satisfaction score of 3.21/4. Furthermore, this study also provides case studies to illustrate SimAC’s effectiveness and limitations, shedding light on the potential of LLMs in automated agile requirements engineering.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141532507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data cleaning and machine learning: a systematic literature review
Automated Software Engineering 31(2) | Pub Date: 2024-06-11 | DOI: 10.1007/s10515-024-00453-w
Pierre-Olivier Côté, Amin Nikanjam, Nafisa Ahmed, Dmytro Humeniuk, Foutse Khomh
Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning, creating a dual relationship between ML and data cleaning. To the best of our knowledge, no study has comprehensively reviewed this relationship. This paper's objectives are twofold. First, it summarizes the latest approaches to data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. We conduct a systematic literature review of the papers published between 2016 and 2022 inclusive. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. We believe that our review of the literature will help the community develop better approaches to clean data.
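As a small generic example of two cleaning activities named in the review, the sketch below performs IQR-based outlier detection and mean imputation; it is textbook cleaning code, not a technique drawn from any specific surveyed paper.

```python
# Generic example of two data cleaning activities: outlier detection and imputation.
import statistics
from typing import List, Optional

def iqr_outliers(values: List[float], k: float = 1.5) -> List[int]:
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [v if v is not None else mean for v in values]
```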
{"title":"Data cleaning and machine learning: a systematic literature review","authors":"Pierre-Olivier Côté, Amin Nikanjam, Nafisa Ahmed, Dmytro Humeniuk, Foutse Khomh","doi":"10.1007/s10515-024-00453-w","DOIUrl":"10.1007/s10515-024-00453-w","url":null,"abstract":"<div><p>Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning; hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. This paper’s objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. We conduct a systematic literature review of the papers published between 2016 and 2022 inclusively. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. We believe that our review of the literature will help the community develop better approaches to clean data.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SDK4ED: a platform for building energy efficient, dependable, and maintainable embedded software
Automated Software Engineering 31(2) | Pub Date: 2024-06-11 | DOI: 10.1007/s10515-024-00450-z
Miltiadis Siavvas, Dimitrios Tsoukalas, Charalambos Marantos, Lazaros Papadopoulos, Christos Lamprakos, Oliviu Matei, Christos Strydis, Muhammad Ali Siddiqi, Philippe Chrobocinski, Katarzyna Filus, Joanna Domańska, Paris Avgeriou, Apostolos Ampatzoglou, Dimitrios Soudris, Alexander Chatzigeorgiou, Erol Gelenbe, Dionysios Kehagias, Dimitrios Tzovaras
Developing embedded software applications is a challenging task, chiefly due to the limitations imposed by the hardware devices or platforms on which they operate, as well as the heterogeneous non-functional requirements they need to satisfy. Modern embedded systems need to be energy efficient and dependable, while their maintenance costs should be minimized in order to ensure the success and longevity of their application. Building embedded software that satisfies the imposed hardware limitations while maintaining high quality with respect to critical non-functional requirements is a difficult task that requires proper assistance. To this end, in the present paper, we present the SDK4ED Platform, which facilitates the development of embedded software that exhibits high quality with respect to important quality attributes, with a main focus on energy consumption, dependability, and maintainability. This is achieved through the provision of state-of-the-art and novel quality-attribute-specific monitoring and optimization mechanisms, as well as through a novel fuzzy multi-criteria decision-making mechanism that facilitates the selection of code refactorings based on trade-off analysis among the three main attributes of choice. Novel forecasting techniques are also proposed to further support decision making during the development of embedded software. The usefulness, practicality, and industrial relevance of the SDK4ED platform were evaluated in a real-world setting, through three use cases on actual commercial embedded software applications stemming from the airborne, automotive, and healthcare domains, as well as through an industrial study. To the best of our knowledge, this is the first quality analysis platform that focuses on multiple quality criteria and takes their trade-offs into account to facilitate code refactoring selection.
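As a rough illustration of trade-off-aware refactoring selection, the sketch below ranks candidate refactorings by a crisp weighted score over energy, dependability, and maintainability gains. The weights, metric names, and candidate data are invented, and the crisp score is a simplification of the platform's fuzzy multi-criteria mechanism.

```python
# Illustrative sketch: rank refactoring candidates by a weighted trade-off
# score across the three quality attributes of interest (gains in [0, 1]).
from typing import Dict, List

WEIGHTS = {"energy_gain": 0.4, "dependability_gain": 0.3, "maintainability_gain": 0.3}

def tradeoff_score(impact: Dict[str, float]) -> float:
    """Higher total gain across the weighted attributes is better."""
    return sum(WEIGHTS[k] * impact.get(k, 0.0) for k in WEIGHTS)

def select_refactorings(candidates: List[Dict], top_k: int = 3) -> List[Dict]:
    return sorted(candidates, key=lambda c: tradeoff_score(c["impact"]),
                  reverse=True)[:top_k]

candidates = [
    {"name": "extract method",
     "impact": {"energy_gain": 0.1, "dependability_gain": 0.0, "maintainability_gain": 0.6}},
    {"name": "loop fusion",
     "impact": {"energy_gain": 0.5, "dependability_gain": 0.1, "maintainability_gain": 0.1}},
]
print([c["name"] for c in select_refactorings(candidates)])
```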
{"title":"SDK4ED: a platform for building energy efficient, dependable, and maintainable embedded software","authors":"Miltiadis Siavvas, Dimitrios Tsoukalas, Charalambos Marantos, Lazaros Papadopoulos, Christos Lamprakos, Oliviu Matei, Christos Strydis, Muhammad Ali Siddiqi, Philippe Chrobocinski, Katarzyna Filus, Joanna Domańska, Paris Avgeriou, Apostolos Ampatzoglou, Dimitrios Soudris, Alexander Chatzigeorgiou, Erol Gelenbe, Dionysios Kehagias, Dimitrios Tzovaras","doi":"10.1007/s10515-024-00450-z","DOIUrl":"10.1007/s10515-024-00450-z","url":null,"abstract":"<div><p>Developing embedded software applications is a challenging task, chiefly due to the limitations that are imposed by the hardware devices or platforms on which they operate, as well as due to the heterogeneous non-functional requirements that they need to exhibit. Modern embedded systems need to be energy efficient and dependable, whereas their maintenance costs should be minimized, in order to ensure the success and longevity of their application. Being able to build embedded software that satisfies the imposed hardware limitations, while maintaining high quality with respect to critical non-functional requirements is a difficult task that requires proper assistance. To this end, in the present paper, we present the SDK4ED Platform, which facilitates the development of embedded software that exhibits high quality with respect to important quality attributes, with a main focus on energy consumption, dependability, and maintainability. This is achieved through the provision of state-of-the-art and novel quality attribute-specific monitoring and optimization mechanisms, as well as through a novel fuzzy multi-criteria decision-making mechanism for facilitating the selection of code refactorings, which is based on trade-off analysis among the three main attributes of choice. Novel forecasting techniques are also proposed to further support decision making during the development of embedded software. The usefulness, practicality, and industrial relevance of the SDK4ED platform were evaluated in a real-world setting, through three use cases on actual commercial embedded software applications stemming from the airborne, automotive, and healthcare domains, as well as through an industrial study. To the best of our knowledge, this is the first quality analysis platform that focuses on multiple quality criteria, which also takes into account their trade-offs to facilitate code refactoring selection.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}