
Empirical Software Engineering: Latest Articles

Reinforcement learning for online testing of autonomous driving systems: a replication and extension study.
IF 3.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2024-11-05 | DOI: 10.1007/s10664-024-10562-5
Luca Giamattei, Matteo Biagiola, Roberto Pietrantuono, Stefano Russo, Paolo Tonella

In a recent study, Reinforcement Learning (RL) used in combination with many-objective search was shown to outperform alternative techniques (random search and many-objective search) for online testing of Deep Neural Network-enabled systems. The empirical evaluation of these techniques was conducted on a state-of-the-art Autonomous Driving System (ADS). This work is a replication and extension of that empirical study. Our replication shows that RL does not outperform pure random test generation in a comparison conducted under the same settings as the original study, but with no confounding factor coming from the way collisions are measured. Our extension aims to eliminate some of the possible reasons for the poor performance of RL observed in our replication: (1) the presence of reward components providing contrasting feedback to the RL agent; (2) the use of an RL algorithm (Q-learning) that requires discretization of an intrinsically continuous state space. Results show that our new RL agent is able to converge to an effective policy that outperforms random search. Results also highlight other possible improvements, which open the way to further investigation of how best to leverage RL for online ADS testing.
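To illustrate point (2), the following is a minimal sketch of why tabular Q-learning needs a discretized state space. It is not the study's implementation: the two state variables, their ranges, the bin counts, and the helper names `discretize` and `q_update` are all assumptions chosen for illustration.

```python
from collections import defaultdict

def discretize(state, lows, highs, bins):
    """Map a continuous state vector to a tuple of bin indices."""
    cell = []
    for x, lo, hi, n in zip(state, lows, highs, bins):
        x = min(max(x, lo), hi)               # clamp to the assumed range
        idx = int((x - lo) / (hi - lo) * n)   # scale to [0, n]
        cell.append(min(idx, n - 1))          # the upper bound falls into the last bin
    return tuple(cell)

def q_update(q, s, a, reward, s_next, n_actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update on discretized states."""
    best_next = max(q[(s_next, b)] for b in range(n_actions))
    q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])

# Hypothetical 2-D state: (distance to lead vehicle in m, ego speed in m/s).
LOWS, HIGHS, BINS = (0.0, 0.0), (100.0, 30.0), (10, 6)

q_table = defaultdict(float)
s = discretize((12.3, 8.0), LOWS, HIGHS, BINS)
s_next = discretize((11.9, 8.4), LOWS, HIGHS, BINS)
q_update(q_table, s, a=0, reward=1.0, s_next=s_next, n_actions=3)
```

In this toy setting the two distinct continuous states fall into the same cell, so the table cannot tell them apart. That loss of resolution is the kind of limitation the extension attributes to discretization.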

Empirical Software Engineering, volume 30, issue 1, article 19. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538197/pdf/
Citations: 0
The effect of data complexity on classifier performance.
IF 3.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-01-01 | Epub Date: 2024-10-31 | DOI: 10.1007/s10664-024-10554-5
Jonas Eberlein, Daniel Rodriguez, Rachel Harrison

The research area of Software Defect Prediction (SDP) is both extensive and popular, and SDP is often treated as a classification problem. Improvements in classification, pre-processing and tuning techniques (together with many factors which can influence model performance) have encouraged this trend. However, no matter the effort in these areas, there seems to be a ceiling on the performance of the classification models used in SDP. In this paper, the issue of classifier performance is analysed from the perspective of data complexity. Specifically, data complexity metrics are calculated using the Unified Bug Dataset, a collection of well-known SDP datasets, and then checked for correlation with the defect prediction performance of machine learning classifiers (in particular, the classifiers C5.0, Naive Bayes, Artificial Neural Networks, Random Forests, and Support Vector Machines). In this work, different domains of competence and incompetence are identified for the classifiers. Similarities and differences between the classifiers and the performance metrics are found and the Unified Bug Dataset is analysed from the perspective of data complexity. We found that all data complexity metrics can be problematic, although certain classifiers work best, and even excel, in certain situations.
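The correlation check described above can be sketched as follows. The abstract does not say which correlation coefficient was used, so Spearman's rank correlation is an assumption here, and the complexity and performance values are made up for illustration.

```python
def ranks(values):
    """Average 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up values: one complexity metric and one performance score per dataset.
complexity = [0.10, 0.25, 0.40, 0.55, 0.70]
performance = [0.82, 0.75, 0.77, 0.60, 0.51]
rho = spearman(complexity, performance)
```

A strongly negative `rho` in such a toy setup would suggest that the classifier degrades as the metric grows, which is one way to delimit a domain of competence.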

Empirical Software Engineering, volume 30, issue 1, article 16. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11527945/pdf/
Citations: 0
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues
IF 4.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-09-16 | DOI: 10.1007/s10664-024-10540-x
Huizi Hao, Kazi Amit Hasan, Hong Qin, Marcos Macedo, Yuan Tian, Steven H. H. Ding, Ahmed E. Hassan

ChatGPT has significantly impacted software development practices, providing substantial assistance to developers in various tasks, including coding, testing, and debugging. Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored. In this paper, we analyze a dataset of 210 and 370 developers’ shared conversations with ChatGPT in GitHub pull requests (PRs) and issues, respectively. We manually examined the content of the conversations and characterized the dynamics of the sharing behavior, i.e., understanding the rationale behind the sharing, identifying the locations where the conversations were shared, and determining the roles of the developers who shared them. Our main observations are: (1) Developers seek ChatGPT’s assistance across 16 types of software engineering inquiries. In conversations shared in both PRs and issues, the most frequently encountered inquiry categories include code generation, conceptual questions, how-to guides, issue resolution, and code review. (2) Developers frequently engage with ChatGPT via multi-turn conversations where each prompt can fulfill various roles, such as unveiling initial or new tasks, iterative follow-up, and prompt refinement. Multi-turn conversations account for 33.2% of the conversations shared in PRs and 36.9% in issues. (3) In collaborative coding, developers leverage shared conversations with ChatGPT to facilitate their role-specific contributions, whether as authors of PRs or issues, code reviewers, or collaborators on issues. Our work serves as the first step towards understanding the dynamics between developers and ChatGPT in collaborative software development and opens up new directions for future research on the topic.

Citations: 0
Quality issues in machine learning software systems
IF 4.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-09-11 | DOI: 10.1007/s10664-024-10536-7
Pierre-Olivier Côté, Amin Nikanjam, Rached Bouchoucha, Ilan Basta, Mouna Abidi, Foutse Khomh

Context

An increasing demand is observed in various domains to employ Machine Learning (ML) for solving complex problems. ML models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs).

Problem

There is a strong need for ensuring the serving quality of MLSSs. False or poor decisions of such systems can lead to malfunction of other systems, significant financial losses, or even threats to human life. The quality assurance of MLSSs is considered a challenging task and currently is a hot research topic.

Objective

This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners. This empirical study aims to identify a catalog of quality issues in MLSSs.

Method

We conduct a set of interviews with practitioners/experts, to gather insights about their experience and practices when dealing with quality issues. We validate the identified quality issues via a survey with ML practitioners.

Results

Based on the content of 37 interviews, we identified 18 recurring quality issues and 24 strategies to mitigate them. For each identified issue, we describe the causes and consequences according to the practitioners’ experience.

Conclusion

We believe the catalog of issues developed in this study will allow the community to develop efficient quality assurance tools for ML models and MLSSs. A replication package of our study is available on our public GitHub repository.

Citations: 0
An empirical study of token-based micro commits
IF 4.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-09-04 | DOI: 10.1007/s10664-024-10527-8
Masanari Kondo, Daniel M. German, Yasutaka Kamei, Naoyasu Ubayashi, Osamu Mizuno

In software development, developers frequently apply maintenance activities to the source code that change a few lines by a single commit. A good understanding of the characteristics of such small changes can support quality assurance approaches (e.g., automated program repair), as it is likely that small changes are addressing deficiencies in other changes; thus, understanding the reasons for creating small changes can help understand the types of errors introduced. Eventually, these reasons and the types of errors can be used to enhance quality assurance approaches for improving code quality. While prior studies used code churns to characterize and investigate the small changes, such a definition has a critical limitation. Specifically, it loses the information of changed tokens in a line. For example, this definition fails to distinguish the following two one-line changes: (1) changing a string literal to fix a displayed message and (2) changing a function call and adding a new parameter. These are definitely maintenance activities, but we deduce that researchers and practitioners are interested in supporting the latter change. To address this limitation, in this paper, we define micro commits, a type of small change based on changed tokens. Our goal is to quantify small changes using changed tokens. Changed tokens allow us to identify small changes more precisely. In fact, this token-level definition can distinguish the above example. We investigate defined micro commits in four OSS projects and understand their characteristics as the first empirical study on token-based micro commits. We find that micro commits mainly replace a single name or literal token, and micro commits are more likely used to fix bugs. Additionally, we propose the use of token-based information to support software engineering approaches in which very small changes significantly affect their effectiveness.
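The difference between line-level churn and changed tokens can be sketched with the two one-line changes mentioned above. This is an illustrative sketch, not the paper's tooling: the regex tokenizer is a crude stand-in for a real lexer, and the example lines and the helper names `tokenize` and `changed_tokens` are assumptions.

```python
import difflib
import re

# Crude tokenizer (an assumption): string literals, identifiers/numbers, punctuation.
TOKEN_RE = re.compile(r'"[^"]*"|\w+|[^\w\s]')

def tokenize(line):
    """Split one source line into tokens."""
    return TOKEN_RE.findall(line)

def changed_tokens(old_line, new_line):
    """Return (removed, added) tokens between two versions of a line."""
    old, new = tokenize(old_line), tokenize(new_line)
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old[i1:i2])
            added.extend(new[j1:j2])
    return removed, added

# (1) Fixing a displayed message: one string literal is replaced.
msg_fix = changed_tokens('print("Eror: file missing")', 'print("Error: file missing")')

# (2) Changing a function call and adding a parameter.
call_fix = changed_tokens('save(path)', 'store(path, overwrite)')
```

Both changes count as one deleted and one added line under a churn-based view, yet the first touches a single literal token while the second replaces a name and adds two tokens.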

Citations: 0
Software product line testing: a systematic literature review
IF 4.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-09-02 | DOI: 10.1007/s10664-024-10516-x
Halimeh Agh, Aidin Azamnouri, Stefan Wagner

A Software Product Line (SPL) is a software development paradigm in which a family of software products shares a set of core assets. Testing has a vital role in both single-system development and SPL development in identifying potential faults by examining the behavior of a product or products, but it is especially challenging in SPL. There have been many research contributions in the SPL testing field; therefore, assessing the current state of research and practice is necessary to understand the progress in testing practices and to identify the gap between required techniques and existing approaches. This paper aims to survey existing research on SPL testing to provide researchers and practitioners with up-to-date evidence and issues that enable further development of the field. To this end, we conducted a Systematic Literature Review (SLR) with seven research questions in which we identified and analyzed 118 studies dating from 2003 to 2022. The results indicate that the literature proposes many techniques for specific aspects (e.g., controlling cost/effort in SPL testing); however, other elements (e.g., regression testing and non-functional testing) still need to be covered by existing research. Furthermore, most approaches are evaluated by only one empirical method, most of which are academic evaluations. This may jeopardize the adoption of approaches in industry. The results of this study can help identify gaps in SPL testing since specific points of SPL Engineering still need to be addressed entirely.

Citations: 0
Consensus task interaction trace recommender to guide developers’ software navigation
IF 4.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-09-02 | DOI: 10.1007/s10664-024-10528-7
Layan Etaiwi, Pascal Sager, Yann-Gaël Guéhéneuc, Sylvie Hamel

Developers must complete change tasks on large software systems for maintenance and development purposes. Having a custom software system with numerous instances that meet the growing client demand for features and functionalities increases the software complexity. Developers, especially newcomers, must spend a significant amount of time navigating through the source code and switching back and forth between files in order to understand such a system and find the parts relevant for performing current tasks. This navigation can be difficult and time-consuming, and can affect developers’ productivity. To help guide developers’ navigation towards successfully resolving tasks with minimal time and effort, we present a task-based recommendation approach that exploits aggregated developers’ interaction traces. Our novel approach, Consensus Task Interaction Trace Recommender (CITR), recommends file(s)-to-edit that help perform a set of tasks based on a task-related set of interaction traces obtained from developers who performed similar change tasks on the same or different custom instances of the same system. Our approach uses a consensus algorithm, which takes as input task-related interaction traces and recommends a consensus task interaction trace that developers can use to complete given similar change tasks that require editing (a) common file(s). To evaluate the efficiency of our approach, we perform three different evaluations. The first evaluation measures the accuracy of CITR recommendations. In the second evaluation, we assess to what extent CITR can help developers by conducting an observational controlled experiment in which two groups of developers performed evaluation tasks with and without the recommendations of CITR. In the third and last evaluation, we compare CITR to a state-of-the-art recommendation approach, MI. Results show, with statistical significance, that CITR correctly recommends on average 73% of the files to be edited. Furthermore, they show that CITR can increase developers’ successful task completion rate. CITR also outperforms MI, with recommendation accuracy 31% higher on average.
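The idea of deriving one consensus trace from several developers' traces can be sketched as below. The abstract does not describe CITR's actual consensus algorithm, so this substitutes a simple support-threshold vote ordered by average first position; the function name `consensus_trace`, the `min_support` parameter, and the file names are all hypothetical.

```python
from collections import defaultdict

def consensus_trace(traces, min_support=0.5):
    """Return files that appear in at least `min_support` of the traces,
    ordered by their average first position across those traces."""
    positions = defaultdict(list)
    for trace in traces:
        seen = set()
        for pos, path in enumerate(trace):
            if path not in seen:          # count each file once per trace
                seen.add(path)
                positions[path].append(pos)
    threshold = min_support * len(traces)
    kept = [
        (sum(p) / len(p), path)
        for path, p in positions.items()
        if len(p) >= threshold
    ]
    return [path for _, path in sorted(kept)]

# Hypothetical traces: files edited by three developers on similar change tasks.
traces = [
    ["Order.java", "Invoice.java", "Tax.java"],
    ["Invoice.java", "Order.java"],
    ["Order.java", "Tax.java", "Report.java"],
]
recommended = consensus_trace(traces)
```

Here `Report.java` is dropped because only one of the three traces touches it, while the remaining files are recommended in the order developers tended to reach them.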

Citations: 0
On combining commit grouping and build skip prediction to reduce redundant continuous integration activity
IF 4.1 Zone 2, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-30 DOI: 10.1007/s10664-024-10477-1
Divya M. Kamath, Eduardo Fernandes, Bram Adams, Ahmed E. Hassan

Context

Continuous Integration (CI) is a resource-intensive, widely used industry practice. The two most common heuristics for reducing the number of builds are grouping multiple builds together and skipping builds predicted to be safe. Yet both techniques have disadvantages in terms of missed build failures and higher build turn-around time (delays).

Objective

We aim to bring together these two lines of research, empirically comparing their advantages and disadvantages over time, and proposing and evaluating two ways in which these build avoidance heuristics can be combined more effectively, i.e., the ML-CI model based on machine learning and the Timeout Rule.

Method

We empirically study the trade-off between reduction in the number of builds required and the speed of recognition of failing builds on a dataset of 79,482 builds from 20 open-source projects.

Results

We find that both of our hybrid heuristics provide a significant improvement over the baseline heuristics, with fewer missed build failures and lower delays. They reduce the turn-around time of commits by 96% in comparison to skipping heuristics; the Timeout Rule also enables a median of 26.10% fewer builds to be scheduled than grouping heuristics.

Conclusions

Our hybrid approaches offer build engineers greater flexibility in scheduling builds during CI without compromising the quality of the resulting software.
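
The abstract does not define the Timeout Rule, so the following is only a minimal sketch of one plausible reading: commits predicted to be safe are skipped and grouped, but a build is forced once a fixed number of commits have accumulated without one. The function name `schedule_builds`, the `predicted_safe` callback, and the `timeout` parameter are all hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence


def schedule_builds(
    commits: Sequence[str],
    predicted_safe: Callable[[str], bool],
    timeout: int,
) -> List[List[str]]:
    """Return the groups of commits that would each trigger one CI build.

    Assumption (not from the paper): a build runs when the predictor flags
    a commit as risky, or when `timeout` commits are waiting unbuilt.
    """
    builds: List[List[str]] = []
    pending: List[str] = []  # commits accumulated since the last build
    for commit in commits:
        pending.append(commit)
        if not predicted_safe(commit) or len(pending) >= timeout:
            builds.append(pending)
            pending = []
    if pending:
        # Flush trailing skipped commits so nothing stays unbuilt.
        builds.append(pending)
    return builds


commits = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
risky = {"c2"}  # hypothetical predictor output
groups = schedule_builds(commits, lambda c: c not in risky, timeout=3)
print(groups)
```

With these toy inputs, seven commits are covered by three builds. A smaller `timeout` bounds how long a failure can go unnoticed, at the cost of more builds, which is the trade-off the study measures.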

Citations: 0
Data-access performance anti-patterns in data-intensive systems
IF 4.1 Zone 2, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-29 DOI: 10.1007/s10664-024-10535-8
Biruk Asmare Muse, Kawser Wazed Nafi, Foutse Khomh, Giuliano Antoniol

Data-intensive systems handle variable, high-volume, and high-velocity data generated by humans and digital devices. Like traditional software, data-intensive systems are prone to technical debt introduced to cope with the time and resource constraints placed on developers. Data access is a critical component of data-intensive systems, as it determines their overall performance and functionality. While data-access technical debt is getting attention from the research community, technical debt that affects performance is not well investigated. This study aims to identify, categorize, and validate data-access performance anti-patterns. We collected issues from NoSQL-based and polyglot-persistence open-source data-intensive systems implemented in the Java programming language, and identified 14 new data-access performance anti-patterns grouped under seven high-level categories. We conducted a developer survey to evaluate the perceived relevance and criticality of the newly identified anti-patterns and found that Improper Handling of Node Failures, Using Synchronous Connection, and Inefficient Driver API are the most critical data-access performance anti-patterns. The study findings can help improve the quality of data-intensive software systems by raising practitioners' awareness of the impact of data-access performance anti-patterns. At the same time, the findings will help quality assurance teams prioritize the correction of performance anti-patterns based on their criticality.

Citations: 0
An extensive replication study of the ABLoTS approach for bug localization
IF 4.1 Zone 2, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-24 DOI: 10.1007/s10664-024-10537-6
Feifei Niu, Enshuo Zhang, Christoph Mayr-Dorn, Wesley Klewerton Guez Assunção, Liguo Huang, Jidong Ge, Bin Luo, Alexander Egyed

Bug localization is the task of recommending source code locations (typically files) that contain the cause of a bug and hence need to be changed to fix it. Along these lines, information retrieval-based bug localization (IRBL) approaches have been adopted, which identify the most bug-prone files from the source code space. In current practice, a series of state-of-the-art IRBL techniques combine different components (e.g., similar reports, version history, and code structure) to achieve better performance. ABLoTS is a recently proposed approach whose core component, TraceScore, uses requirements and traceability information between different issue reports (i.e., feature requests and bug reports) to identify buggy source code snippets, with promising results. To evaluate the accuracy of these results and gain additional insight into the practical applicability of ABLoTS, we conducted a replication study of this approach on the original dataset and on two extended datasets (an additional Java dataset and a Python dataset). The original dataset consists of 11 open-source Java projects with 8,494 bug reports. The extended Java dataset includes 16 more projects comprising 25,893 bug reports and corresponding source code commits. The extended Python dataset consists of 12 projects with 1,289 bug reports. While we find that the TraceScore component, which is the core of ABLoTS, produces comparable or even better results on the extended datasets, we also find that we cannot reproduce the ABLoTS results reported in the original paper, due to an overlooked side effect of incorrectly choosing a cut-off date, which led to test data leaking into training data with significant effects on performance. Additionally, we conduct experiments to assess the performance of various composers that aggregate scores from different components, revealing that Logistic Regression, fixed weight, and CombSUM outperform the other composers across all three datasets, while decision tree and random forest exhibit subpar performance.
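
To make the notion of a "composer" concrete, here is a minimal sketch of CombSUM-style score fusion: each component's scores are min-max normalized and then summed per file. The abstract does not say how ABLoTS normalizes or weights scores, so the normalization choice, the optional `weights` argument (standing in for a fixed-weight composer), and the component names in the example are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Scores = Dict[str, float]  # file path -> suspiciousness score from one component


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one component's scores to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All files tie; give them the same neutral score.
        return {path: 1.0 for path in scores}
    return {path: (s - lo) / (hi - lo) for path, s in scores.items()}


def comb_sum(
    components: List[Scores],
    weights: Optional[List[float]] = None,
) -> List[Tuple[str, float]]:
    """Fuse component scores by (optionally weighted) summation.

    With weights=None every component counts equally (plain CombSUM);
    passing constants gives a simple fixed-weight composer.
    """
    if weights is None:
        weights = [1.0] * len(components)
    fused: Dict[str, float] = {}
    for weight, scores in zip(weights, components):
        for path, s in min_max_normalize(scores).items():
            fused[path] = fused.get(path, 0.0) + weight * s
    # Highest fused score first; ties broken by path for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical component outputs for three files.
trace_score = {"A.java": 0.9, "B.java": 0.3, "C.java": 0.1}
version_history = {"A.java": 2.0, "B.java": 8.0}
ranking = comb_sum([trace_score, version_history])
print([path for path, _ in ranking])
```

A file missing from a component simply contributes nothing from that component, which is the usual CombSUM convention. Learned composers such as logistic regression would replace the fixed weights with coefficients fitted on training data.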

Citations: 0