
Empirical Software Engineering: Latest Publications

A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs
IF 4.1 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-06-14 · DOI: 10.1007/s10664-024-10491-3
Muhammad Ilyas Azeem, Sallam Abualhaija

Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern of requirements engineering. Personal data collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA), which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching billions of Euros. Software systems involving personal data processing must adhere both to the legal obligations stipulated at a general level in GDPR and to the obligations outlined in DPAs for their specific business context. In other words, a DPA is yet another source from which requirements engineers can elicit legal requirements. However, the DPA must be complete according to GDPR to ensure that the elicited requirements cover the complete set of obligations. Therefore, checking the completeness of DPAs is a prerequisite step towards developing a compliant system. Analyzing DPAs against GDPR entirely manually is time-consuming and requires adequate legal expertise. In this paper, we propose an automation strategy that addresses the completeness checking of DPAs against GDPR provisions as a text classification problem. Specifically, we pursue ten alternative solutions enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed the F₂ score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions, yielding F₂ scores of 86.7% and 89.7%, are based on the pre-trained BERT and RoBERTa language models.
Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
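The F₂ metric reported above weights recall twice as heavily as precision, which fits completeness checking: missing a violated obligation is costlier than a false alarm. As a small illustration (the generic F-beta formula only, not the authors' pipeline; the counts below are invented):

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """General F-beta score; beta=2 weights recall twice as much as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision and recall from raw true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical checker: finds 9 of 10 missing provisions, with 3 false alarms.
p, r = precision_recall(tp=9, fp=3, fn=1)
print(round(f_beta(p, r), 3))  # → 0.865
```

Note how the same precision/recall pair would yield a lower F₁ score; the beta=2 weighting rewards the high recall.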

Citations: 0
Common challenges of deep reinforcement learning applications development: an empirical study
IF 4.1 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-06-14 · DOI: 10.1007/s10664-024-10500-5
Mohammad Mehdi Morovati, Florian Tambon, Mina Taraghi, Amin Nikanjam, Foutse Khomh

Machine Learning (ML) is increasingly being adopted in different industries. Deep Reinforcement Learning (DRL) is a subdomain of ML used to produce intelligent agents. Despite recent developments in DRL technology, the main challenges that developers face in the development of DRL applications are still unknown. To fill this gap, in this paper, we conduct a large-scale empirical study of 927 DRL-related posts extracted from Stack Overflow, the most popular Q&A platform in the software community. Through the process of labeling and categorizing extracted posts, we created a taxonomy of common challenges encountered in the development of DRL applications, along with their corresponding popularity levels. This taxonomy has been validated through a survey involving 65 DRL developers. Results show that at least 45% of developers experienced 18 of the 21 challenges identified in the taxonomy. The most frequent sources of difficulty during the development of DRL applications are Comprehension, API usage, and Design problems, while Parallel processing and DRL libraries/frameworks are classified as the most difficult challenges to address, with respect to the time required to receive an accepted answer. We hope that the research community will leverage this taxonomy to develop efficient strategies to address the identified challenges and improve the quality of DRL applications.

Citations: 0
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP
IF 4.1 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-06-13 · DOI: 10.1007/s10664-024-10469-1
Lukas Schulte, Benjamin Ledel, Steffen Herbold

Context

The identification of bugs within issues reported to an issue tracking system is crucial for triage. Machine learning models have shown promising results for this task. However, we have only limited knowledge of how such models identify bugs. Explainable AI methods like LIME and SHAP can be used to increase this knowledge.

Objective

We want to understand if explainable AI provides explanations that are reasonable to us as humans and align with our assumptions about the model’s decision-making. We also want to know if the quality of predictions is correlated with the quality of explanations.

Methods

We conduct a study in which we rate LIME and SHAP explanations based on how well they explain the outcome of an issue type prediction model, i.e., whether they align with our expectations and help us understand the underlying machine learning model.

Results

We found that both LIME and SHAP give reasonable explanations and that correct predictions are well explained. Further, we found that SHAP outperforms LIME due to lower ambiguity and higher contextuality, which can be attributed to the ability of the deep SHAP variant to capture sentence fragments.

Conclusion

We conclude that the model finds explainable signals for both bugs and non-bugs. Also, we recommend that research dealing with the quality of explanations for classification tasks reports and investigates rater agreement, since the rating of explanations is highly subjective.
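As a much-simplified illustration of the feature-attribution idea behind LIME and SHAP, the sketch below attributes a toy issue classifier's score to individual tokens by occluding each one and measuring the change in output. Real LIME fits a local linear surrogate over many random perturbations; the keyword classifier and its weights here are invented for the example:

```python
# Toy "bug classifier": sums weights of bug-indicative keywords (invented weights).
KEYWORD_WEIGHTS = {"crash": 0.9, "exception": 0.7, "error": 0.5, "feature": -0.6}

def bug_score(tokens):
    """Higher score = more bug-like issue text."""
    return sum(KEYWORD_WEIGHTS.get(t, 0.0) for t in tokens)

def occlusion_attribution(tokens):
    """Attribute the prediction to each token by removing it and re-scoring."""
    base = bug_score(tokens)
    return {t: base - bug_score(tokens[:i] + tokens[i + 1:])
            for i, t in enumerate(tokens)}

tokens = "app crash when saving file".split()
print(occlusion_attribution(tokens))
```

Tokens whose removal changes the score the most receive the highest attribution, which is the intuition the rated explanations build on.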

Citations: 0
How far are we with automated machine learning? characterization and challenges of AutoML toolkits
IF 4.1 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-06-13 · DOI: 10.1007/s10664-024-10450-y
Md Abdullah Al Alamin, Gias Uddin

Automated Machine Learning (AutoML) toolkits are low/no-code software that aim to democratize ML system application development by ensuring rapid prototyping of ML models and by enabling collaboration across different stakeholders in ML system design (e.g., domain experts, data scientists, etc.). It is thus important to know the state of current AutoML toolkits and the challenges ML practitioners face while using those toolkits. In this paper, we first offer a characterization of currently available AutoML toolkits by analyzing 37 top AutoML tools and platforms. We find that the top AutoML platforms are mostly cloud-based. Most of the tools are optimized for the adoption of shallow ML models. Second, we present an empirical study of 14.3K AutoML-related posts from Stack Overflow (SO), which we analyzed using the topic modelling algorithm LDA (Latent Dirichlet Allocation) to understand the challenges of ML practitioners while using AutoML toolkits. We find 13 topics in the AutoML-related discussions on SO. The 13 topics are grouped into four categories: MLOps (43% of all questions), Model (28% of questions), Data (27% of questions), and Documentation (2% of questions). Most questions are asked during the Model training (29%) and Data preparation (25%) phases. AutoML practitioners find the MLOps topic category most challenging. Topics related to the MLOps category are the most prevalent and popular for cloud-based AutoML toolkits. Based on our study findings, we provide 15 recommendations to improve the adoption and development of AutoML toolkits.
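LDA, the topic model applied to the Stack Overflow posts, can be approximated in a few dozen lines of collapsed Gibbs sampling. The toy version below is a sketch only; the hyperparameters and the two-document corpus are made up and are not the authors' setup:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                 # vocabulary size
    ndk = [[0] * n_topics for _ in docs]                  # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]     # topic-word counts
    nk = [0] * n_topics                                   # tokens per topic
    z = []                                                # topic of each token
    for di, doc in enumerate(docs):                       # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):                               # resample each token's topic
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                k = z[di][wi]
                ndk[di][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[di][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                           for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = k
                ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw

docs = [["pipeline", "deploy", "docker", "deploy"],
        ["dataframe", "missing", "values", "dataframe"]]
ndk, nkw = lda_gibbs(docs, n_topics=2)
print(ndk)  # per-document topic counts
```

On real SO data one would of course use a library implementation with proper preprocessing; the point here is only the mechanics of assigning each token a topic proportional to document-topic and topic-word counts.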

Citations: 0
An empirical study of fault localization in Python programs
IF 4.1 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-06-13 · DOI: 10.1007/s10664-024-10475-3
Mohammad Rezaalipour, Carlo A. Furia

Despite Python's massive popularity as a programming language, especially in novel domains like data science, there is comparatively little fault localization research that targets Python. Even though it is plausible that several findings about programming languages like C/C++ and Java (the most common choices for fault localization research) carry over to other languages, whether the dynamic nature of Python and how the language is used in practice affect the capabilities of classic fault localization approaches remain open questions. This paper is the first multi-family large-scale empirical study of fault localization on real-world Python programs and faults. Using Zou et al.'s recent large-scale empirical study of fault localization in Java (Zou et al. 2021) as the basis of our study, we investigated the effectiveness (i.e., localization accuracy), efficiency (i.e., runtime performance), and other features (e.g., different entity granularities) of seven well-known fault-localization techniques from four families (spectrum-based, mutation-based, predicate switching, and stack-trace based) on 135 faults from 13 open-source Python projects in the BugsInPy curated collection (Widyasari et al. 2020). The results replicate for Python several results known about Java, and shed light on whether Python's peculiarities affect the capabilities of fault localization. The replication package that accompanies this paper includes detailed data about our experiments, as well as the tool FauxPy that we implemented to conduct the study.
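Of the four families studied, spectrum-based fault localization is the simplest to sketch: it ranks program entities by how strongly their coverage correlates with failing tests. Below is a toy implementation of the classic Ochiai formula, one common spectrum-based metric (the coverage data is invented; this is not the paper's FauxPy implementation):

```python
import math

def ochiai(coverage, results):
    """coverage: entity -> set of test ids executing it; results: test id -> passed?"""
    failed = {t for t, passed in results.items() if not passed}
    scores = {}
    for entity, tests in coverage.items():
        ef = len(tests & failed)            # failing tests that execute the entity
        ep = len(tests) - ef                # passing tests that execute it
        denom = math.sqrt(len(failed) * (ef + ep))
        scores[entity] = ef / denom if denom else 0.0
    return scores

# Invented spectrum: line_7 is executed by every failing test and no passing one.
coverage = {"line_3": {"t1", "t2", "t3"}, "line_7": {"t2", "t3"}, "line_9": {"t1"}}
results = {"t1": True, "t2": False, "t3": False}
ranking = sorted(ochiai(coverage, results).items(), key=lambda kv: -kv[1])
print(ranking[0][0])  # → line_7, the most suspicious entity
```

Entities covered only by failing tests get the maximum score of 1.0, while entities covered only by passing tests get 0.0, which is exactly the ranking intuition the technique relies on.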

Citations: 0
Utilization of pre-trained language models for adapter-based knowledge transfer in software engineering
IF 4.1 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-06-13 · DOI: 10.1007/s10664-024-10457-5
Iman Saberi, Fatemeh Fard, Fuxiang Chen

Software Engineering (SE) Pre-trained Language Models (PLMs), such as CodeBERT, are pre-trained on large code corpora, and their learned knowledge has shown success in transferring into downstream tasks (e.g., code clone detection) through the fine-tuning of PLMs. In Natural Language Processing (NLP), an alternative way of transferring the knowledge of PLMs is explored through the use of adapters, compact and parameter-efficient modules inserted into a PLM. Although the use of adapters has shown promising results in many NLP-based downstream tasks, their application and exploration in SE-based downstream tasks are limited. Here, we study knowledge transfer using adapters on multiple downstream tasks, including cloze test, code clone detection, and code summarization. These adapters are trained on code corpora and are inserted into a PLM that is pre-trained on English corpora or code corpora. We call these PLMs NL-PLM and C-PLM, respectively. We observed an improvement in results using NL-PLM over a PLM that does not have adapters, which suggests that adapters can transfer and utilize useful knowledge from NL-PLM to SE tasks. The results are sometimes on par with or exceed those of C-PLM, while being more efficient in terms of the number of parameters and training time. Interestingly, adapters inserted into a C-PLM generally yield better results than a traditional fine-tuned C-PLM. Our results open new directions to build more compact models for SE tasks.
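An adapter, as used in this line of work, is a small bottleneck layer inserted into a frozen PLM: a down-projection, a nonlinearity, an up-projection, and a residual connection, so only the adapter's few parameters are trained. A dependency-free numeric sketch of the forward pass (dimensions and weights are illustrative, not from any real model):

```python
def adapter_forward(x, w_down, b_down, w_up, b_up):
    """Bottleneck adapter: h = relu(x @ W_down + b_down); return x + h @ W_up + b_up."""
    # down-project the hidden vector into the small bottleneck, then ReLU
    h = [max(0.0, sum(xi * w for xi, w in zip(x, row)) + b)
         for row, b in zip(w_down, b_down)]
    # up-project back to the hidden size and add the residual connection
    return [xj + sum(hi * w for hi, w in zip(h, row)) + b
            for xj, row, b in zip(x, w_up, b_up)]

# Hidden size 4, bottleneck size 2 (toy numbers).
x = [1.0, -2.0, 0.5, 3.0]
w_down = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, 0.1]]    # 2 bottleneck rows of length 4
b_down = [0.0, 0.0]
w_up = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.0, 0.0]]  # 4 output rows of length 2
b_up = [0.0, 0.0, 0.0, 0.0]
print(adapter_forward(x, w_down, b_down, w_up, b_up))
```

Because of the residual connection, a zero-initialized adapter is an identity function, which is part of why inserting adapters into a frozen PLM is a safe starting point for training.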

Citations: 0
Adoption of automated software engineering tools and techniques in Thailand
IF 4.1 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-06-10 · DOI: 10.1007/s10664-024-10472-6
Chaiyong Ragkhitwetsagul, Jens Krinke, Morakot Choetkiertikul, Thanwadee Sunetnanta, Federica Sarro

Readiness for the adoption of Automated Software Engineering (ASE) tools and techniques can vary according to the size and maturity of software companies. ASE tools and techniques have been adopted by large or ultra-large software companies. However, little is known about the adoption of ASE tools and techniques in small and medium-sized software enterprises (SSMEs) in emerging countries, and the challenges faced by such companies. We study the adoption of ASE tools and techniques for software measurement, static code analysis, continuous integration, and software testing, and the respective challenges faced by software developers in Thailand, a developing country with a growing software economy that mainly consists of SSMEs (similar to other developing countries). Based on the answers of 103 Thai participants in an online survey, we found that Thai software developers are somewhat familiar with ASE tools and agree that adopting such tools would be beneficial. Most of the developers do not use software measurement or static code analysis tools due to a lack of knowledge or experience, but agree that such tools would be useful. Continuous integration tools have been used, though with some difficulties. Lastly, although automated testing tools are adopted despite several serious challenges, many developers still test their software manually. We call for ASE tools to be made easier to use, in order to lower the barrier to adoption in SSMEs in developing countries.

软件公司的规模和成熟度不同,采用自动化软件工程(ASE)工具和技术的准备程度也不同。大型或超大型软件公司已经采用了 ASE 工具和技术。然而,人们对新兴国家的中小型软件企业(SSMEs)采用 ASE 工具和技术的情况以及这些企业面临的挑战知之甚少。泰国是一个软件经济不断发展的发展中国家,主要由中小型软件企业组成(与其他发展中国家类似),我们研究了泰国软件开发人员在软件测量、静态代码分析、持续集成和软件测试方面采用 ASE 工具和技术的情况,以及他们各自面临的挑战。根据在线调查中 103 位泰国参与者的回答,我们发现泰国软件开发人员对 ASE 工具有一定程度的了解,并同意采用此类工具会带来益处。由于缺乏相关知识或经验,大多数开发人员并不使用软件测量或静态代码分析工具,但他们都认为使用这些工具是有益的。持续集成工具的使用遇到了一些困难。最后,尽管自动化测试工具的使用面临着一些严峻挑战,但许多开发人员仍在手动测试软件。我们呼吁改进自动测试工具,使其更易于使用,从而降低发展中国家中小型软件企业(SSMEs)采用自动测试工具的门槛。
Adoption of automated software engineering tools and techniques in Thailand
Authors: Chaiyong Ragkhitwetsagul, Jens Krinke, Morakot Choetkiertikul, Thanwadee Sunetnanta, Federica Sarro
DOI: 10.1007/s10664-024-10472-6 | Pub Date: 2024-06-10 | Empirical Software Engineering
Citations: 0
Understanding the characteristics and the role of visual issue reports
IF 4.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-06-10 | DOI: 10.1007/s10664-024-10459-3
Hiroki Kuramoto, Dong Wang, Masanari Kondo, Yutaro Kashiwa, Yasutaka Kamei, Naoyasu Ubayashi

Issue reports are a pivotal interface between developers and users for receiving information about bugs in their products. In practice, reproducing those bugs is challenging, since issue reports often contain incorrect information or lack sufficient information. Furthermore, the poor quality of issue reports would have the effect of delaying the entire bug-fixing process. To enhance bug comprehension and facilitate bug reproduction, GitHub Issue allows users to embed visuals such as images and videos to complement the textual description. Hence, we conduct an empirical study on 34 active GitHub repositories to quantitatively analyze the difference between visual issue reports and non-visual ones, and qualitatively analyze the characteristics of visuals and the usage of visuals in bug types. Our results show that visual issue reports have a significantly higher probability of reporting bugs. Visual reports also tend to receive the first comment and complete the conversation in a relatively shorter time. Visuals are frequently used to present the program behavior and the user interface, with the major purpose of introducing problems in reports. Additionally, we observe that visuals are commonly used to report GUI-related bugs, but they are rarely used to report configuration bugs in comparison to non-visual issue reports. To summarize, our work highlights the role visuals play in the bug-fixing process and lays the foundation for future research to support bug comprehension by exploiting visuals.
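The study's visual/non-visual distinction can be illustrated with a simple heuristic classifier over issue bodies. The markup patterns below are an assumption on my part (standard GitHub-rendered image/video syntax), not the authors' actual detection method:

```python
import re

# Markup patterns that GitHub renders as an embedded visual. These
# heuristics are illustrative, not the paper's classification procedure.
IMAGE_MD = re.compile(r"!\[[^\]]*\]\([^)]+\)")        # ![alt](url)
IMAGE_HTML = re.compile(r"<img\s[^>]*src=", re.I)     # <img src="...">
VIDEO_URL = re.compile(r"https?://\S+\.(?:mp4|mov|gif)\b", re.I)

def is_visual_report(body: str) -> bool:
    """True if the issue body embeds an image or video."""
    return any(p.search(body) for p in (IMAGE_MD, IMAGE_HTML, VIDEO_URL))

visual = "The button overlaps the menu: ![screenshot](https://example.org/shot.png)"
textual = "Steps to reproduce: run `make test` and observe the failure."
print(is_visual_report(visual), is_visual_report(textual))  # True False
```

A classifier like this is how one would partition a large issue corpus into the two groups the paper compares before measuring response and resolution times.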

Citations: 0
Toward effective secure code reviews: an empirical study of security-related coding weaknesses
IF 4.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-06-08 | DOI: 10.1007/s10664-024-10496-y
Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude

Identifying security issues early is encouraged to reduce the latent negative impacts on the software systems. Code review is a widely-used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.
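To make concrete what mapping review comments to coding-weakness categories could look like, here is a minimal keyword-based sketch. The category names and keywords are hypothetical and far cruder than the CWE-based coding the study applies to its 135,560 comments:

```python
# Hypothetical keyword map from review-comment text to coding-weakness
# categories; invented for illustration, not the paper's taxonomy.
WEAKNESS_KEYWORDS = {
    "memory error": ("buffer overflow", "use after free", "out of bounds"),
    "resource management": ("memory leak", "not freed", "double free"),
    "input validation": ("unsanitized", "injection", "unvalidated"),
}

def tag_comment(comment: str) -> list:
    """Return every weakness category whose keywords appear in the comment."""
    text = comment.lower()
    return [category for category, keywords in WEAKNESS_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)]

comment = "This memcpy can write out of bounds if len is attacker-controlled."
print(tag_comment(comment))  # ['memory error']
```

Even a crude tagger like this shows why the analysis is hard to fully automate: a comment can acknowledge a weakness without the resulting change ever fixing it, which is exactly the gap the study measures.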

Citations: 0
The untold impact of learning approaches on software fault-proneness predictions: an analysis of temporal aspects
IF 4.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-06-08 | DOI: 10.1007/s10664-024-10454-8
Mohammad Jamil Ahmad, Katerina Goseva-Popstojanova, Robyn R. Lutz

This paper aims to improve software fault-proneness prediction by investigating the unexplored effects on classification performance of the temporal decisions made by practitioners and researchers regarding (i) the interval for which they will collect longitudinal features (software metrics data), and (ii) the interval for which they will predict software bugs (the target variable). We call these specifics of the data used for training and of the target variable being predicted the learning approach, and explore the impact of the two most common learning approaches on the performance of software fault-proneness prediction, both within a single release of a software product and across releases. The paper presents empirical results from a study based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, we show that one learning approach leads to significantly better performance than the other, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, the difference in classification performance is due to different levels of class imbalance in the two learning approaches. Our findings show that improved specification of the learning approach is essential to understanding and explaining the performance of fault-proneness prediction models, as well as to avoiding misleading comparisons among them. The paper concludes with some practical recommendations and research directions based on our findings toward improved software fault-proneness prediction.
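Fault-proneness classifiers like those compared here are commonly scored with recall-weighted F-beta measures (e.g. F2), precisely because of the class imbalance the paper identifies: faulty modules are the rare class, and missing one costs more than a false alarm. A minimal sketch with illustrative data; the metric choice and labels are not taken from the paper:

```python
def f_beta(y_true, y_pred, beta=2.0):
    """F-beta over binary labels; beta=2 weights recall over precision,
    a common choice when the faulty class is rare (class imbalance)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical per-module predictions for one release (1 = fault-prone);
# the imbalance (3 faulty of 8) is typical of such datasets.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
print(round(f_beta(y_true, y_pred), 3))  # → 0.667
```

Because the two learning approaches yield different class imbalance, the same raw error counts can translate into different F-beta scores — one reason the paper urges authors to report the learning approach explicitly.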

Citations: 0
Journal: Empirical Software Engineering