On Refining the SZZ Algorithm with Bug Discussion Data

Empirical Software Engineering (IF 3.5; Zone 2, Computer Science; JCR Q1, Computer Science, Software Engineering). Pub Date: 2024-07-24. DOI: 10.1007/s10664-024-10511-2

Pooja Rani, Fernando Petrulio, Alberto Bacchelli


Citation count: 0

Abstract

Context

Researchers testing hypotheses related to factors leading to low-quality software often rely on historical data, specifically on details regarding when defects were introduced into a codebase of interest. The prevailing techniques to determine the introduction of defects revolve around variants of the SZZ algorithm. This algorithm leverages information on the lines modified during a bug-fixing commit and finds when these lines were last modified, thereby identifying bug-introducing commits.
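
The core step described above (take the lines changed by a bug-fixing commit and find when they were last modified) can be illustrated with a minimal sketch. It is not the authors' implementation and not any particular SZZ variant: it assumes a hypothetical single-file history held in memory as `(commit_id, lines)` snapshots, and it approximates `git blame` with Python's `difflib`. The names `blame`, `szz_candidates`, and `history` are made up for illustration.

```python
from difflib import SequenceMatcher


def blame(snapshots):
    """For each snapshot, list the commit id that last touched every line."""
    origins = []
    prev_lines, prev_origin = [], []
    for commit_id, lines in snapshots:
        # Assume every line is new, then inherit origins for unchanged runs.
        origin = [commit_id] * len(lines)
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                origin[j1:j2] = prev_origin[i1:i2]
        origins.append(origin)
        prev_lines, prev_origin = lines, origin
    return origins


def szz_candidates(snapshots, fix_index):
    """Commits that last modified the lines a fix commit replaced or deleted."""
    parent_lines = snapshots[fix_index - 1][1]
    parent_origin = blame(snapshots[:fix_index])[fix_index - 1]
    fixed_lines = snapshots[fix_index][1]
    matcher = SequenceMatcher(a=parent_lines, b=fixed_lines, autojunk=False)
    candidates = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            candidates.update(parent_origin[i1:i2])
    return candidates


# Hypothetical history: c1 introduces the bug, c2 is unrelated, c3 is the fix.
history = [
    ("c1", ["def area(w, h):", "    return w + h"]),
    ("c2", ["def area(w, h):", "    return w + h", "",
            "def perimeter(w, h):", "    return 2 * (w + h)"]),
    ("c3", ["def area(w, h):", "    return w * h", "",
            "def perimeter(w, h):", "    return 2 * (w + h)"]),
]

print(szz_candidates(history, 2))
```

In this toy history the fix rewrites a single line, so the sketch blames only the commit that wrote it. A fix that also touched unrelated lines would pull in unrelated commits, which is the accuracy problem the abstract goes on to describe.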

Objectives

Despite several improvements and variants, SZZ struggles with accuracy, especially for commits that contain unrelated modifications or that touch files not involved in the introduction of the bug in the version control system (aka tangled commits and ghost commits).

Methods

Our research investigates whether and how incorporating content retrieved from bug discussions can address these issues by identifying the related and external files and thus improve the efficacy of the SZZ algorithm.

Results

To conduct our investigation, we take advantage of the links manually inserted by Mozilla developers in bug reports to signal which commits inserted bugs. From these links, we prepared a dataset, RoTEB, comprising 12,472 bug reports. We first manually inspect a sample of 369 bug reports related to these bug-fixing or bug-introducing commits and investigate whether the files mentioned in these reports could be useful for SZZ. After finding evidence that the mentioned files are relevant, we augment SZZ with this information, using different strategies, and evaluate the resulting approach against multiple SZZ variations.
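
The abstract does not spell out the augmentation strategies, so the following is only one hypothetical way such an augmentation could look: extract file names mentioned in a bug discussion and keep the SZZ candidates whose blamed file is among them. The regular expression, the list of extensions, the fallback rule, and all file paths are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: an optional path followed by a name with a known extension.
FILE_PATTERN = re.compile(r"[\w./-]*\w\.(?:cpp|h|js|py|rs|html|css)\b")


def mentioned_files(discussion):
    """Return the base names of files mentioned in the discussion text."""
    return {PurePosixPath(m).name for m in FILE_PATTERN.findall(discussion)}


def filter_candidates(candidates, discussion):
    """Keep (commit, path) candidates whose file is mentioned in the discussion.

    If nothing matches, fall back to the unfiltered list so that the filter
    never discards every candidate (an assumption of this sketch).
    """
    names = mentioned_files(discussion)
    kept = [(c, p) for c, p in candidates if PurePosixPath(p).name in names]
    return kept or list(candidates)


# Hypothetical discussion text and SZZ output.
discussion = "The crash comes from src/parser/tokenizer.cpp, see also tokenizer.h."
candidates = [
    ("c1", "src/parser/tokenizer.cpp"),
    ("c7", "src/ui/widget.js"),
]

print(filter_candidates(candidates, discussion))
```

Matching on base names rather than full paths is a deliberate simplification here, since discussions often mention a file without its directory. It can also produce false matches when two files share a name.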

Conclusion

We define a taxonomy outlining the rationale behind developers’ references to diverse files in their discussions. We observe that bug discussions often mention files relevant to enhancing the SZZ algorithm’s efficacy. Then, we verify that integrating these file references augments the precision of SZZ in pinpointing bug-introducing commits. Yet, it does not markedly influence recall. These results deepen our comprehension of the usefulness of bug discussions for SZZ. Future work can leverage our dataset and explore other techniques to further address the problem of tangled commits and ghost commits. Data & material: https://zenodo.org/records/11484723.
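
For readers less familiar with the two metrics, the sketch below shows how precision and recall can be computed for a set of predicted bug-introducing commits against a ground truth. The commit ids and the resulting numbers are toy values chosen for illustration, not results from the paper.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted bug-introducing commits."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy ground truth and two hypothetical prediction sets.
actual = {"c1", "c4"}
baseline = {"c1", "c2", "c3"}   # unfiltered candidates
filtered = {"c1"}               # candidates after a file-based filter

print(precision_recall(baseline, actual))
print(precision_recall(filtered, actual))
```

In this toy case, removing the two wrong candidates raises precision while recall stays the same, because the filter drops false positives but recovers no missed commit.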




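Source journal

Empirical Software Engineering (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 8.50

Self-citation rate: 12.20%

Number of publications: 169

Review time: >12 weeks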

About the journal: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.

Latest articles from the journal

An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues

Quality issues in machine learning software systems

An empirical study of token-based micro commits

Software product line testing: a systematic literature review

Consensus task interaction trace recommender to guide developers’ software navigation