A more accurate bug localization technique for bugs with multiple buggy code files

Impact factor: 4.3; Zone 2 (Computer Science); JCR Q2 (Computer Science, Information Systems); Information and Software Technology; Pub Date: 2025-01-31; DOI: 10.1016/j.infsof.2025.107675

Hui Xu, Zhaodan Wang, Weiqin Zou

Information and Software Technology, volume 181, Article 107675. https://www.sciencedirect.com/science/article/pii/S095058492500014X


Abstract

Context:

Bug localization is a key step in bug fixing. Despite considerable progress, existing bug localization techniques still perform poorly when the complete fix for a bug touches multiple code files. For such bugs, they tend to correctly locate only one, or at best some, of the buggy code files, leaving the others undetected.

Objective:

This study aims to improve bug localization for bugs whose resolution requires modifying multiple code files. It proposes HitMore, which ranks more of the truly buggy files higher in the recommendation list.

Method:

The basic idea of HitMore is to first retrieve a subset of the truly buggy code files, then use these files to retrieve the other buggy code files through code relation analysis. For the first part, we design three kinds of domain-specific features and build a machine-learning model to identify the subset of truly buggy code files. For the second part, we use three types of code relations between the code base and the buggy file subset to better retrieve the remaining truly buggy code files.
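The two-stage idea can be illustrated with a minimal Python sketch. It is not HitMore's implementation: the abstract does not name the features, the model, or the three relation types, so the sketch assumes the stage-one classifier has already produced a set of seed files, and `expand_ranking`, the `relations` map, and the `boost` weight are all hypothetical.

```python
def expand_ranking(base_scores, seed_files, relations, boost=0.5):
    """Re-rank files by boosting candidates related to predicted seed files.

    base_scores: {file: initial relevance score for the bug report}
    seed_files:  set of files a stage-one model predicted as buggy (assumed given)
    relations:   {file: set of related files}; a hypothetical relation graph
    boost:       hypothetical score added for each relation to a seed file
    """
    scores = dict(base_scores)
    for seed in seed_files:
        for neighbor in relations.get(seed, ()):
            if neighbor in scores and neighbor not in seed_files:
                scores[neighbor] += boost
    # Seed files come first, then the remaining files by boosted score.
    seeds = sorted((f for f in scores if f in seed_files), key=lambda f: -scores[f])
    rest = sorted((f for f in scores if f not in seed_files), key=lambda f: -scores[f])
    return seeds + rest


# Hypothetical data: B.java is related to the seed A.java, so it overtakes C.java.
base_scores = {"A.java": 0.9, "B.java": 0.2, "C.java": 0.4, "D.java": 0.3}
ranking = expand_ranking(base_scores, {"A.java"}, {"A.java": {"B.java"}})
print(ranking)
```

The additive boost is only one possible way to combine relation evidence with an initial score; it was chosen here for brevity.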

Results:

Experiments on six widely used open-source projects show that our technique is effective in identifying the subset of truly buggy code files, with a weighted F1-score of 86.1%–92.1%. By leveraging the code relations between the retrieved subset and the code base, HitMore retrieves all truly buggy code files for 29.31%–69.56% of bugs across the six projects. For bugs with multiple buggy code files, HitMore completely localizes up to 15.38%, 19.36%, and 11.86% more such bugs than three representative information-retrieval-based bug localization (IRBL) baselines, respectively.
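For readers unfamiliar with the two kinds of metric reported here, the following Python sketch shows how they could be computed. The weighted F1 follows the standard definition (per-class F1 averaged by class support). The complete-localization rate assumes a bug counts as completely localized when every truly buggy file appears in the top `k` of the ranking; the cutoff `k` and both function names are assumptions, since the abstract does not give the exact evaluation protocol.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged by class support (number of true instances)."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * (tp + fn) / total
    return score


def complete_localization_rate(rankings, truths, k):
    """Fraction of bugs whose truly buggy files all appear in the top k.

    The top-k criterion is an assumption made for illustration.
    """
    hits = sum(1 for ranked, truth in zip(rankings, truths)
               if set(truth) <= set(ranked[:k]))
    return hits / len(truths)


# Hypothetical labels: 1 = buggy file, 0 = clean file.
f1 = weighted_f1([1, 1, 0, 0], [1, 0, 0, 0])
# Hypothetical rankings for two bugs with two buggy files each.
rate = complete_localization_rate(
    [["A", "B", "C"], ["X", "Y", "Z"]], [{"A", "B"}, {"X", "Z"}], k=2)
print(round(f1, 4), rate)
```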

Conclusion:

The experimental results demonstrate HitMore's potential to reduce developers' burden of locating, and subsequently fixing, relatively complex bugs in practice, such as those with multiple buggy code files.

