Boosting fault localization of statements by combining topic modeling and Ochiai

Information and Software Technology (IF 3.8; CAS Zone 2, Computer Science; JCR Q2, Computer Science, Information Systems). Pub Date: 2024-05-24. DOI: 10.1016/j.infsof.2024.107499

Romain Vacheret, Francisca Pérez, Tewfik Ziadi, Lom Hillah

Information and Software Technology, Volume 173, Article 107499. https://www.sciencedirect.com/science/article/pii/S0950584924001046

Abstract

Context:

Reducing the cost of maintenance tasks by fixing bugs automatically is the cornerstone of Automated Program Repair (APR), and automated Fault Localization (FL) is essential to it. Two families of FL techniques are Spectrum-based Fault Localization (SBFL) and Information Retrieval Fault Localization (IRFL). SBFL uses the coverage information and execution results of test cases; Ochiai is one of the most effective and widely used SBFL strategies. IRFL uses the bug report together with the identifier names and comments in source code files. Latent Dirichlet Allocation (LDA) is a generative statistical model and one of the most popular topic modeling methods. However, LDA has been used as an IRFL technique at the method level of granularity, whereas most existing APR tools focus on the statement level.
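To make the SBFL idea concrete, here is a minimal sketch of the standard Ochiai suspiciousness score computed from per-test statement coverage and pass/fail outcomes. It illustrates the plain Ochiai baseline only, not the combined approach of the paper; the function names, the dictionary-based input layout, and the toy data are assumptions made for illustration.

```python
import math


def ochiai(ef: int, ep: int, nf: int) -> float:
    """Ochiai suspiciousness of one statement.

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    nf: failing tests that do NOT execute the statement
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_statements(coverage, outcomes):
    """Rank statements from most to least suspicious.

    coverage: dict mapping test name -> set of executed statement ids
    outcomes: dict mapping test name -> True if the test passed
    (Both layouts are hypothetical, chosen to keep the sketch small.)
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        scores[stmt] = ochiai(ef, ep, total_failed - ef)
    # Highest score first; ties broken by statement id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy example: three tests over statements 1..3, only t2 fails.
coverage = {"t1": {1, 2, 3}, "t2": {1, 3}, "t3": {1, 2}}
outcomes = {"t1": True, "t2": False, "t3": True}
ranking = rank_statements(coverage, outcomes)
print(ranking)
```

In the toy data, statement 3 is executed by the failing test and by only one passing test, so it ranks above statement 1, which every test executes. Statement 2 is never executed by a failing test and scores zero.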

Objective:

This paper presents our approach that combines topic modeling and Ochiai to boost FL at the statement level.

Method:

We evaluate our approach on five projects from the Defects4J benchmark and report its performance in terms of hit@k and MRR. To study the impact on the results, we compare our approach against five baselines: two SBFL approaches (Ochiai and Dstar), two IRFL approaches (LDA and Blues), and one hybrid approach (SBIR). In addition, we compare the number of bugs found by our approach with the number found by the baselines.
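The two metrics named above can be sketched as follows, using their commonly used definitions: hit@k counts the bugs for which at least one faulty statement appears in the top k of the ranking, and MRR averages the reciprocal rank of the first faulty statement over all bugs. The abstract does not spell out details such as tie handling, so the helper names, the input layout, and the choice to score an unlocated bug as 0 are assumptions.

```python
def first_hit_rank(ranking, faulty):
    """1-based position of the first faulty statement, or None if absent."""
    for position, stmt in enumerate(ranking, start=1):
        if stmt in faulty:
            return position
    return None


def hit_at_k(results, k):
    """Number of bugs with a faulty statement ranked within the top k.

    results: list of (ranking, faulty) pairs, one pair per bug, where
    ranking is a list of statement ids (most suspicious first) and
    faulty is the set of truly faulty statement ids.
    """
    hits = 0
    for ranking, faulty in results:
        rank = first_hit_rank(ranking, faulty)
        if rank is not None and rank <= k:
            hits += 1
    return hits


def mean_reciprocal_rank(results):
    """Mean of 1/rank of the first faulty statement; unlocated bugs add 0."""
    if not results:
        return 0.0
    total = 0.0
    for ranking, faulty in results:
        rank = first_hit_rank(ranking, faulty)
        if rank is not None:
            total += 1.0 / rank
    return total / len(results)


# Toy example with three bugs: first hits at rank 2, rank 1, and never.
results = [
    (["a", "b", "c"], {"b"}),
    (["x", "y", "z"], {"x"}),
    (["p", "q"], {"r"}),
]
print(hit_at_k(results, 1), hit_at_k(results, 3), mean_reciprocal_rank(results))
```

Here hit@1 is 1, hit@3 is 2, and MRR is (1/2 + 1 + 0) / 3 = 0.5.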

Results:

Our approach significantly outperforms the baselines in all metrics, especially hit@1, hit@3, and hit@5. It also locates more bugs than Ochiai and Blues.

Conclusion:

The results indicate that integrating topic modeling with Ochiai boosts FL. This uncovers the potential of topic modeling for FL at the statement level, which is valuable for the APR community.

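Source journal: Information and Software Technology (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 9.10
Self-citation rate: 7.70%
Articles published: 164
Review time: 9.6 weeks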

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.