{"title":"Boosting fault localization of statements by combining topic modeling and Ochiai","authors":"Romain Vacheret , Francisca Pérez , Tewfik Ziadi , Lom Hillah","doi":"10.1016/j.infsof.2024.107499","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><p>Reducing the cost of maintenance tasks by fixing bugs automatically is the cornerstone of Automated Program Repair (APR). To do this, automated Fault Localization (FL) is essential. Two families of FL techniques are Spectrum-based Fault Localization (SBFL) and Information Retrieval Fault Localization (IRFL). In SBFL, the coverage information and execution results of test cases are utilized. Ochiai is one of the most effective and used SBFL strategies. In IRFL, the bug report information is utilized as well as the identifier names and comments in source code files. Latent Dirichlet Allocation (LDA) is a generative statistical model and one of the most popular topic modeling methods. However, LDA has been used at the method level of granularity as IRFL technique, whereas most existing APR tools are focused on the statement level.</p></div><div><h3>Objective:</h3><p>This paper presents our approach that combines topic modeling and Ochiai to boost FL at the statement level.</p></div><div><h3>Method:</h3><p>We evaluate our approach considering five different projects in Defects4J benchmark. We report the performance of our approach in terms of hit@k and MRR. To study the impact on the results, we compare our approach against five baselines: two SBFL approaches (Ochiai and Dstar), two IRFL approaches (LDA and Blues), and one hybrid approach (SBIR). In addition, we compare the number of bugs that are found by our approach with the baselines.</p></div><div><h3>Results:</h3><p>Our approach significantly outperforms the baselines in all metrics. Especially, when hit@1, hit@3 and hit@5 are compared. Also, our approach locates more bugs than Ochiai and Blues.</p></div><div><h3>Conclusion:</h3><p>The results of our approach indicate that the integration of topic modeling with Ochiai boosts FL. This uncovers the potential of topic modeling for FL at statement level, which is valuable for the APR community.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"173 ","pages":"Article 107499"},"PeriodicalIF":3.8000,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584924001046","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Context:
Reducing the cost of maintenance tasks by fixing bugs automatically is the cornerstone of Automated Program Repair (APR). To do this, automated Fault Localization (FL) is essential. Two families of FL techniques are Spectrum-based Fault Localization (SBFL) and Information Retrieval Fault Localization (IRFL). In SBFL, the coverage information and execution results of test cases are utilized. Ochiai is one of the most effective and used SBFL strategies. In IRFL, the bug report information is utilized as well as the identifier names and comments in source code files. Latent Dirichlet Allocation (LDA) is a generative statistical model and one of the most popular topic modeling methods. However, LDA has been used at the method level of granularity as IRFL technique, whereas most existing APR tools are focused on the statement level.
Objective:
This paper presents our approach that combines topic modeling and Ochiai to boost FL at the statement level.
Method:
We evaluate our approach considering five different projects in Defects4J benchmark. We report the performance of our approach in terms of hit@k and MRR. To study the impact on the results, we compare our approach against five baselines: two SBFL approaches (Ochiai and Dstar), two IRFL approaches (LDA and Blues), and one hybrid approach (SBIR). In addition, we compare the number of bugs that are found by our approach with the baselines.
Results:
Our approach significantly outperforms the baselines in all metrics. Especially, when hit@1, hit@3 and hit@5 are compared. Also, our approach locates more bugs than Ochiai and Blues.
Conclusion:
The results of our approach indicate that the integration of topic modeling with Ochiai boosts FL. This uncovers the potential of topic modeling for FL at statement level, which is valuable for the APR community.
期刊介绍:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics,
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.