Using a probabilistic model to predict bug fixes

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER) Pub Date : 2018-03-20 DOI:10.1109/SANER.2018.8330211

Mauricio Soto, Claire Le Goues

{"title":"Using a probabilistic model to predict bug fixes","authors":"Mauricio Soto, Claire Le Goues","doi":"10.1109/SANER.2018.8330211","DOIUrl":null,"url":null,"abstract":"Automatic Software Repair (APR) has significant potential to reduce software maintenance costs by reducing the human effort required to localize and fix bugs. State-of-the-art generate-and-validate APR techniques select between and instantiate various mutation operators to construct candidate patches, informed largely by heuristic probability distributions. This may reduce effectiveness in terms of both efficiency and output quality. In practice, human developers have many options in terms of how to edit code to fix bugs, some of which are far more common than others (e.g., deleting a line of code is more common than adding a new class). We mined the most recent 100 bug-fixing commits from each of the 500 most popular Java projects in GitHub (the largest dataset to date) to create a probabilistic model describing edit distributions. We categorize, compare and evaluate the different mutation operators used in state-of-the-art approaches. We find that a probabilistic modelbased APR approach patches bugs more quickly in the majority of bugs studied, and that the resulting patches are of higher quality than those produced by previous approaches. Finally, we mine association rules for multi-edit source code changes, an understudied but important problem. We validate the association rules by analyzing how much of our corpus can be built from them. Our evaluation indicates that 84.6% of the multi-edit patches from the corpus can be built from the association rules, while maintaining 90% confidence.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"40 1","pages":"221-231"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2018.8330211","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 36

Abstract

Automatic Software Repair (APR) has significant potential to reduce software maintenance costs by reducing the human effort required to localize and fix bugs. State-of-the-art generate-and-validate APR techniques select between and instantiate various mutation operators to construct candidate patches, informed largely by heuristic probability distributions. This may reduce effectiveness in terms of both efficiency and output quality. In practice, human developers have many options in terms of how to edit code to fix bugs, some of which are far more common than others (e.g., deleting a line of code is more common than adding a new class). We mined the most recent 100 bug-fixing commits from each of the 500 most popular Java projects in GitHub (the largest dataset to date) to create a probabilistic model describing edit distributions. We categorize, compare and evaluate the different mutation operators used in state-of-the-art approaches. We find that a probabilistic modelbased APR approach patches bugs more quickly in the majority of bugs studied, and that the resulting patches are of higher quality than those produced by previous approaches. Finally, we mine association rules for multi-edit source code changes, an understudied but important problem. We validate the association rules by analyzing how much of our corpus can be built from them. Our evaluation indicates that 84.6% of the multi-edit patches from the corpus can be built from the association rules, while maintaining 90% confidence.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

使用概率模型来预测bug修复

自动软件修复(Automatic Software Repair, APR)具有显著的潜力，可以通过减少本地化和修复错误所需的人力来降低软件维护成本。最先进的生成和验证APR技术选择并实例化各种突变操作符来构建候选补丁，主要由启发式概率分布提供信息。这可能会降低效率和产出质量方面的有效性。在实践中，人类开发人员在如何编辑代码以修复错误方面有许多选择，其中一些比其他更常见(例如，删除一行代码比添加一个新类更常见)。我们从GitHub(迄今为止最大的数据集)中500个最受欢迎的Java项目中挖掘了最近的100个bug修复提交，以创建描述编辑发行版的概率模型。我们对最先进的方法中使用的不同突变算子进行分类，比较和评估。我们发现基于概率模型的APR方法在研究的大多数错误中更快地修补错误，并且所得到的补丁质量比以前的方法更高。最后，我们挖掘了多编辑源代码更改的关联规则，这是一个尚未得到充分研究但很重要的问题。我们通过分析有多少语料库可以由它们构建来验证关联规则。我们的评估表明，84.6%的多编辑补丁可以从关联规则中构建，同时保持90%的置信度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

自引率

0.00%

发文量

期刊最新文献

Exploring the integration of user feedback in automated testing of Android applications The Statechart Workbench: Enabling scalable software event log analysis using process mining Detecting code smells using machine learning techniques: Are we there yet? Classifying stack overflow posts on API issues Re-evaluating method-level bug prediction