Learning to rank relevant files for bug reports using domain knowledge

Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. Pub Date: 2014-11-11. DOI: 10.1145/2635868.2635874

Xin Ye, Razvan C. Bunescu, Chang Liu


Citations: 257

Abstract

When a new bug report is received, developers usually need to reproduce the bug and perform code reviews to find the cause, a process that can be tedious and time-consuming. A tool that ranks all the source files of a project by how likely they are to contain the cause of the bug would enable developers to narrow down their search and could lead to a substantial increase in productivity. This paper introduces an adaptive ranking approach that leverages domain knowledge through functional decompositions of source code files into methods, API descriptions of library components used in the code, the bug-fixing history, and the code change history. Given a bug report, the ranking score of each source file is computed as a weighted combination of an array of features encoding domain knowledge, where the weights are trained automatically on previously solved bug reports using a learning-to-rank technique. We evaluated our system on six large-scale open-source Java projects, using the before-fix version of the project for every bug report. The experimental results show that the newly introduced learning-to-rank approach significantly outperforms two recent state-of-the-art methods in recommending relevant files for bug reports. In particular, our method makes correct recommendations within the top 10 ranked source files for over 70% of the bug reports in the Eclipse Platform and Tomcat projects.
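The scoring scheme described in the abstract, a weighted combination of features followed by a sort, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the feature names only loosely mirror the four knowledge sources the abstract lists, the file paths and feature values are invented, and the weights are hand-set placeholders rather than weights trained with a learning-to-rank technique. The `hit_at_k` helper shows one plausible reading of "correct recommendations within the top 10": at least one relevant file appears among the first k ranked files.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical feature names, loosely following the knowledge sources
# named in the abstract; the paper's actual feature set may differ.
FEATURES = [
    "method_level_similarity",
    "api_description_similarity",
    "bug_fixing_history",
    "code_change_history",
]


def score(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Weighted combination of feature values for one (report, file) pair."""
    return sum(weights[name] * features.get(name, 0.0) for name in FEATURES)


def rank_files(
    weights: Dict[str, float],
    candidates: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Return (path, score) pairs sorted from most to least likely."""
    scored = [(path, score(weights, feats)) for path, feats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def hit_at_k(ranking: List[Tuple[str, float]], relevant: Set[str], k: int) -> bool:
    """True if at least one relevant file is among the top k ranked files."""
    return any(path in relevant for path, _ in ranking[:k])


# Placeholder weights (assumed, NOT trained); in the paper these would be
# learned from previously solved bug reports.
weights = {
    "method_level_similarity": 0.5,
    "api_description_similarity": 0.2,
    "bug_fixing_history": 0.2,
    "code_change_history": 0.1,
}

# Invented feature values for three invented files and one bug report.
candidates = {
    "ui/Editor.java": {
        "method_level_similarity": 0.8,
        "api_description_similarity": 0.1,
        "bug_fixing_history": 1.0,
        "code_change_history": 0.5,
    },
    "core/Parser.java": {
        "method_level_similarity": 0.3,
        "api_description_similarity": 0.6,
        "bug_fixing_history": 0.0,
        "code_change_history": 0.2,
    },
    "util/Strings.java": {
        "method_level_similarity": 0.1,
    },
}

ranking = rank_files(weights, candidates)
for path, value in ranking:
    print(f"{path}: {value:.2f}")
```

Keeping the score linear in the features means the ranking for a new bug report only requires one dot product per candidate file, and the learned weights remain directly inspectable.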



