基于监督主题建模的Bug定位

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI:10.1109/ICDM.2018.00076

Yaojing Wang, Yuan Yao, Hanghang Tong, Xuan Huo, Min Li, F. Xu, Jian Lu

{"title":"基于监督主题建模的Bug定位","authors":"Yaojing Wang, Yuan Yao, Hanghang Tong, Xuan Huo, Min Li, F. Xu, Jian Lu","doi":"10.1109/ICDM.2018.00076","DOIUrl":null,"url":null,"abstract":"Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Bug Localization via Supervised Topic Modeling\",\"authors\":\"Yaojing Wang, Yuan Yao, Hanghang Tong, Xuan Huo, Min Li, F. Xu, Jian Lu\",\"doi\":\"10.1109/ICDM.2018.00076\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.\",\"PeriodicalId\":286444,\"journal\":{\"name\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"volume\":\"107 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2018.00076\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2018.00076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 18

摘要

Bug跟踪系统用于跟踪报告的软件Bug，在软件开发和维护中得到了广泛应用。在这些系统中，对于软件开发人员来说，在给定错误报告的大量源文件中识别相关的源文件是一项耗时且费力的任务。为了解决这个问题，信息检索方法被广泛用于捕获错误报告和源文件之间的文本相似性或语义相似性。然而，这两种类型的相似性通常是分开考虑的，并且历史错误修复在很大程度上被现有方法所忽略。在本文中，我们提出了一种监督主题建模方法(STMLOCATOR)，用于自动定位给定bug报告的相关源文件。特别地，提出的模型是建立在三个关键观察的基础上的。首先，监督建模可以有效地利用现有的修复历史。其次，bug报告中的某些词往往会在相关的源文件中出现多次。第三，较长的源文件往往有更多的错误。通过集成上述三个观察结果，所提出的STMLOCATOR以监督的方式利用历史修复，并学习bug报告和源文件之间的文本相似性和语义相似性。我们进一步考虑在bug报告中使用堆栈跟踪的一种特殊类型的bug报告，并提出STMLOCATOR的一个变体来定制这种bug报告。在三个真实数据集上的实验评估表明，所提出的STMLOCATOR在预测精度方面比其最佳竞争对手提高了23.6%，并且与数据大小呈线性扩展。此外，在那些带有堆栈跟踪的错误报告上，提议的变体进一步将STMLOCATOR提高了76.2%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Bug Localization via Supervised Topic Modeling

Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助