基于bug报告的根本原因预测

2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) Pub Date : 2020-10-01 DOI:10.1109/ISSREW51248.2020.00067

Thomas Hirsch, Birgit Hofer

{"title":"基于bug报告的根本原因预测","authors":"Thomas Hirsch, Birgit Hofer","doi":"10.1109/ISSREW51248.2020.00067","DOIUrl":null,"url":null,"abstract":"This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process—either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs’ root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.","PeriodicalId":202247,"journal":{"name":"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Root cause prediction based on bug reports\",\"authors\":\"Thomas Hirsch, Birgit Hofer\",\"doi\":\"10.1109/ISSREW51248.2020.00067\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process—either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs’ root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.\",\"PeriodicalId\":202247,\"journal\":{\"name\":\"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSREW51248.2020.00067\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSREW51248.2020.00067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

本文提出了一种有监督的机器学习方法来预测给定bug报告的根本原因。了解bug的根本原因可以帮助开发人员在调试过程中直接或间接地为调试任务选择合适的工具支持。我们从103个GitHub项目的问题跟踪器中挖掘了54755个已关闭的bug报告，并应用了一组启发式方法来创建一个由10459个报告组成的基准。根据错误的根本原因，将一个子集手动分为三组(语义组、内存组和并发组)。由于根本原因的类型不是均匀分布的，所以采用了关键词搜索和随机选择相结合的方法。我们的机器学习方法的数据集由369个错误报告组成(122个并发错误，121个内存错误和126个语义错误)。错误报告被用作自然语言处理算法的输入。我们评估了几个分类器的性能，以预测给定bug报告的根本原因。线性支持向量机获得了最高的平均精度(0.74)和召回率(0.72)分数。所创建的错误数据集和分类是公开的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Root cause prediction based on bug reports

This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process—either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs’ root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)

自引率

0.00%

发文量