基于bug报告的根本原因预测

Thomas Hirsch, Birgit Hofer
{"title":"基于bug报告的根本原因预测","authors":"Thomas Hirsch, Birgit Hofer","doi":"10.1109/ISSREW51248.2020.00067","DOIUrl":null,"url":null,"abstract":"This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process—either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs’ root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.","PeriodicalId":202247,"journal":{"name":"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Root cause prediction based on bug reports\",\"authors\":\"Thomas Hirsch, Birgit Hofer\",\"doi\":\"10.1109/ISSREW51248.2020.00067\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process—either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs’ root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.\",\"PeriodicalId\":202247,\"journal\":{\"name\":\"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSREW51248.2020.00067\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSREW51248.2020.00067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

本文提出了一种有监督的机器学习方法来预测给定bug报告的根本原因。了解bug的根本原因可以帮助开发人员在调试过程中直接或间接地为调试任务选择合适的工具支持。我们从103个GitHub项目的问题跟踪器中挖掘了54755个已关闭的bug报告,并应用了一组启发式方法来创建一个由10459个报告组成的基准。根据错误的根本原因,将一个子集手动分为三组(语义组、内存组和并发组)。由于根本原因的类型不是均匀分布的,所以采用了关键词搜索和随机选择相结合的方法。我们的机器学习方法的数据集由369个错误报告组成(122个并发错误,121个内存错误和126个语义错误)。错误报告被用作自然语言处理算法的输入。我们评估了几个分类器的性能,以预测给定bug报告的根本原因。线性支持向量机获得了最高的平均精度(0.74)和召回率(0.72)分数。所创建的错误数据集和分类是公开的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Root cause prediction based on bug reports
This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process—either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs’ root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
BP-IDS: Using business process specification to leverage intrusion detection in critical infrastructures Techniques and Tools for Advanced Software Vulnerability Detection Challenges Faced with Application Performance Monitoring (APM) when Migrating to the Cloud AHPCap: A Framework for Automated Hardware Profiling and Capture of Mobile Application States Unit Lemmas for Detecting Requirement and Specification Flaws
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1