From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering. Published: 2019-09-18. DOI: 10.1145/3345629.3345639

Renan Vieira, Antônio da Silva, L. Rocha, J. Gomes


Citations: 19

Abstract

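Bugs appear in almost any software development project. Fixing all of them, or at least a large share, requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems to report and monitor bug-fixing tasks. In recent years, several researchers have analyzed bug-tracking data to better understand the problem and thus find ways to reduce costs and improve the efficiency of bug fixing. In this paper, we introduce a new dataset of more than 70,000 bug-fix reports covering 10 years of bug-fixing activity in 55 Apache Software Foundation projects, distributed across 9 categories. We mined this information from the Jira issue tracking system, considering two perspectives on reports with closed/resolved status: static (the latest version of each report) and dynamic (the changes made to each report over time). We also extract information from the commits (where they exist) that fix these bugs, taken from each project's version-control system (Git). Finally, we analyze the changes that occur in the reports to illustrate and characterize the dataset. Because data extraction is an error-prone, nontrivial task, we believe initiatives like this one can help researchers carry out further, more detailed investigations.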
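The abstract distinguishes a static view of each report (its latest version) from a dynamic view (the changes made to it over time), and links reports to the Git commits that fix them. The sketch below illustrates both ideas under stated assumptions; it is not the authors' extraction pipeline. `FieldChange`, `value_at`, and `commits_mentioning` are hypothetical names, the change records are only loosely modelled on Jira changelog items, and matching issue keys such as `HADOOP-1234` in commit messages is a common heuristic that the paper does not confirm it uses.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


# Hypothetical record type: one field change in a bug report's history,
# loosely modelled on a Jira changelog item (not the dataset's real schema).
@dataclass(frozen=True)
class FieldChange:
    created: datetime
    field: str
    from_value: Optional[str]
    to_value: Optional[str]


def value_at(current: Optional[str], changes: List[FieldChange],
             field: str, when: datetime) -> Optional[str]:
    """Dynamic view: rewind the latest (static) value of `field` to its value
    at `when` by undoing, newest first, every change made after `when`."""
    value = current
    later = [c for c in changes if c.field == field and c.created > when]
    for change in sorted(later, key=lambda c: c.created, reverse=True):
        value = change.from_value
    return value


# Assumed Jira issue-key shape, e.g. HADOOP-1234; project keys containing
# underscores or lowercase letters are deliberately not handled here.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def commits_mentioning(issue_key: str,
                       commits: List[Tuple[str, str]]) -> List[str]:
    """Return the SHAs of commits whose message mentions `issue_key` exactly
    (so HADOOP-1234 does not match HADOOP-12345)."""
    return [sha for sha, message in commits
            if issue_key in ISSUE_KEY.findall(message)]


# Usage example with made-up data.
changes = [
    FieldChange(datetime(2015, 1, 10), "status", "Open", "In Progress"),
    FieldChange(datetime(2015, 2, 1), "status", "In Progress", "Resolved"),
    FieldChange(datetime(2015, 2, 5), "status", "Resolved", "Closed"),
]
print(value_at("Closed", changes, "status", datetime(2015, 1, 20)))  # In Progress

commits = [
    ("a1b2c3", "HADOOP-1234: fix NPE in reader"),
    ("d4e5f6", "HADOOP-12345: unrelated change"),
    ("0719ab", "Follow-up for HADOOP-1234 and HDFS-99"),
]
print(commits_mentioning("HADOOP-1234", commits))  # ['a1b2c3', '0719ab']
```

Rewinding from the latest value, rather than replaying forward from creation, is one natural fit for data stored as a static snapshot plus a change history: the snapshot is the starting point, and each change record carries the value it replaced. A key mention in a commit message is only evidence of a link, not proof that the commit fixes the bug.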





Published in

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering

