A Replication Study: Just-in-Time Defect Prediction with Ensemble Learning

2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). Pub Date: 2018-05-28. DOI: 10.1145/3194104.3194110

Steven Young, T. Abdou, A. Bener


Citations: 19

Abstract

Just-in-time defect prediction, also known as change-level defect prediction, can be used to allocate resources efficiently and manage project schedules during software testing and debugging. It can reduce the amount of code to review and simplify the assignment of developers to bug fixes. This paper reports a replicated experiment and an extension that compare the prediction of defect-prone changes using traditional machine learning techniques and ensemble learning. Using datasets from six open source projects, namely Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, we replicate the original approach to verify the results of the original experiment and use them as a basis for comparing alternatives to the approach. Our results from the replicated experiment are consistent with the original. The original approach combines data preprocessing with a two-layer ensemble of decision trees. The first layer uses bagging to form multiple random forests. The second layer stacks the forests together with equal weights. Generalizing the approach to allow any arbitrary set of classifiers in the ensemble, to optimize the weights of the classifiers, and to allow additional layers, we apply a new deep ensemble approach, called deep super learner, to test the depth of the original study. The deep super learner achieves statistically significantly better results than the original approach on five of the six projects in predicting defects, as measured by F1 score.
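The second-layer combination and the F1 metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `combine_probabilities` and `f1_score`, the 0.5 decision threshold, and all numbers are assumptions made for illustration. It only shows how per-forest defect probabilities might be averaged with equal weights, how non-uniform weights would slot into the same formula, and how F1 is computed from the resulting labels.

```python
from typing import List, Optional, Sequence


def combine_probabilities(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-model defect probabilities.

    model_probs[m][i] is model m's probability that change i is defect-prone.
    With weights=None every model gets the same weight, which mirrors the
    equal-weight second layer described in the abstract.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)
    n_changes = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_changes)
    ]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical probabilities from three first-layer forests for four changes.
    forest_probs = [
        [0.9, 0.2, 0.7, 0.4],
        [0.7, 0.4, 0.3, 0.8],
        [0.8, 0.3, 0.7, 0.6],
    ]
    y_true = [1, 0, 1, 0]  # made-up labels: 1 = defect-inducing change

    # Equal weights (the original approach's second layer).
    equal = combine_probabilities(forest_probs)
    equal_pred = [1 if p >= 0.5 else 0 for p in equal]
    print("equal weights F1:", f1_score(y_true, equal_pred))

    # Hand-picked, non-optimized weights, only to show where learned
    # weights would enter in a generalized ensemble.
    weighted = combine_probabilities(forest_probs, weights=[0.6, 0.0, 0.4])
    weighted_pred = [1 if p >= 0.5 else 0 for p in weighted]
    print("custom weights F1:", f1_score(y_true, weighted_pred))
```

In a generalized ensemble of the kind the abstract describes, the weights would be fitted on held-out predictions rather than chosen by hand; the toy values above carry no empirical meaning.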



