A Replication Study: Just-in-Time Defect Prediction with Ensemble Learning

Steven Young, T. Abdou, A. Bener
{"title":"A Replication Study: Just-in-Time Defect Prediction with Ensemble Learning","authors":"Steven Young, T. Abdou, A. Bener","doi":"10.1145/3194104.3194110","DOIUrl":null,"url":null,"abstract":"Just-in-time defect prediction, which is also known as change-level defect prediction, can be used to efficiently allocate resources and manage project schedules in the software testing and debugging process. Just-in-time defect prediction can reduce the amount of code to review and simplify the assignment of developers to bug fixes. This paper reports a replicated experiment and an extension comparing the prediction of defect-prone changes using traditional machine learning techniques and ensemble learning. Using datasets from six open source projects, namely Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL we replicate the original approach to verify the results of the original experiment and use them as a basis for comparison for alternatives in the approach. Our results from the replicated experiment are consistent with the original. The original approach uses a combination of data preprocessing and a two-layer ensemble of decision trees. The first layer uses bagging to form multiple random forests. The second layer stacks the forests together with equal weights. Generalizing the approach to allow the use of any arbitrary set of classifiers in the ensemble, optimizing the weights of the classifiers, and allowing additional layers, we apply a new deep ensemble approach, called deep super learner, to test the depth of the original study. The deep super learner achieves statistically significantly better results than the original approach on five of the six projects in predicting defects as measured by F1 score.","PeriodicalId":249268,"journal":{"name":"2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3194104.3194110","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

Just-in-time defect prediction, which is also known as change-level defect prediction, can be used to efficiently allocate resources and manage project schedules in the software testing and debugging process. Just-in-time defect prediction can reduce the amount of code to review and simplify the assignment of developers to bug fixes. This paper reports a replicated experiment and an extension comparing the prediction of defect-prone changes using traditional machine learning techniques and ensemble learning. Using datasets from six open source projects, namely Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL we replicate the original approach to verify the results of the original experiment and use them as a basis for comparison for alternatives in the approach. Our results from the replicated experiment are consistent with the original. The original approach uses a combination of data preprocessing and a two-layer ensemble of decision trees. The first layer uses bagging to form multiple random forests. The second layer stacks the forests together with equal weights. Generalizing the approach to allow the use of any arbitrary set of classifiers in the ensemble, optimizing the weights of the classifiers, and allowing additional layers, we apply a new deep ensemble approach, called deep super learner, to test the depth of the original study. The deep super learner achieves statistically significantly better results than the original approach on five of the six projects in predicting defects as measured by F1 score.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一项复制研究:集成学习的即时缺陷预测
即时缺陷预测,也称为变更级缺陷预测,可以用于在软件测试和调试过程中有效地分配资源和管理项目进度。及时缺陷预测可以减少需要审查的代码数量,并简化开发人员对错误修复的分配。本文报告了一个重复实验和一个扩展,比较了使用传统机器学习技术和集成学习预测容易出现缺陷的变化。使用来自六个开源项目的数据集,即Bugzilla, Columba, JDT, Platform, Mozilla和PostgreSQL,我们复制了原始方法来验证原始实验的结果,并将它们作为比较方法替代方案的基础。我们从重复实验中得到的结果与原来的一致。最初的方法结合了数据预处理和决策树的两层集合。第一层使用套袋来形成多个随机森林。第二层以相同的重量将森林堆叠在一起。推广该方法以允许在集成中使用任意一组分类器,优化分类器的权重,并允许额外的层,我们应用了一种新的深度集成方法,称为深度超级学习器,来测试原始研究的深度。在预测缺陷方面,深度超级学习器在六个项目中的五个项目上取得了统计上显著优于原始方法的结果(以F1分数衡量)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Semi-Automatic Generation of Active Ontologies from Web Forms for Intelligent Assistants Evaluating the Adaptive Selection of Classifiers for Cross-Project Bug Prediction A Replication Study: Just-in-Time Defect Prediction with Ensemble Learning Integrating a Dialog Component into a Framework for Spoken Language Understanding Complementing Machine Learning Classifiers via Dynamic Symbolic Execution: "Human vs. Bot Generated" Tweets
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1