Building an Ensemble for Software Defect Prediction Based on Diversity Selection

Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo
{"title":"Building an Ensemble for Software Defect Prediction Based on Diversity Selection","authors":"Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo","doi":"10.1145/2961111.2962610","DOIUrl":null,"url":null,"abstract":"Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single classifier techniques. Almost all previous work using ensemble techniques in defect prediction rely on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with stacking ensemble, given the fact that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against 8 publicly available data sets. Conclusion: The results show performance improvement using stacking ensembles compared to other defect prediction models. Diversity amongst classifiers used for building ensembles is essential to achieving these performance improvements.","PeriodicalId":208212,"journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"52","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2961111.2962610","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 52

Abstract

Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single classifier techniques. Almost all previous work using ensemble techniques in defect prediction rely on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with stacking ensemble, given the fact that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against 8 publicly available data sets. Conclusion: The results show performance improvement using stacking ensembles compared to other defect prediction models. Diversity amongst classifiers used for building ensembles is essential to achieving these performance improvements.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于多样性选择的软件缺陷预测集成构建
背景:集成技术在各个科学领域得到了广泛的关注。缺陷预测研究人员已经研究了许多最先进的集成模型,并得出结论,在许多情况下,这些模型优于标准的单一分类器技术。几乎所有先前使用集成技术进行缺陷预测的工作都依赖于多数投票方案来组合预测输出,以及单个分类器之间的隐式多样性。目的:考虑到不同的分类器识别不同的缺陷集,研究是否可以使用带有堆叠集成的显式多样性技术来改进缺陷预测。方法:采用四科分类器和加权精度多样性(WAD)技术来挖掘分类器之间的多样性。为了结合单个预测,我们使用了堆叠集成技术。我们在软件缺陷预测中使用最先进的知识来构建我们的集成模型,并针对8个公开可用的数据集测试了它们的预测能力。结论:与其他缺陷预测模型相比,使用堆叠集成模型的性能有所提高。用于建筑集成的分类器之间的多样性对于实现这些性能改进至关重要。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Evidence Briefings: Towards a Medium to Transfer Knowledge from Systematic Reviews to Practitioners The Obscure Process of Innovation Assessment: A Report of an Industrial Survey Sustainable Software Development through Overlapping Pair Rotation DIGS: A Framework for Discovering Goals for Security Requirements Engineering The Impact of Task Granularity on Co-evolution Analyses
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1