Building an Ensemble for Software Defect Prediction Based on Diversity Selection

Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. Pub Date: 2016-09-08. DOI: 10.1145/2961111.2962610

Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo


Citations: 52

Abstract

Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single-classifier techniques. Almost all previous work using ensemble techniques in defect prediction relies on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with a stacking ensemble, given that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against 8 publicly available data sets. Conclusion: The results show a performance improvement when using stacking ensembles compared to other defect prediction models. Diversity amongst the classifiers used for building ensembles is essential to achieving these performance improvements.
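
The contrast the abstract draws between majority voting and stacking can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the WAD formula, so a plain pairwise disagreement rate stands in as a generic diversity measure, and a lookup table stands in for the meta-classifier. All function names and the prediction vectors are hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(preds):
    """Combine one module's base predictions (0 = clean, 1 = defective) by majority."""
    return 1 if sum(preds) * 2 > len(preds) else 0

def pairwise_disagreement(a, b):
    """Fraction of modules on which two classifiers disagree.
    A generic diversity measure used here as a stand-in; it is NOT WAD."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def fit_stacker(base_preds, labels):
    """Toy meta-classifier: for each pattern of base predictions,
    remember the most common true label seen in the training data."""
    table = defaultdict(Counter)
    for row, y in zip(zip(*base_preds), labels):
        table[row][y] += 1
    return {row: counts.most_common(1)[0][0] for row, counts in table.items()}

def stack_predict(stacker, row):
    """Look up the learned label; fall back to majority voting for unseen patterns."""
    return stacker.get(tuple(row), majority_vote(row))

# Hypothetical base predictions on 8 modules: each classifier
# catches a different subset of the defective modules.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
clf_a  = [1, 0, 0, 0, 0, 0, 1, 0]
clf_b  = [0, 1, 0, 0, 0, 0, 0, 0]
clf_c  = [0, 0, 1, 0, 0, 0, 1, 0]

stacker = fit_stacker([clf_a, clf_b, clf_c], labels)

# A module flagged only by clf_b: the vote discards it, the stacker keeps it.
new_module = [0, 1, 0]
print("majority vote:", majority_vote(new_module))
print("stacked:      ", stack_predict(stacker, new_module))
print("disagreement a/b:", pairwise_disagreement(clf_a, clf_b))
```

In this toy data a defect found by a single classifier is outvoted under majority voting, while the meta-level learner can learn that such a lone prediction is informative. A real evaluation would fit the meta-classifier on held-out base predictions instead of the data it is scored on.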


