Combining PCA and SMOTE for Software Defect Prediction with Visual Analytics Approach

Rizal Broer Bahaweres, Mutia Salsabila, Nurul Faizah Rozy, I. Hermadi, A. Suroso, Y. Arkeman
Published in: 2022 10th International Conference on Cyber and IT Service Management (CITSM), 2022-09-20. DOI: 10.1109/CITSM56380.2022.9935831
Citations: 1

Abstract

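Software defect prediction (SDP) helps teams allocate time and resources efficiently and thereby improve software quality, so research on improving the performance of SDP models is ongoing. However, SDP datasets often have many attributes and an imbalance between defective and non-defective samples, both of which reduce classification performance. In this study, we propose combining principal component analysis (PCA) with the Synthetic Minority Over-sampling Technique (SMOTE) to produce better-performing models, together with a visual analytics approach that represents the resulting models to support understanding and analysis in future modeling work. Support vector machine (SVM), random forest (RF), naive Bayes (NB), and neural network (NN) classifiers, each with tuned hyperparameters, are evaluated by Recall, AUC, and G-Mean on five NASA datasets. We then compare the proposed model with a PCA model without SMOTE to determine whether performance improves. Visual analytics are built for every stage of model construction, which gives users confidence in the models and helps them understand and gain insights from them. The results show that the proposed method outperforms the PCA-only model on average by 60%, 47%, and 16% in Recall, AUC, and G-Mean, respectively. SMOTE mitigates the effect of class imbalance, raising the G-Mean score for every model, and NN achieves the best average score under the proposed method.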
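The abstract contains no code. As a rough illustration of two ingredients it relies on, SMOTE oversampling of the defective (minority) class and the Recall and G-Mean metrics, here is a minimal pure-Python sketch. The helper names `smote` and `recall_and_gmean`, the default neighbour count `k=5`, and the toy data are assumptions made for illustration; the sketch omits the PCA step, the classifiers, and the hyperparameter search, and it is not the authors' implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper, not the paper's code).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, sorted by Euclidean distance to base.
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall_and_gmean(y_true, y_pred, positive=1):
    """Return (recall, G-Mean), treating `positive` as the defective class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on defects
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # G-Mean is the geometric mean of the two per-class recalls.
    return recall, math.sqrt(recall * specificity)


if __name__ == "__main__":
    defective = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]  # toy minority class
    new_points = smote(defective, n_new=4, k=2)
    print(len(new_points))  # 4

    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    recall, gmean = recall_and_gmean(y_true, y_pred)
    print(round(recall, 3), round(gmean, 3))  # 0.667 0.73
```

In the toy example, recall is 2/3 and specificity is 4/5, so G-Mean is sqrt(8/15), about 0.730. Because G-Mean multiplies the recalls of both classes, a model that ignores the defective class scores zero even when its accuracy is high, which is why the abstract reports it alongside Recall and AUC. In practice one would usually use `sklearn.decomposition.PCA` and `imblearn.over_sampling.SMOTE` (whose `fit_resample` returns the resampled features and labels) instead of hand-rolled code, and oversample only the training split so that synthetic points do not leak into evaluation.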