A Hybrid Software Defects Prediction Model for Imbalance Datasets Using Machine Learning Techniques: (S-SVM Model)

Mohd. Mustaqeem, Tamanna Siddiqui
Journal: 自主智能(英文)
Published: 2023-06-16 (Journal Article)
DOI: https://doi.org/10.32629/jai.v6i1.559
Citations: 1

Abstract

Software defect prediction (SDP) is an essential task in developing quality software, and various models have been developed for this purpose. However, the imbalanced nature of software defect datasets challenges these models and degrades their performance. To address this, the authors propose a hybrid machine learning model that combines the Synthetic Minority Oversampling Technique (SMOTE) with a Support Vector Machine (SVM), termed the SMOTE-SVM (S-SVM) model. They empirically examine SDP on multiple datasets (CM1, PC1, JM1, PC3, KC1, EQ and JDT) from the PROMISE and AEEEM repositories. In the experimental study, the S-SVM model was trained and compared with previously developed models on balanced and imbalanced test datasets using four evaluation metrics: Precision, Recall, F1 score, and Accuracy. On the balanced datasets, the S-SVM model achieved precision values ranging from 70 to 96, recall from 52 to 94, F1 score from 67 to 90, and accuracy from 69 to 98. On the imbalanced datasets, it achieved precision from 60 to 93, recall from 64 to 97, F1 score from 69 to 91, and accuracy from 67 to 87. The proposed S-SVM model outperforms the other models in classifying and predicting software defects. The hybridisation of SMOTE and SVM therefore improved the model's ability to categorise and predict on both balanced and imbalanced datasets when sufficient defective and non-defective data is provided.
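To make the two mechanical ingredients of the abstract concrete, the following is a minimal standard-library sketch of SMOTE-style oversampling and of the four reported metrics. It is not the authors' implementation: the function names `smote_oversample` and `classification_scores`, the neighbour count `k`, the seed, and the toy data are all assumptions made for illustration, and the SVM training step itself is omitted. The metrics are returned as fractions, whereas the figures in the abstract appear to be percentages.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def classification_scores(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for a binary task, as fractions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy "defective module" feature vectors, invented for illustration.
    defective = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote_oversample(defective, n_new=4, k=2))
    print(classification_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a pipeline of this kind, oversampling is typically applied to the training split only, so that the test data keeps its original class distribution; the abstract does not say whether the authors did this.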
Source journal metrics: CiteScore 0.40; self-citation rate 0.00%; articles published: 25