A Hybrid Software Defects Prediction Model for Imbalance Datasets Using Machine Learning Techniques: (S-SVM Model)

Journal of Autonomous Intelligence, published 2023-06-16, DOI: 10.32629/jai.v6i1.559

Mohd. Mustaqeem, Tamanna Siddiqui


Citations: 1

Abstract

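Software defect prediction (SDP) is an essential task in developing quality software, and various models have been developed for this purpose. However, the imbalanced nature of software defect datasets challenges these models and degrades their performance. To address this, the authors propose a hybrid machine learning model, SMOTE-SVM (S-SVM), which combines the Synthetic Minority Oversampling Technique (SMOTE) with a Support Vector Machine (SVM). The authors examine SDP empirically on multiple datasets (CM1, PC1, JM1, PC3, KC1, EQ and JDT) from the PROMISE and AEEEM repositories. The S-SVM model was trained and compared with previously developed models on balanced and imbalanced test datasets using four evaluation metrics: precision, recall, F1 score and accuracy. On the balanced datasets, S-SVM achieved precision from 70 to 96, recall from 52 to 94, F1 score from 67 to 90 and accuracy from 69 to 98. On the imbalanced datasets, it achieved precision from 60 to 93, recall from 64 to 97, F1 score from 69 to 91 and accuracy from 67 to 87. The proposed S-SVM model outperforms the other models in classifying and predicting software defects. Hybridising SMOTE and SVM therefore improves the model's ability to classify and predict defects on both balanced and imbalanced datasets, provided that sufficient defective and non-defective data are available.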
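The abstract names the two components of S-SVM but not how they are wired together. Below is a minimal, dependency-free Python sketch of the general SMOTE-then-SVM idea, not the authors' implementation. It assumes a linear SVM trained with Pegasos-style sub-gradient descent (the abstract states neither the kernel nor the hyperparameters), a random-sample variant of SMOTE with `k=5` neighbours (the original algorithm instead loops over every minority sample), and a made-up two-feature toy dataset in place of the PROMISE/AEEEM metrics. The helper names `smote`, `train_linear_svm` and `predict` are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE sketch: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x inside the minority class, excluding x itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[j], x),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style sub-gradient descent on the L2-regularised
    hinge loss. Labels are +1 (defective) / -1 (clean); a constant 1.0 feature
    is appended so the last weight acts as a (regularised) bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: push w toward x's label
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy data (NOT the PROMISE/AEEEM datasets): 90 clean vs 10 defective modules.
data_rng = random.Random(1)
clean = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
defective = [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(10)]

# Oversample the defective class until balanced; in a real evaluation this
# would be applied to the training split only, leaving the test split as-is.
synthetic = smote(defective, n_synthetic=len(clean) - len(defective))
X_train = clean + defective + synthetic
y_train = [-1] * len(clean) + [1] * (len(defective) + len(synthetic))

w = train_linear_svm(X_train, y_train)
print(predict(w, [5.0, 5.0]), predict(w, [0.0, 0.0]))
```

Because every synthetic point is a convex combination of two real defective samples, SMOTE fills in the minority region rather than duplicating rows, which is the usual argument for preferring it over plain random oversampling before training a margin-based classifier such as an SVM.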

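The four metrics reported in the abstract can be computed from confusion-matrix counts, treating defective modules as the positive class. The Python sketch below assumes the reported figures are percentages; `classification_metrics` is a hypothetical helper, and the label lists are made-up toy values, not results from the paper. The first call illustrates why accuracy alone is misleading on imbalanced defect data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy with `positive` (defective) as the
    positive class, returned as percentages rounded to one decimal place."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    scores = {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
    return {name: round(100 * v, 1) for name, v in scores.items()}


# Toy labels: 90 clean (0) and 10 defective (1) modules.
y_true = [0] * 90 + [1] * 10

# A classifier that calls every module clean still scores 90% accuracy.
all_clean = [0] * 100
print(classification_metrics(y_true, all_clean))
# {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'accuracy': 90.0}

# 5 false alarms, 8 of 10 defects caught: TP=8, FP=5, FN=2, TN=85.
some_hits = [1] * 5 + [0] * 85 + [1] * 8 + [0] * 2
print(classification_metrics(y_true, some_hits))
# {'precision': 61.5, 'recall': 80.0, 'f1': 69.6, 'accuracy': 93.0}
```

Precision and recall ignore the large pool of true negatives, so they expose a model that never flags defects, whereas accuracy is dominated by the majority (clean) class.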


Source journal

Journal of Autonomous Intelligence

CiteScore

0.40

Self-citation rate

0.00%

Articles published