A Hybrid Software Defects Prediction Model for Imbalance Datasets Using Machine Learning Techniques: (S-SVM Model)
Authors: Mohd. Mustaqeem, Tamanna Siddiqui
Journal: 自主智能(英文) (Autonomous Intelligence, English edition)
Publication date: 2023-06-16
Publication type: Journal Article
DOI: 10.32629/jai.v6i1.559 (https://doi.org/10.32629/jai.v6i1.559)
Software defect prediction (SDP) is an essential task in developing quality software, and various models have been developed for this purpose. However, the imbalanced nature of software defect datasets challenges these models and degrades their performance. To address this challenge, the authors propose a hybrid machine learning model, SMOTE-SVM (S-SVM), that combines the Synthetic Minority Oversampling Technique (SMOTE) with a Support Vector Machine (SVM). They empirically examine SDP using multiple datasets (CM1, PC1, JM1, PC3, KC1, EQ and JDT) from the PROMISE and AEEEM repositories. In the experimental study, the S-SVM model was trained and then compared with previously developed models on balanced and imbalanced test datasets using four evaluation metrics: Precision, Recall, F1-score, and Accuracy. For the balanced dataset, the S-SVM model achieved precision values ranging from 70 to 96, recall values ranging from 52 to 94, F1-score values ranging from 67 to 90, and accuracy values ranging from 69 to 98. For the imbalanced dataset, the S-SVM model achieved precision values ranging from 60 to 93, recall values ranging from 64 to 97, F1-score values ranging from 69 to 91, and accuracy values ranging from 67 to 87. The proposed S-SVM model outperforms the other models in classifying and predicting software defects. Therefore, the hybridisation of SMOTE and SVM improved the model’s ability to categorise and predict on both balanced and imbalanced datasets when sufficient defective and non-defective data is provided.
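The oversampling idea and the four metrics named above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names `smote` and `scores`, the neighbour count `k`, the seed, and the toy data are all assumptions made for illustration, and the SVM stage is left out. In practice the balanced set would be passed to an SVM classifier such as scikit-learn's `SVC`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper; needs at least two minority samples.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def scores(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": correct / len(pairs)}


# Toy data, made up for illustration: 3 defective vs 9 non-defective modules,
# each described by two numeric features.
defective = [(8.0, 9.0), (9.0, 8.5), (8.5, 9.5)]
clean = [(float(i % 3), float(i // 3)) for i in range(9)]

# Oversample the defective class until both classes have the same size.
balanced_defective = defective + smote(
    defective, n_new=len(clean) - len(defective), k=2
)
```

Each synthetic point lies on a line segment between two real defective samples, so the minority class grows without exact duplication. If this sketch were used in an experiment, oversampling would normally be applied to the training split only, so that the test data contains real modules exclusively.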