Software Defect Prediction Based on Optimized Machine Learning Models: A Comparative Study

Teknika. Pub Date: 2023-06-30. DOI: 10.34148/teknika.v12i2.634
M. Z. Siswantoro, Umi Laili Yuhana

Abstract

Software defect prediction is crucial for detecting possible defects in software before they manifest. Although machine learning models have become more prevalent in software defect prediction, their effectiveness varies with the dataset and with the hyperparameters of the model. It is difficult both to determine the most suitable hyperparameters for a model and to identify the prominent features that serve as input to the classifier. This research evaluates several traditional machine learning models, optimized for software defect prediction, on the NASA MDP (Metrics Data Program) datasets. The datasets were classified using k-nearest neighbors (k-NN), decision trees, logistic regression, linear discriminant analysis (LDA), a single-hidden-layer multilayer perceptron (SHL-MLP), and a support vector machine (SVM). The hyperparameters of the models were tuned using random search, and the feature dimensionality was reduced using principal component analysis (PCA). The synthetic minority oversampling technique (SMOTE) was used to oversample the minority class and so correct the class imbalance. k-NN was found to be the most suitable model for software defect prediction on several datasets, while SHL-MLP and SVM were also effective on certain datasets. Logistic regression and LDA did not perform as well as the other models. Moreover, the optimized models outperformed the baseline models in terms of classification accuracy. The choice of model for software defect prediction should therefore be based on the specific characteristics of the dataset, and hyperparameter tuning can improve the accuracy of machine learning models in predicting software defects.
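The SMOTE step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the abstract does not say which implementation was used, and in practice a library such as imbalanced-learn would be the usual choice. The function names `smote` and `balance`, the toy data, and the defaults `k=5` and `seed=0` are all assumptions made for illustration. The sketch shows only the core idea: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; `minority` is a list of feature lists.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = max(n_majority - len(minority), 0)
    new_points = smote(minority, n_needed, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (assumed): six non-defective modules (0) and three defective ones (1).
X_demo = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
          [0.2, 0.8], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y_demo = [0] * 6 + [1] * 3

X_bal, y_bal = balance(X_demo, y_demo)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Because every synthetic point is a convex combination of two real minority points, it always lies within the range spanned by the minority class. A common precaution, assumed here rather than taken from the abstract, is to oversample only the training split so that no synthetic information leaks into the evaluation data.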