Performance Evaluation of various ML techniques for Software Fault Prediction using NASA dataset

Baraah Alsangari, Göksel Bircik
DOI: 10.1109/HORA58378.2023.10156708 (https://doi.org/10.1109/HORA58378.2023.10156708)
Published in: 2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA)
Publication date: 2023-06-08

Abstract

Software Fault Prediction (SFP) has become an important research topic in software engineering as a means of improving software dependability. Defect predictions help developers anticipate prospective issues and optimize testing resources: the number of software defects can be forecast, and testing effort can be directed toward the modules with the most issues, so that defects are fixed as early as possible. This paper addresses SFP using the JM1 dataset provided by NASA, which has 21 features. Several Machine Learning (ML) techniques are studied: Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) with three distance metrics, and Decision Tree (DT). Three sampling cases are investigated: no sampling, random oversampling, and SMOTE. Performance is evaluated using accuracy (ACC), recall, precision, and F1-score. The results indicate that RF achieves the highest ACC, with values of 0.81, 0.92, and 0.88 for the three cases, respectively. The comprehensive findings of this study may serve as a baseline for subsequent studies, allowing any claim of improved prediction from a new approach, model, or framework to be compared and confirmed. In future work, varying the number of features will be included in the performance evaluation for SFP.
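
As a rough illustration of two ingredients named in the abstract, the sketch below shows random oversampling of the minority class and the computation of ACC, precision, recall, and F1-score from binary predictions. It is not the authors' implementation: the function names `random_oversample` and `binary_metrics`, the toy labels, and the choice of label `1` for "faulty" are all assumptions made for this example, and only the Python standard library is used. SMOTE, which synthesizes new minority samples rather than duplicating existing ones, is not shown.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (assumed): 2 faulty modules (label 1) out of 8.
toy_rows = [[i] for i in range(8)]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
print(Counter(balanced_labels))

# Metrics for a made-up prediction vector on the original labels.
toy_pred = [1, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(toy_labels, toy_pred))
```

In practice, resampling is normally applied to the training split only, so that the test set keeps the original class distribution. On an imbalanced dataset, accuracy alone can look high even when few faulty modules are found, which is one reason to report recall, precision, and F1-score alongside it.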