Software defect prediction using a bidirectional LSTM network combined with oversampling techniques

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC ACS Applied Electronic Materials Pub Date : 2023-10-28 DOI:10.1007/s10586-023-04170-z
Nasraldeen Alnor Adam Khleel, Károly Nehéz
{"title":"Software defect prediction using a bidirectional LSTM network combined with oversampling techniques","authors":"Nasraldeen Alnor Adam Khleel, Károly Nehéz","doi":"10.1007/s10586-023-04170-z","DOIUrl":null,"url":null,"abstract":"Abstract Software defects are a critical issue in software development that can lead to system failures and cause significant financial losses. Predicting software defects is a vital aspect of ensuring software quality. This can significantly impact both saving time and reducing the overall cost of software testing. During the software defect prediction (SDP) process, automated tools attempt to predict defects in the source codes based on software metrics. Several SDP models have been proposed to identify and prevent defects before they occur. In recent years, recurrent neural network (RNN) techniques have gained attention for their ability to handle sequential data and learn complex patterns. Still, these techniques are not always suitable for predicting software defects due to the problem of imbalanced data. To deal with this problem, this study aims to combine a bidirectional long short-term memory (Bi-LSTM) network with oversampling techniques. To establish the effectiveness and efficiency of the proposed model, the experiments have been conducted on benchmark datasets obtained from the PROMISE repository. The experimental results have been compared and evaluated in terms of accuracy, precision, recall, f-measure, Matthew’s correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR) and mean square error (MSE). The average accuracy of the proposed model on the original and balanced datasets (using random oversampling and SMOTE) was 88%, 94%, And 92%, respectively. The results showed that the proposed Bi-LSTM on the balanced datasets (using random oversampling and SMOTE) improves the average accuracy by 6 and 4% compared to the original datasets. The average F-measure of the proposed model on the original and balanced datasets (using random oversampling and SMOTE) were 51%, 94%, And 92%, respectively. The results showed that the proposed Bi-LSTM on the balanced datasets (using random oversampling and SMOTE) improves the average F-measure by 43 and 41% compared to the original datasets. The experimental results demonstrated that combining the Bi-LSTM network with oversampling techniques positively affects defect prediction performance in datasets with imbalanced class distributions.","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2023-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10586-023-04170-z","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Abstract Software defects are a critical issue in software development that can lead to system failures and cause significant financial losses. Predicting software defects is a vital aspect of ensuring software quality. This can significantly impact both saving time and reducing the overall cost of software testing. During the software defect prediction (SDP) process, automated tools attempt to predict defects in the source codes based on software metrics. Several SDP models have been proposed to identify and prevent defects before they occur. In recent years, recurrent neural network (RNN) techniques have gained attention for their ability to handle sequential data and learn complex patterns. Still, these techniques are not always suitable for predicting software defects due to the problem of imbalanced data. To deal with this problem, this study aims to combine a bidirectional long short-term memory (Bi-LSTM) network with oversampling techniques. To establish the effectiveness and efficiency of the proposed model, the experiments have been conducted on benchmark datasets obtained from the PROMISE repository. The experimental results have been compared and evaluated in terms of accuracy, precision, recall, f-measure, Matthew’s correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR) and mean square error (MSE). The average accuracy of the proposed model on the original and balanced datasets (using random oversampling and SMOTE) was 88%, 94%, And 92%, respectively. The results showed that the proposed Bi-LSTM on the balanced datasets (using random oversampling and SMOTE) improves the average accuracy by 6 and 4% compared to the original datasets. The average F-measure of the proposed model on the original and balanced datasets (using random oversampling and SMOTE) were 51%, 94%, And 92%, respectively. The results showed that the proposed Bi-LSTM on the balanced datasets (using random oversampling and SMOTE) improves the average F-measure by 43 and 41% compared to the original datasets. The experimental results demonstrated that combining the Bi-LSTM network with oversampling techniques positively affects defect prediction performance in datasets with imbalanced class distributions.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
结合过采样技术的双向LSTM网络软件缺陷预测
软件缺陷是软件开发中的一个关键问题,它可能导致系统故障并造成重大的经济损失。预测软件缺陷是保证软件质量的一个重要方面。这对节省时间和降低软件测试的总体成本都有很大的影响。在软件缺陷预测(SDP)过程中,自动化工具尝试基于软件度量来预测源代码中的缺陷。已经提出了几个SDP模型来在缺陷发生之前识别和预防缺陷。近年来,递归神经网络(RNN)技术因其处理序列数据和学习复杂模式的能力而受到关注。然而,由于数据不平衡的问题,这些技术并不总是适合于预测软件缺陷。为了解决这一问题,本研究旨在将双向长短期记忆(Bi-LSTM)网络与过采样技术相结合。为了验证所提模型的有效性和效率,在PROMISE存储库中获得的基准数据集上进行了实验。从正确率、精密度、召回率、f-measure、马修相关系数(MCC)、ROC曲线下面积(AUC)、精密度-召回率曲线下面积(AUCPR)和均方误差(MSE)等方面对实验结果进行了比较和评价。该模型在原始数据集和平衡数据集(使用随机过采样和SMOTE)上的平均准确率分别为88%、94%和92%。结果表明,在平衡数据集(使用随机过采样和SMOTE)上提出的Bi-LSTM比原始数据集的平均准确率提高了6%和4%。该模型在原始数据集和平衡数据集(使用随机过采样和SMOTE)上的平均f值分别为51%、94%和92%。结果表明,在平衡数据集(使用随机过采样和SMOTE)上提出的Bi-LSTM与原始数据集相比,平均F-measure分别提高了43%和41%。实验结果表明,将Bi-LSTM网络与过采样技术相结合,对类分布不平衡的数据集的缺陷预测性能有积极的影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
期刊最新文献
Hyperbaric oxygen treatment promotes tendon-bone interface healing in a rabbit model of rotator cuff tears. Oxygen-ozone therapy for myocardial ischemic stroke and cardiovascular disorders. Comparative study on the anti-inflammatory and protective effects of different oxygen therapy regimens on lipopolysaccharide-induced acute lung injury in mice. Heme oxygenase/carbon monoxide system and development of the heart. Hyperbaric oxygen for moderate-to-severe traumatic brain injury: outcomes 5-8 years after injury.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1