Combining CNN with DS3 for Detecting Bug-prone Modules in Cross-version Projects

Andrea Fiore, Alfonso Russo, C. Gravino, M. Risi
Published in: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2021-09-01
DOI: 10.1109/SEAA53835.2021.00021 (https://doi.org/10.1109/SEAA53835.2021.00021)
Citations: 3

Abstract

This paper focuses on Cross-Version Defect Prediction (CVDP), where a classification model is trained on data from the prior version of a project and then used to predict defects in the components of the latest release. To mitigate distribution differences between versions, which can degrade the performance of machine-learning-based models, we consider the Dissimilarity-based Sparse Subset Selection (DS3) technique to select meaningful representatives for the training set. Furthermore, we employ a Convolutional Neural Network (CNN) to generate structural and semantic features, which are merged with traditional software measures to obtain a more comprehensive set of predictors. To evaluate the usefulness of our proposal in the CVDP scenario, we perform an empirical study on a total of 20 cross-version pairs from 10 different software projects. We build prediction models with Logistic Regression (LR) and Random Forest (RF) and adopt 3 evaluation criteria (i.e., F-measure, G-mean, and Balance) to assess prediction accuracy. Our results show that using the CNN with both LR and RF models has a significant impact, with an improvement of ∼20% for each evaluation criterion. In contrast, DS3 does not significantly improve prediction accuracy.
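The abstract names the three evaluation criteria but does not give their formulas. The sketch below uses the definitions commonly found in the defect-prediction literature, which is an assumption about what the paper uses: F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of recall (pd) and specificity (1 - pf), and Balance as one minus the normalized Euclidean distance from the ideal point (pf = 0, pd = 1). The function name `defect_metrics` and the example confusion-matrix counts are hypothetical.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Compute F-measure, G-mean and Balance from confusion-matrix counts.

    Assumed definitions (not stated in the abstract):
      pd (recall)  = TP / (TP + FN)
      pf           = FP / (FP + TN)
      F-measure    = 2 * precision * pd / (precision + pd)
      G-mean       = sqrt(pd * (1 - pf))
      Balance      = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)
    """
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0

    f_measure = (
        2 * precision * pd / (precision + pd) if (precision + pd) else 0.0
    )
    g_mean = math.sqrt(pd * (1 - pf))
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)

    return {"f_measure": f_measure, "g_mean": g_mean, "balance": balance}


if __name__ == "__main__":
    # Hypothetical counts for one cross-version pair: the model was trained
    # on the prior version and its predictions compared with the defects
    # recorded for the latest release.
    scores = defect_metrics(tp=30, fp=10, tn=50, fn=10)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

All three criteria lie in [0, 1] and reach 1 only for a perfect classifier. G-mean and Balance are built from pd and pf rather than precision, which is why they are often reported for class-imbalanced defect data alongside F-measure.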