A novel instance-based method for cross-project just-in-time defect prediction

Xiaoyan Zhu, Tian Qiu, Jiayin Wang, Xin Lai
Software: Practice and Experience, journal article, published 2024-01-24
DOI: 10.1002/spe.3316 (https://doi.org/10.1002/spe.3316)

Abstract

Cross-project (CP) just-in-time software defect prediction (JIT-SDP) uses CP data to overcome initial data scarcity for training high-performing JIT-SDP classifiers in the early stages of software projects. The primary challenge faced by JIT-SDP in a cross-project context lies in the distinct distributions between training and test data. To tackle this issue, we select source data instances that closely resemble target data for building classifiers. Software datasets commonly exhibit a class imbalance problem, where the ratio of the defective class to the clean class is notably low. This imbalance typically diminishes classifier performance. In this study, we propose an instance selection method utilizing kernel mean matching (ISKMM) that addresses both knowledge transfer and class imbalance in cross-project defect prediction (CPDP). The method employs the kernel mean matching (KMM) technique to assess the similarity between training and target data. It selects instances with high similarity, retains them, and resamples the data based on similarity weighting to mitigate the class imbalance problem. Our experiments, conducted on 10 open-source projects, reveal that the ISKMM algorithm outperforms existing CP single-source software defect prediction (SDP) algorithms. Moreover, when employing the proposed algorithm, defect predictors constructed from cross-project data demonstrate an overall performance comparable to predictors learned from within-project data.
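The pipeline the abstract describes (weight source instances by KMM, keep the most target-like ones, then resample by weight to balance classes) can be illustrated with a minimal sketch. This is not the authors' ISKMM implementation: the abstract gives no selection threshold, kernel, or resampling rule, so the RBF kernel, the `keep_ratio` cutoff, the weighted oversampling of the minority class, and all function and parameter names below are assumptions. The solver is also simplified: it uses projected gradient descent with only the box constraint on the weights and omits the sum constraint of standard KMM.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kmm_weights(source, target, gamma=1.0, upper=10.0, steps=500):
    """Approximate KMM weights for the source instances.

    Minimises 0.5 * b'Kb - kappa'b subject to 0 <= b_i <= upper by
    projected gradient descent (simplification: no sum constraint).
    """
    n_s, n_t = len(source), len(target)
    K = [[rbf(a, b, gamma) for b in source] for a in source]
    kappa = [n_s / n_t * sum(rbf(a, t, gamma) for t in target) for a in source]
    # The largest row sum bounds the largest eigenvalue of K, so this
    # step size keeps the descent stable.
    lr = 1.0 / max(sum(row) for row in K)
    beta = [1.0] * n_s
    for _ in range(steps):
        grad = [sum(K[i][j] * beta[j] for j in range(n_s)) - kappa[i]
                for i in range(n_s)]
        beta = [min(upper, max(0.0, b - lr * g)) for b, g in zip(beta, grad)]
    return beta


def select_and_resample(source, labels, weights, keep_ratio=0.5, seed=0):
    """Keep the highest-weight instances, then oversample the minority
    class with probability proportional to weight (assumed scheme)."""
    rng = random.Random(seed)
    order = sorted(range(len(source)), key=lambda i: weights[i], reverse=True)
    kept = order[: max(1, int(len(source) * keep_ratio))]
    defective = [i for i in kept if labels[i] == 1]
    clean = [i for i in kept if labels[i] == 0]
    if len(defective) <= len(clean):
        minority, majority = defective, clean
    else:
        minority, majority = clean, defective
    extra = []
    if minority and len(majority) > len(minority):
        w = [weights[i] + 1e-12 for i in minority]  # avoid all-zero weights
        extra = rng.choices(minority, weights=w, k=len(majority) - len(minority))
    chosen = kept + extra
    return [source[i] for i in chosen], [labels[i] for i in chosen]


# Toy usage: one-dimensional features, 1 = defective, 0 = clean.
source = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
labels = [1, 0, 0, 0, 0, 1]
target = [[0.0], [0.1], [0.15]]

weights = kmm_weights(source, target)
X_train, y_train = select_and_resample(source, labels, weights)
```

In the toy data, the source points near 5.0 lie far from every target point, so they receive weights close to zero and are dropped before resampling. A production version would normally solve the full KMM quadratic program with a dedicated QP solver rather than this simplified descent.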