A novel instance-based method for cross-project just-in-time defect prediction

Software: Practice and Experience. Pub Date: 2024-01-24. DOI: 10.1002/spe.3316

Xiaoyan Zhu, Tian Qiu, Jiayin Wang, Xin Lai



Abstract

Cross-project (CP) just-in-time software defect prediction (JIT-SDP) uses CP data to overcome initial data scarcity for training high-performing JIT-SDP classifiers in the early stages of software projects. The primary challenge faced by JIT-SDP in a cross-project context lies in the distinct distributions between training and test data. To tackle this issue, we select source data instances that closely resemble target data for building classifiers. Software datasets commonly exhibit a class imbalance problem, where the ratio of the defective class to the clean class is notably low. This imbalance typically diminishes classifier performance. In this study, we propose an instance selection method utilizing kernel mean matching (ISKMM) that addresses both knowledge transfer and class imbalance in cross-project defect prediction (CPDP). The method employs the kernel mean matching (KMM) technique to assess the similarity between training and target data. It selects instances with high similarity, retains them, and resamples the data based on similarity weighting to mitigate the class imbalance problem. Our experiments, conducted on 10 open-source projects, reveal that the ISKMM algorithm outperforms existing CP single-source software defect prediction (SDP) algorithms. Moreover, when employing the proposed algorithm, defect predictors constructed from cross-project data demonstrate an overall performance comparable to predictors learned from within-project data.
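The abstract does not give the details of ISKMM, so the following is only a rough sketch of the general idea it describes: weight source instances with kernel mean matching, keep the most target-like ones, and rebalance the classes using those weights. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel and its `gamma`, the weight bound `bound`, the `keep_ratio` cutoff, the projected-gradient solver (standard KMM solves a quadratic program that also constrains the sum of the weights, which is omitted here), and the choice to oversample the defective class in proportion to its weights. The function names `kmm_weights` and `select_and_resample` are hypothetical.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kmm_weights(source, target, gamma=1.0, bound=10.0, steps=500):
    """Approximate KMM weights by projected gradient descent.

    Minimises 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= bound.
    Assumption: the usual constraint on sum(b) is dropped to keep this short.
    """
    n_s, n_t = len(source), len(target)
    K = [[rbf(xi, xj, gamma) for xj in source] for xi in source]
    kappa = [n_s / n_t * sum(rbf(xi, xt, gamma) for xt in target)
             for xi in source]
    # trace(K) bounds the largest eigenvalue of K, so this step size is safe.
    lr = 1.0 / sum(K[i][i] for i in range(n_s))
    beta = [1.0] * n_s
    for _ in range(steps):
        grad = [sum(K[i][j] * beta[j] for j in range(n_s)) - kappa[i]
                for i in range(n_s)]
        # Gradient step followed by projection onto the box [0, bound].
        beta = [min(bound, max(0.0, b - lr * g)) for b, g in zip(beta, grad)]
    return beta


def select_and_resample(source, labels, beta, keep_ratio=0.5, seed=0):
    """Keep the highest-weight instances, then oversample the defective
    class (label 1) with probability proportional to the KMM weight."""
    rng = random.Random(seed)
    order = sorted(range(len(source)), key=lambda i: beta[i], reverse=True)
    kept = order[: max(1, int(len(source) * keep_ratio))]
    defective = [i for i in kept if labels[i] == 1]
    clean = [i for i in kept if labels[i] == 0]
    extra = []
    if defective and len(clean) > len(defective):
        weights = [beta[i] + 1e-12 for i in defective]
        extra = rng.choices(defective, weights=weights,
                            k=len(clean) - len(defective))
    chosen = kept + extra
    return [source[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: four source changes near the target project, four far away.
source = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3]]
labels = [1, 0, 0, 0, 1, 0, 0, 0]  # 1 = defective, 0 = clean
target = [[0.1, 0.1], [0.2, 0.2], [0.0, 0.3]]

beta = kmm_weights(source, target)
X_sel, y_sel = select_and_resample(source, labels, beta)
```

In this toy run the source instances close to the target receive larger weights than the distant ones, so only the nearby instances survive selection, and the single defective instance among them is duplicated until both classes have the same count. A real study would replace the hand-rolled solver with a proper QP solver and tune the kernel and cutoff on data.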


