{"title":"揭示不同版本中不变模块对项目内缺陷预测模型评估的影响","authors":"Xutong Liu, Yufei Zhou, Zeyu Lu, Yuanqing Mei, Yibiao Yang, Junyan Qian, Yuming Zhou","doi":"10.1002/smr.2715","DOIUrl":null,"url":null,"abstract":"BackgroundSoftware defect prediction (SDP) is a topic actively researched in the software engineering community. Within‐project defect prediction (WPDP) involves using labeled modules from previous versions of the same project to train classifiers. Over time, many defect prediction models have been evaluated under the WPDP scenario.ProblemData duplication poses a significant challenge in current WPDP evaluation procedures. Unchanged modules, characterized by identical executable source code, are frequently present in both target and source versions during experimentation. However, it is still unclear how and to what extent the presence of unchanged modules affects the performance assessment of WPDP models and the comparison of multiple WPDP models.MethodIn this paper, we provide a method to detect and remove unchanged modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation.ResultsThe experiments conducted on 481 target versions from 62 projects provide evidence that data duplication significantly affects the reported performance values of individual learners in WPDP. However, when ranking multiple WPDP models based on prediction performance, the impact of removing unchanged instances is not substantial. Nevertheless, it is important to note that removing unchanged instances does have a slight influence on the selection of models with better generalization.ConclusionWe recommend that future WPDP studies take into consideration the removal of unchanged modules from target versions when evaluating the performance of their models. This practice will enhance the reliability and validity of the results obtained in WPDP research, leading to improved understanding and advancements in defect prediction models.","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"80 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Unveiling the impact of unchanged modules across versions on the evaluation of within‐project defect prediction models\",\"authors\":\"Xutong Liu, Yufei Zhou, Zeyu Lu, Yuanqing Mei, Yibiao Yang, Junyan Qian, Yuming Zhou\",\"doi\":\"10.1002/smr.2715\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"BackgroundSoftware defect prediction (SDP) is a topic actively researched in the software engineering community. Within‐project defect prediction (WPDP) involves using labeled modules from previous versions of the same project to train classifiers. Over time, many defect prediction models have been evaluated under the WPDP scenario.ProblemData duplication poses a significant challenge in current WPDP evaluation procedures. Unchanged modules, characterized by identical executable source code, are frequently present in both target and source versions during experimentation. 
However, it is still unclear how and to what extent the presence of unchanged modules affects the performance assessment of WPDP models and the comparison of multiple WPDP models.MethodIn this paper, we provide a method to detect and remove unchanged modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation.ResultsThe experiments conducted on 481 target versions from 62 projects provide evidence that data duplication significantly affects the reported performance values of individual learners in WPDP. However, when ranking multiple WPDP models based on prediction performance, the impact of removing unchanged instances is not substantial. Nevertheless, it is important to note that removing unchanged instances does have a slight influence on the selection of models with better generalization.ConclusionWe recommend that future WPDP studies take into consideration the removal of unchanged modules from target versions when evaluating the performance of their models. This practice will enhance the reliability and validity of the results obtained in WPDP research, leading to improved understanding and advancements in defect prediction models.\",\"PeriodicalId\":48898,\"journal\":{\"name\":\"Journal of Software-Evolution and Process\",\"volume\":\"80 1\",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2024-08-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Software-Evolution and Process\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1002/smr.2715\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Software-Evolution and Process","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/smr.2715","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Unveiling the impact of unchanged modules across versions on the evaluation of within‐project defect prediction models
Background: Software defect prediction (SDP) is a topic actively researched in the software engineering community. Within‐project defect prediction (WPDP) involves using labeled modules from previous versions of the same project to train classifiers. Over time, many defect prediction models have been evaluated under the WPDP scenario.

Problem: Data duplication poses a significant challenge in current WPDP evaluation procedures. Unchanged modules, characterized by identical executable source code, are frequently present in both target and source versions during experimentation. However, it is still unclear how and to what extent the presence of unchanged modules affects the performance assessment of WPDP models and the comparison of multiple WPDP models.

Method: In this paper, we provide a method to detect and remove unchanged modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation.

Results: The experiments conducted on 481 target versions from 62 projects provide evidence that data duplication significantly affects the reported performance values of individual learners in WPDP. However, when ranking multiple WPDP models based on prediction performance, the impact of removing unchanged instances is not substantial. Nevertheless, it is important to note that removing unchanged instances does have a slight influence on the selection of models with better generalization.

Conclusion: We recommend that future WPDP studies take into consideration the removal of unchanged modules from target versions when evaluating the performance of their models. This practice will enhance the reliability and validity of the results obtained in WPDP research, leading to improved understanding and advancements in defect prediction models.
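To make the Method section concrete, the sketch below illustrates one plausible way to detect unchanged modules between a source (training) version and a target (test) version: hash each module's source code after stripping comments and whitespace, and drop target modules whose fingerprint also occurs in the source version. This is only an illustrative sketch, not the authors' released tooling; the file layout, the `.java` file filter, and the normalization rule are assumptions made for the example.

```python
# Minimal sketch (assumptions: Java modules on disk, comment/whitespace stripping
# as the normalization rule; this is NOT the paper's actual implementation).
import hashlib
import re
from pathlib import Path


def code_fingerprint(source_text: str) -> str:
    """Hash the executable code after removing comments and all whitespace."""
    no_block_comments = re.sub(r"/\*.*?\*/", "", source_text, flags=re.S)
    no_line_comments = re.sub(r"//[^\n]*", "", no_block_comments)
    normalized = "".join(no_line_comments.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def fingerprints_of_version(version_dir: Path) -> dict[str, str]:
    """Map each module (relative path) in a version snapshot to its code fingerprint."""
    return {
        str(path.relative_to(version_dir)): code_fingerprint(path.read_text(errors="ignore"))
        for path in version_dir.rglob("*.java")
    }


def changed_modules(source_dir: Path, target_dir: Path) -> set[str]:
    """Return target-version modules whose code differs from every source-version module."""
    source_hashes = set(fingerprints_of_version(source_dir).values())
    target_hashes = fingerprints_of_version(target_dir)
    return {module for module, h in target_hashes.items() if h not in source_hashes}


# Hypothetical usage: keep only changed or new modules of the target version
# before computing WPDP evaluation metrics on it.
# kept = changed_modules(Path("project/v1.0/src"), Path("project/v1.1/src"))
```

Under this reading, the WPDP evaluation itself is unchanged (train on the source version, test on the target version); the only difference is that unchanged target modules are filtered out before the metrics are computed.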