用度量选择和平衡方法进行跨项目缺陷预测

IF 0.8 Q4 COMPUTER SCIENCE, THEORY & METHODS Applied Computer Systems Pub Date : 2022-12-01 DOI:10.2478/acss-2022-0015

Meetesh Nevendra, Pradeep Singh

{"title":"用度量选择和平衡方法进行跨项目缺陷预测","authors":"Meetesh Nevendra, Pradeep Singh","doi":"10.2478/acss-2022-0015","DOIUrl":null,"url":null,"abstract":"Abstract In software development, defects influence the quality and cost in an undesirable way. Software defect prediction (SDP) is one of the techniques which improves the software quality and testing efficiency by early identification of defects(bug/fault/error). Thus, several experiments have been suggested for defect prediction (DP) techniques. Mainly DP method utilises historical project data for constructing prediction models. SDP performs well within projects until there is an adequate amount of data accessible to train the models. However, if the data are inadequate or limited for the same project, the researchers mainly use Cross-Project Defect Prediction (CPDP). CPDP is a possible alternative option that refers to anticipating defects using prediction models built on historical data from other projects. CPDP is challenging due to its data distribution and domain difference problem. The proposed framework is an effective two-stage approach for CPDP, i.e., model generation and prediction process. In model generation phase, the conglomeration of different pre-processing, including feature selection and class reweights technique, is used to improve the initial data quality. Finally, a fine-tuned efficient bagging and boosting based hybrid ensemble model is developed, which avoids model over -fitting/under-fitting and helps enhance the prediction performance. In the prediction process phase, the generated model predicts the historical data from other projects, which has defects or clean. The framework is evaluated using25 software projects obtained from public repositories. The result analysis shows that the proposed model has achieved a 0.71±0.03 f1-score, which significantly improves the state-of-the-art approaches by 23 % to 60 %.","PeriodicalId":41960,"journal":{"name":"Applied Computer Systems","volume":"58 1","pages":"137 - 148"},"PeriodicalIF":0.8000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cross-Project Defect Prediction with Metrics Selection and Balancing Approach\",\"authors\":\"Meetesh Nevendra, Pradeep Singh\",\"doi\":\"10.2478/acss-2022-0015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract In software development, defects influence the quality and cost in an undesirable way. Software defect prediction (SDP) is one of the techniques which improves the software quality and testing efficiency by early identification of defects(bug/fault/error). Thus, several experiments have been suggested for defect prediction (DP) techniques. Mainly DP method utilises historical project data for constructing prediction models. SDP performs well within projects until there is an adequate amount of data accessible to train the models. However, if the data are inadequate or limited for the same project, the researchers mainly use Cross-Project Defect Prediction (CPDP). CPDP is a possible alternative option that refers to anticipating defects using prediction models built on historical data from other projects. CPDP is challenging due to its data distribution and domain difference problem. The proposed framework is an effective two-stage approach for CPDP, i.e., model generation and prediction process. In model generation phase, the conglomeration of different pre-processing, including feature selection and class reweights technique, is used to improve the initial data quality. Finally, a fine-tuned efficient bagging and boosting based hybrid ensemble model is developed, which avoids model over -fitting/under-fitting and helps enhance the prediction performance. In the prediction process phase, the generated model predicts the historical data from other projects, which has defects or clean. The framework is evaluated using25 software projects obtained from public repositories. The result analysis shows that the proposed model has achieved a 0.71±0.03 f1-score, which significantly improves the state-of-the-art approaches by 23 % to 60 %.\",\"PeriodicalId\":41960,\"journal\":{\"name\":\"Applied Computer Systems\",\"volume\":\"58 1\",\"pages\":\"137 - 148\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Computer Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/acss-2022-0015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/acss-2022-0015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

摘要

摘要在软件开发中，缺陷对软件的质量和成本产生了不利的影响。软件缺陷预测(SDP)是一种通过早期识别缺陷(bug/fault/error)来提高软件质量和测试效率的技术。因此，有几个实验建议缺陷预测(DP)技术。DP方法主要是利用历史工程数据构建预测模型。SDP在项目中表现良好，直到有足够数量的可访问数据来训练模型。然而，如果同一项目的数据不充分或有限，研究人员主要使用跨项目缺陷预测(CPDP)。CPDP是一种可能的替代选择，它指的是使用基于其他项目的历史数据构建的预测模型来预测缺陷。CPDP由于其数据分布和领域差异问题而具有挑战性。提出的框架是一种有效的两阶段CPDP方法，即模型生成和预测过程。在模型生成阶段，采用特征选择和类重权技术等多种预处理技术的组合，提高初始数据质量。最后，建立了一种基于微调的高效套袋和增压混合集成模型，避免了模型的过拟合/欠拟合，提高了预测性能。在预测过程阶段，生成的模型预测来自其他项目的历史数据，这些数据有缺陷或干净。该框架使用从公共存储库获得的25个软件项目进行评估。结果分析表明，所提出的模型达到了0.71±0.03 f1得分，显著提高了目前最先进的方法23%至60%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Cross-Project Defect Prediction with Metrics Selection and Balancing Approach

Abstract In software development, defects influence the quality and cost in an undesirable way. Software defect prediction (SDP) is one of the techniques which improves the software quality and testing efficiency by early identification of defects(bug/fault/error). Thus, several experiments have been suggested for defect prediction (DP) techniques. Mainly DP method utilises historical project data for constructing prediction models. SDP performs well within projects until there is an adequate amount of data accessible to train the models. However, if the data are inadequate or limited for the same project, the researchers mainly use Cross-Project Defect Prediction (CPDP). CPDP is a possible alternative option that refers to anticipating defects using prediction models built on historical data from other projects. CPDP is challenging due to its data distribution and domain difference problem. The proposed framework is an effective two-stage approach for CPDP, i.e., model generation and prediction process. In model generation phase, the conglomeration of different pre-processing, including feature selection and class reweights technique, is used to improve the initial data quality. Finally, a fine-tuned efficient bagging and boosting based hybrid ensemble model is developed, which avoids model over -fitting/under-fitting and helps enhance the prediction performance. In the prediction process phase, the generated model predicts the historical data from other projects, which has defects or clean. The framework is evaluated using25 software projects obtained from public repositories. The result analysis shows that the proposed model has achieved a 0.71±0.03 f1-score, which significantly improves the state-of-the-art approaches by 23 % to 60 %.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Applied Computer Systems COMPUTER SCIENCE, THEORY & METHODS-

自引率

10.00%

发文量

审稿时长

30 weeks