Cross-Project Defect Prediction with Metrics Selection and Balancing Approach

Applied Computer Systems (JCR Q4, Computer Science, Theory & Methods; IF 0.5). Published 2022-12-01. DOI: 10.2478/acss-2022-0015
Meetesh Nevendra, Pradeep Singh
Pages: 137–148
Citations: 0

Abstract

In software development, defects degrade software quality and raise costs. Software defect prediction (SDP) improves software quality and testing efficiency by identifying defects (bugs, faults, errors) early. Consequently, many defect prediction (DP) techniques have been proposed, most of which build prediction models from historical project data. SDP performs well within a project as long as enough data are available to train the models. When a project's own data are insufficient or limited, researchers mainly turn to Cross-Project Defect Prediction (CPDP), which predicts defects using models built on historical data from other projects. CPDP is challenging because projects differ in data distribution and domain. The proposed framework is an effective two-stage approach to CPDP, consisting of model generation and prediction. In the model generation phase, a combination of pre-processing steps, including feature selection and class reweighting, is applied to improve the quality of the initial data. A fine-tuned, efficient hybrid ensemble model based on bagging and boosting is then built, which avoids over-fitting and under-fitting and improves prediction performance. In the prediction phase, the generated model classifies data from other projects as defective or clean. The framework is evaluated on 25 software projects obtained from public repositories. The results show that the proposed model achieves an F1-score of 0.71 ± 0.03, improving on state-of-the-art approaches by 23% to 60%.
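The model-generation phase combines metric (feature) selection, class reweighting, and evaluation by F1-score in a cross-project setting. The following is a minimal, dependency-free Python sketch of those building blocks, under stated assumptions: metric selection is approximated by a simple absolute-Pearson-correlation filter, class reweighting uses the common "balanced" inverse-frequency heuristic, and the cross-project setup is assumed to be leave-one-project-out. All helper names (`select_metrics`, `balanced_class_weights`, `cross_project_splits`, and so on) are hypothetical and are not the authors' code; the bagging/boosting ensemble itself is omitted.

```python
"""Hypothetical sketch (not the authors' code) of building blocks named in the
abstract: metric selection, class reweighting, F1 scoring, and a
leave-one-project-out cross-project split (the split scheme is an assumption)."""
from collections import Counter
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_metrics(rows, labels, k):
    """Return the (sorted) indices of the k metrics most correlated, in
    absolute value, with the defect label. Assumed filter; the paper's
    feature selector may differ."""
    n_metrics = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_metrics)]
    ranked = sorted(range(n_metrics), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def balanced_class_weights(labels):
    """'Balanced' heuristic: weight_c = n_samples / (n_classes * count_c),
    so the rarer (usually defective) class gets a larger weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def f1(y_true, y_pred, positive=1):
    """F1 for the positive (defective) class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def cross_project_splits(projects):
    """For each project, yield (target, src_rows, src_labels, tgt_rows, tgt_labels):
    train on all other projects pooled together, test on the held-out one."""
    for target in projects:
        src_rows, src_labels = [], []
        for name, (rows, labels) in projects.items():
            if name != target:
                src_rows.extend(rows)
                src_labels.extend(labels)
        tgt_rows, tgt_labels = projects[target]
        yield target, src_rows, src_labels, tgt_rows, tgt_labels


if __name__ == "__main__":
    # Made-up toy data: each module has three metrics; label 1 = defective.
    projects = {
        "proj_a": ([[10, 1, 5], [200, 2, 5], [15, 1, 4], [180, 3, 6]], [0, 1, 0, 1]),
        "proj_b": ([[12, 2, 5], [220, 1, 5], [9, 3, 6]], [0, 1, 0]),
    }
    for target, xs, ys, tx, ty in cross_project_splits(projects):
        keep = select_metrics(xs, ys, k=2)          # chosen on source data only
        weights = balanced_class_weights(ys)        # per-class training weights
        reduced_target = [[row[j] for j in keep] for row in tx]
        # A classifier trained on the source data would now predict reduced_target,
        # and f1(ty, predictions) would score it.
        print(target, "metrics kept:", keep, "class weights:", weights,
              "target modules:", len(reduced_target))
```

Metric selection and class weights are computed on the pooled source projects only, so no information from the held-out target project leaks into training. To add the ensemble step, one option is to pass the per-class weights as per-sample weights to an ensemble learner's training call (for example, scikit-learn's `BaggingClassifier.fit` and `AdaBoostClassifier.fit` accept a `sample_weight` argument); how the authors combine bagging and boosting is not specified in the abstract.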