An Integrated Semi-supervised Software Defect Prediction Model

網際網路技術學刊 (Journal of Internet Technology), Pub Date: 2023-11-01, DOI: 10.53106/160792642023112406013

Fanqi Meng, Wenying Cheng, Jingdong Wang

Abstract

This paper proposes FFeSSTri (Filtered Feature Selecting, Sample and Tri-training), a semi-supervised software defect prediction model. It addresses two problems that lower the accuracy of semi-supervised defect prediction: class imbalance, and an excess of irrelevant or redundant features in the labelled samples. The model's novelty is that it integrates an oversampling technique, a new feature selection method, and the Tri-training algorithm, which together improve prediction accuracy. First, oversampling expands the under-represented class, which balances the class distribution of the labelled samples. Second, a new filter-based feature selection method built on relevance and redundancy excludes irrelevant or redundant features from the labelled samples. Finally, the Tri-training algorithm learns from the labelled training samples to build the FFeSSTri defect prediction model. Experiments on the NASA software defect prediction dataset show that FFeSSTri outperforms four existing supervised learning methods and one semi-supervised learning method in terms of F-Measure and AUC.
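The three stages named in the abstract can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the oversampling technique, the relevance and redundancy measures, or the base learner, so random oversampling, absolute Pearson correlation, and a nearest-centroid classifier are stand-ins chosen for brevity. All function and class names (`oversample`, `select_features`, `NearestCentroid`, `tri_train`, `predict`) are hypothetical, and the Tri-training loop omits the error-rate checks of the full algorithm.

```python
import math
import random
from collections import Counter
from statistics import mean


def oversample(X, y, rng):
    """Step 1 (assumed: random oversampling). Duplicate rows of the
    smaller classes until every class matches the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def select_features(X, y, k):
    """Step 2 (assumed scoring). Greedy filter: relevance is |corr(feature,
    label)|, redundancy is the mean |corr| with features already chosen.
    Labels must be numeric (for example 0/1). Returns column indices."""
    cols = list(zip(*X))
    relevance = [abs(pearson(c, y)) for c in cols]
    chosen = []
    while len(chosen) < min(k, len(cols)):
        def score(j):
            if not chosen:
                return relevance[j]
            redundancy = mean([abs(pearson(cols[j], cols[s])) for s in chosen])
            return relevance[j] - redundancy
        rest = [j for j in range(len(cols)) if j not in chosen]
        chosen.append(max(rest, key=score))
    return chosen


class NearestCentroid:
    """Tiny base learner, used only to keep the sketch self-contained."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)


def tri_train(X_lab, y_lab, X_unl, rng, rounds=3):
    """Step 3 (simplified Tri-training). Three learners start from bootstrap
    samples of the labelled set; each is then retrained on the labelled set
    plus the unlabelled rows on which the other two learners agree."""
    n = len(X_lab)
    models = []
    for _ in range(3):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(NearestCentroid().fit([X_lab[i] for i in idx],
                                            [y_lab[i] for i in idx]))
    for _ in range(rounds):
        preds = [[m.predict(x) for x in X_unl] for m in models]
        retrained = []
        for i in range(3):
            j, k = [t for t in range(3) if t != i]
            agreed = [u for u in range(len(X_unl)) if preds[j][u] == preds[k][u]]
            Xi = list(X_lab) + [X_unl[u] for u in agreed]
            yi = list(y_lab) + [preds[j][u] for u in agreed]
            retrained.append(NearestCentroid().fit(Xi, yi))
        models = retrained
    return models


def predict(models, x):
    """Final prediction: majority vote of the three learners."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]
```

A typical call order under these assumptions would be `oversample` on the labelled set, `select_features` on the balanced result, projection of both labelled and unlabelled rows onto the chosen columns, and then `tri_train` followed by `predict`. Pseudo-labels come from agreement between the other two learners, which is the core idea of Tri-training; the error-rate conditions that guard against noisy pseudo-labels in the full algorithm are left out here for brevity.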
