MASTER: Multi-Source Transfer Weighted Ensemble Learning for Multiple Sources Cross-Project Defect Prediction

IF 5.6 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING | IEEE Transactions on Software Engineering | Pub Date: 2024-03-25 | DOI: 10.1109/TSE.2024.3381235

Haonan Tong;Dalin Zhang;Jiqiang Liu;Weiwei Xing;Lingyun Lu;Wei Lu;Yumei Wu

Published in IEEE Transactions on Software Engineering, vol. 50, no. 5, pp. 1281-1305. Source: https://ieeexplore.ieee.org/document/10479078/

Citations: 0

Abstract


MASTER: Multi-Source Transfer Weighted Ensemble Learning for Multiple Sources Cross-Project Defect Prediction

Multi-source cross-project defect prediction (MSCPDP) attempts to transfer defect knowledge learned from multiple source projects to the target project. MSCPDP has drawn increasing attention from the academic and industry communities owing to its advantages over single-source cross-project defect prediction (SSCPDP). However, two main problems seriously restrict the performance of existing MSCPDP models: how to effectively extract the transferable knowledge from each source dataset, and how to measure the amount of knowledge transferred from each source dataset to the target dataset. In this paper, we propose a novel multi-source transfer weighted ensemble learning (MASTER) method for MSCPDP. MASTER measures the weight of each source dataset based on feature importance and distribution difference, and then extracts the transferable knowledge using the proposed feature-weighted transfer learning algorithm. Experiments are performed on 30 software projects. We compare MASTER with the latest state-of-the-art MSCPDP methods, using statistical tests, in terms of well-known effort-unaware measures (i.e., PD, PF, AUC, and MCC) and two widely used effort-aware measures ($P_{opt}20\%$ and IFA). The experimental results show that: 1) MASTER can substantially improve the prediction performance compared with the baselines, e.g., an improvement of at least 49.1% in MCC and 48.1% in IFA; 2) MASTER significantly outperforms each baseline on most datasets in terms of AUC, MCC, $P_{opt}20\%$, and IFA; 3) the MSCPDP model performs significantly better than the mean case of the SSCPDP model on most datasets and even outperforms the best case of SSCPDP on some datasets. It can be concluded that 1) it is very necessary to conduct MSCPDP, and 2) the proposed MASTER is a more promising alternative for MSCPDP.
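For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of how PD, PF, MCC, and IFA are commonly computed. It uses the standard confusion-matrix definitions and is not taken from the paper. The function names, the 0/1 label encoding, and the choice to rank modules by a raw predicted score for IFA are all assumptions made for illustration; the paper's own ranking strategy may differ.

```python
import math


def effort_unaware_measures(y_true, y_pred):
    """Compute PD, PF and MCC from binary labels (1 = defective, 0 = clean)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # PD: probability of detection (recall on the defective class).
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # PF: probability of false alarm (false positive rate).
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # MCC: Matthews correlation coefficient, defined as 0 when undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"PD": pd, "PF": pf, "MCC": mcc}


def initial_false_alarms(y_true, scores):
    """Count clean modules inspected before the first defective one.

    Assumption: modules are inspected in descending order of `scores`.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    false_alarms = 0
    for i in order:
        if y_true[i] == 1:
            return false_alarms
        false_alarms += 1
    return false_alarms  # no defective module was found


# Hypothetical toy data, for illustration only.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.2, 0.9, 0.8, 0.1, 0.7, 0.3]
measures = effort_unaware_measures(y_true, y_pred)
ifa = initial_false_alarms(y_true, scores)
```

Higher PD and MCC and lower PF and IFA indicate better predictions; AUC and $P_{opt}20\%$ need the full ranking and effort data, so they are left out of this sketch.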


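Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months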

About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of the Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.

b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.

c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.

d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.

e) System issues: Hardware-software trade-offs.

f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.