A novel communication-efficient heterogeneous federated positive and unlabeled learning method for credit scoring

Computers & Operations Research (IF 4.3; Zone 2, Engineering & Technology; JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS) Pub Date: 2025-05-01 Epub Date: 2025-01-18 DOI: 10.1016/j.cor.2025.106982
Yongqin Qiu, Yuanxing Chen, Kan Fang, Kuangnan Fang
Computers & Operations Research, Volume 177, Article 106982 (journal article). Full text: https://www.sciencedirect.com/science/article/pii/S0305054825000103
Citations: 0

Abstract

Customer records that contain only defaulted customers (positive samples) and rejected customers (unlabeled samples), known as positive and unlabeled (PU) data, are common in emerging financial institutions. However, building credit scoring models from multiple small-sample, high-dimensional PU datasets poses significant challenges, especially given the privacy constraints on transferring raw data. To tackle these challenges, this paper introduces a novel methodology called heterogeneous federated PU learning. The approach uses a fused penalty function to automatically divide coefficients into multiple clusters, and trains the model with an efficient proximal gradient descent algorithm that relies solely on gradients from local servers. Theoretical analysis establishes the oracle property of the proposed estimator. Simulation results show that, in terms of variable selection, parameter estimation, and prediction performance, the method is close to the oracle estimator and outperforms the alternatives. Empirical results indicate that the method can improve prediction performance and facilitate the identification of heterogeneity across datasets. Moreover, the estimated clustering structures reveal that geographically closer provinces exhibit greater similarity in credit risk. This implies that the proposed methodology can help nascent financial institutions identify differences in risk factors across datasets and enhance predictive accuracy.
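
The idea of training from local gradients only, followed by a proximal step, can be illustrated with a minimal sketch. This is not the authors' algorithm, and it makes three simplifying assumptions: an ordinary logistic loss on fully labeled data stands in for the PU likelihood, a single shared coefficient vector stands in for the heterogeneous (clustered) coefficients, and a plain L1 penalty with its soft-thresholding operator stands in for the fused penalty. All function names and parameter values are hypothetical.

```python
import numpy as np

def local_gradient(beta, X, y):
    """Runs on a client: average logistic-loss gradient.
    Only this vector is sent to the server; X and y never leave the client."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / len(y)

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (assumed stand-in for the paper's penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def federated_proximal_gd(clients, dim, step=0.5, lam=0.05, iters=200):
    """Server loop: average client gradients (weighted by sample size),
    take a gradient step, then apply the proximal step."""
    beta = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(iters):
        grads = [local_gradient(beta, X, y) for X, y in clients]
        avg_grad = np.average(grads, axis=0, weights=sizes)
        beta = soft_threshold(beta - step * avg_grad, step * lam)
    return beta

def make_client(rng, n, true_beta):
    """Simulate one client's labeled dataset (synthetic, for illustration only)."""
    X = rng.normal(size=(n, len(true_beta)))
    prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
    y = (rng.random(n) < prob).astype(float)
    return X, y

rng = np.random.default_rng(0)
true_beta = np.array([2.0, -2.0, 0.0, 0.0, 0.0])
clients = [make_client(rng, 200, true_beta) for _ in range(3)]
beta_hat = federated_proximal_gd(clients, dim=5)
print(beta_hat)
```

In this sketch the server sees only gradient vectors, which mirrors the communication pattern the abstract describes; replacing `soft_threshold` with the proximal step of a fusion penalty on per-client coefficients would be needed to recover any clustering structure.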

Source journal
Computers & Operations Research (Engineering & Technology: Engineering, Industrial)
CiteScore: 8.60
Self-citation rate: 8.70%
Publication volume: 292
Review time: 8.5 months
About the journal: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.